| Type: | Package | 
| Title: | Space-Time Anomaly Detection using Scan Statistics | 
| Description: | Detection of anomalous space-time clusters using the scan statistics methodology. Focuses on prospective surveillance of data streams, scanning for clusters with ongoing anomalies. Hypothesis testing is made possible by Monte Carlo simulation. Allévius (2018) <doi:10.21105/joss.00515>. | 
| Version: | 1.1.2 | 
| Date: | 2025-10-16 | 
| Encoding: | UTF-8 | 
| License: | GPL (≥ 3) | 
| URL: | https://github.com/promerpr/scanstatistics | 
| BugReports: | https://github.com/promerpr/scanstatistics/issues | 
| Depends: | R (≥ 3.4) | 
| Imports: | dplyr, ismev, magrittr, plyr, Rcpp, stats, sets, tibble, tidyr | 
| Suggests: | purrr, doParallel, foreach, ggplot2, knitr, MASS, pscl, reshape2, rmarkdown, sp, testthat, gamlss.dist | 
| VignetteBuilder: | knitr | 
| RoxygenNote: | 7.3.3 | 
| ByteCompile: | true | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| LazyData: | true | 
| NeedsCompilation: | yes | 
| Packaged: | 2025-10-16 22:52:33 UTC; prpresent | 
| Author: | Benjamin Allévius [aut], Paul Romer Present [ctb, cre] | 
| Maintainer: | Paul Romer Present <paul.romerpresent@fastmail.fm> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-10-16 23:10:02 UTC | 
scanstatistics: Space-time anomaly detection using scan statistics.
Description
The scanstatistics package provides two categories of important functions: data preparation functions, and the scan statistics themselves.
Data preparation functions
These functions prepare your data for use. In particular, it helps you define the zones which will be considered by the scan statistics.
Scan statistics
These are the functions used for space-time anomaly detection. Scan statistic
functions for univariate space-time data have a name that begins with 
scan_ and functions for multivariate space-time data have a name that
begins with mscan_.
Author(s)
Maintainer: Paul Romer Present paul.romerpresent@fastmail.fm [contributor]
Authors:
- Benjamin Allévius benjak@math.su.se 
See Also
Useful links:
- Report bugs at https://github.com/promerpr/scanstatistics/issues 
Longitude and latitude of New Mexico county seats.
Description
A dataset containing the longitude and latitude of the county seats of New Mexico, except for Cibola county.
Usage
NM_geo
Format
A data frame with 33 rows and 7 variables:
- county
- Factor; the counties of New Mexico (no spaces). 
- seat
- Character; the name of the county seat, i.e. the administrative center or seat of government. 
- area(km2)
- Numeric; the area in square kilometers of each county. 
- seat_long
- Numeric; the longitude of the county seat. 
- seat_lat
- Numeric; the latitude of the county seat. 
- center_long
- Numeric; the longitude of the geographical center of the county. 
- center_lat
- Numeric; the latitude of the geographical center of the county. 
Source
https://en.wikipedia.org/wiki/List_of_counties_in_New_Mexico
Data to plot the counties of New Mexico.
Description
Map data for New Mexico. Was created using ggplot2::map_data.
Usage
NM_map
Format
A data frame with 867 rows and 7 variables:
- long
- Numeric; longitude of county polygon corner. 
- lat
- Numeric; latitude of county polygon corner. 
- group
- Numeric; grouping by county. 
- order
- Numeric; order of the polygon corners. 
- region
- Character; region is "new mexico" for all rows. 
- subregion
- Character; the county name (with spaces). 
- county
- Factor; the county name (no spaces). 
Population and brain cancer cases in New Mexico counties during 1973–1991.
Description
A dataset containing the population count and number of brain cancer cases in the counties of New Mexico during the years 1973–1991. The population numbers are interpolations from the censuses conducted in 1973, 1982, and 1991. Interpolations were done using a quadratic function of time. Thus the year-to-year changes are overly smooth but match the census numbers in the three years mentioned.
Usage
NM_popcas
Format
A data frame with 608 rows and 4 variables:
- year
- Integer; the year the cases were recorded. 
- county
- Character; the name of the county (no spaces). 
- population
- Integer; the population in that county and year. 
- count
- Integer; the number of brain cancer cases in that county and year. 
Set of increasing sets from left to right of input vector.
Description
Returns a set (list) of the increasing sets (integer vectors) of the input 
vector v, in the sense that the first set contains the first element 
of v, the second set the first and second elements of v, and so 
on.
Usage
closest_subsets(v)
Arguments
| v | An integer vector. Meant to represent the  | 
Value
A list of the same length as the input. The first element of the list is v[1], the second is sort(v[1:2]), the third sort(v[1:3]), and so on.
Find the connected sets for a location and its k nearest neighbors.
Description
Returns a set of sets, each set of the latter type containing
the location itself and zero or more of its neighbors, if they are connected.
Usage
connected_neighbors(neighbors, adjacency_matrix)
Arguments
| neighbors | A vector of neighbors to a location, the first element of the vector being the specific location, and the other elements its other nearest neighbors. Locations should be encoded as integers. | 
| adjacency_matrix | A boolean matrix, with element  | 
Value
Returns a set of sets, each set of the latter type 
containing the location itself and zero or more of its neighbors, if they 
are connected.
Return those elements in the second set which are connected to those in the first.
Description
Return those elements in the second set Z_1 which are connected to 
those in the first set Z_0, according to the adjacency matrix.
Usage
connected_to(Z_0, Z_1, adjacency_matrix)
Arguments
| Z_0 | A set of locations, given as integers. | 
| Z_1 | A set of locations, given as integers. | 
| adjacency_matrix | A boolean matrix, with element  | 
Value
A set, possibly empty, containing those locations in Z_1
that are connected to any of the locations in Z_0.
Get the k nearest neighbors for each location, given its coordinates.
Description
Get the k nearest neighbors for each location, including the location itself.
This function calls dist, so the options for the 
distance measure used is the same as for that one. Distances are calculated 
between rows.
Usage
coords_to_knn(x, k = min(10, nrow(x)), method = "euclidean", p = 2)
Arguments
| x | a numeric matrix, data frame or  | 
| k | The number of nearest neighbors, counting the location itself. | 
| method | the distance measure to be used.  This must be one of
 | 
| p | The power of the Minkowski distance. | 
Value
An integer matrix of the k nearest neighbors for each location. 
Each row corresponds to a location, with the first element of each row 
being the location itself. Locations are encoded as integers.
Examples
x <- matrix(c(0, 0,
              1, 0,
              2, 1,
              0, 4,
              1, 3),
            ncol = 2, byrow = TRUE)
plot(x)
coords_to_knn(x)
Convert a long data frame to a wide matrix.
Description
Convert a long data frame to a wide matrix, with time along the row dimension and locations along the column dimension. Values in the matrix could be e.g. the observed counts or the population.
Usage
df_to_matrix(df, time_col = 1, location_col = 2, value_col = 3)
Arguments
| df | A data frame with at least 3 columns. | 
| time_col | Integer or string that specifies the time column. | 
| location_col | Integer or string that specifies the location column. | 
| value_col | Integer or string that specifies the value column. | 
Value
A matrix with time on rows and locations on columns.
Given a distance matrix, find the k nearest neighbors.
Description
Given a distance matrix, calculate the k nearest neighbors of each 
location, including the location itself. The matrix should contain only zeros 
on the diagonal, and all other elements should be positive.
Usage
dist_to_knn(x, k = min(10, nrow(x)))
Arguments
| x | A (square) distance matrix. Elements should be non-negative and the diagonal zeros, but this is not checked. | 
| k | The number of nearest neighbors, counting the location itself. | 
Value
A matrix of integers, row i containing the k nearest 
neighbors of location i, including itself.
Examples
x <- matrix(c(0, 0,
              1, 0,
              2, 1,
              0, 4,
              1, 3),
            ncol = 2, byrow = TRUE)
d <- dist(x, diag = TRUE, upper = TRUE)
dist_to_knn(d, k = 3)
Estimate baselines based on observed counts.
Description
Estimate the baselines (expected values) for the supplied counts.
Usage
estimate_baselines(counts, population = NULL)
Arguments
| counts | A matrix of observed counts. Rows indicate time (ordered from most recent) and columns indicate locations. | 
| population | A matrix or vector of populations for each location 
(optional). If a matrix, should be of the same dimensions as 
 | 
Value
A matrix of baselines of the same dimensions as counts.
Estimate variances based on observed counts.
Description
Estimate the variances for the supplied counts. It is assumed that variances are constant over time for each location.
Usage
estimate_variances(
  counts,
  baselines = NULL,
  population = NULL,
  constant_dim = 1
)
Arguments
| counts | A matrix of observed counts. Rows indicate time (ordered from most recent) and columns indicate locations. | 
| baselines | A matrix of the same dimensions as  | 
| population | A matrix or vector of populations for each location 
(optional). If a matrix, should be of the same dimensions as 
 | 
| constant_dim | An integer. If equal to 1, variances are assumed to be constant over time but different between locations. If equal to 2, variances are assumed to vary over time but at each time point be equal for all locations. | 
Value
A matrix of variances of the same dimensions as counts.
Estimate the parameters of a ZIP distribution.
Description
Heuristically estimate the ZIP distribution Poisson mean parameters and the structural zero probabilities for each location and time point. Assumes the structural zero probability is constant over time for each location.
Usage
estimate_zip_params(counts, population = NULL, min_p = 0.001, min_mu = 0.3)
Arguments
| counts | A matrix or vector of observed counts. Rows indicate time (ordered from most recent) and columns indicate locations. If a vector, the elements are assumed to be the counts for each location. | 
| population | A matrix or vector of populations for each location 
(optional). If a matrix, should be of the same dimensions as 
 | 
| min_p | The minimum value you think possible for the structural zero probability. | 
| min_mu | The mimum value you think possible for the Poisson mean parameter of the ZIP distribution (before adjusting for population size). | 
Value
A list with two elements:
- baselines
- A matrix of the same dimensions as - counts. If- countswas a vector, a matrix with 1 row will be returned.
- probs
- A matrix of the same dimensions as - counts. If- countswas a vector, a matrix with 1 row will be returned.
Computes the flexibly shaped zones as in Tango (2005).
Description
Given a matrix of k nearest neighbors and an adjacency matrix
for the locations involved, produces the set of flexibly shaped zones
as a list of integer vectors. The locations in these zones are all connected, 
in the sense that any location in the zone can be reached from another by 
traveling through adjacent locations within the zone.
Usage
flexible_zones(k_nearest, adjacency_matrix)
Arguments
| k_nearest | An integer matrix of the  | 
| adjacency_matrix | A boolean matrix, with element  | 
Value
A list of integer vectors.
References
Tango, T. & Takahashi, K. (2005), A flexibly shaped spatial scan statistic for detecting clusters, International Journal of Health Geographics 4(1).
Examples
A <- matrix(c(0,1,0,0,0,0,
              1,0,1,0,0,0,
              0,1,0,0,0,0,
              0,0,0,0,1,0,
              0,0,0,1,0,0,
              0,0,0,0,0,0), 
              nrow = 6, byrow = TRUE) == 1
nn <- matrix(as.integer(c(1,2,3,4,5,6,
                          2,1,3,4,5,6,
                          3,2,1,4,5,6,
                          4,5,1,6,3,2,
                          5,4,6,1,3,2,
                          6,5,4,1,3,2)),
                          nrow = 6, byrow = TRUE)
flexible_zones(nn, A)
Flip a matrix upside down
Description
Flip a matrix upside down
Usage
flipud(x)
Arguments
| x | A matrix | 
Value
A matrix, x with rows reversed.
Get indices of zero elements in a vector.
Description
Get indices of zero elements in a vector.
Usage
get_zero_indices(v)
Arguments
| v | An integer vector. | 
Value
A vector with the indices of elements equal to zero in v.
Indices start at zero.
Extract a zone from the set of all zones.
Description
Extract zone number n from the set of all zones.
Usage
get_zone(n, zones)
Arguments
| n | An integer; the number of the zone you wish to retrieve. | 
| zones | A list of integer vectors, representing the set of all zones. | 
Value
An integer vector.
Examples
zones <- list(1L, 2L, 3L, 1:2, c(1L, 3L), c(2L, 3L))
get_zone(4, zones)
Calculate the Gumbel p-value for a scan statistic.
Description
Given an observed scan statistic \lambda^* and a vector of replicate 
scan statistics \lambda_i, i=1,\ldots,R, fit a Gumbel 
distribution to the replicates and calculate a p-value for the observed
statistic based on the fitted distribution.
 \frac{1 + \sum_{i=1}^R \mathrm{I}(\lambda_i > \lambda^*)}{1 + R}
The function is vectorized, so multiple p-values can be calculated if
several scan statistics (e.g. statistics from secondary clusters) are 
supplied.
Usage
gumbel_pvalue(observed, replicates, method = "ML", ...)
Arguments
| observed | A scalar containing the observed value of the scan statistic, or a vector of observed values from secondary clusters. | 
| replicates | A vector of Monte Carlo replicates of the scan statistic. | 
| method | Either "ML", for maximum likelihood, or "MoM", for method of moments. | 
| ... | Additional arguments passed to  | 
Value
The p-value or p-values corresponding to the observed 
scan statistic(s).
Is the relative error between two numbers is less than the given tolerance?
Description
Given two consecutive numbers in a sequence, return TRUE if the
relative change is positive but less than the given tolerance.
Usage
has_converged(current, previous, tol = 0.01)
Arguments
| current | A scalar; the most recent value of the sequence. | 
| previous | A scalar; the second most recent value of the sequence, or a reference value. | 
| tol | The tolerance, a positive scalar near zero. | 
Return a set of the location and its neighbors if they are connected, else return the empty set.
Description
If the location and its neighbors, not including itself, are connected, then return the set containing the location and its neighbors; otherwise, return the empty set
Usage
if_connected(distinct_neighbors, location, adjacency_matrix)
Arguments
| distinct_neighbors | A  | 
| location | A location, preferably given as an integer. | 
| adjacency_matrix | A boolean matrix, with element  | 
Value
A set of the given location and the neighbors if they are
connected, else returns the empty set.
Returns TRUE if the neighboring locations are connected to the given location, FALSE if not.
Description
Returns TRUE if the neighboring locations are connected to the given location, FALSE if not.
Usage
is_connected(neighbor_locations, location, adjacency_matrix)
Arguments
| neighbor_locations | A  | 
| location | A location, preferably given as an integer. | 
| adjacency_matrix | A boolean matrix, with element  | 
Value
Boolean: is the neighbors connected to the given location?
Find the increasing subsets of k nearest neighbors for all locations.
Description
Returns the set of increasing nearest neighbor sets for all locations, as
a list of integer vectors. That is, for each location the list returned 
contains one vector containing the location itself, another containing the 
location and its nearest neighbor, and so on, up to the vector containing the 
location and its k-1 nearest neighbors.
Usage
knn_zones(k_nearest)
Arguments
| k_nearest | An integer matrix of with  | 
Value
A list of integer vectors.
Examples
nn <- matrix(c(1L, 2L, 4L, 3L, 5L,
               2L, 1L, 3L, 4L, 5L, 
               3L, 2L, 4L, 1L, 5L,
               4L, 1L, 2L, 3L, 5L,
               5L, 3L, 4L, 2L, 1L),
               ncol = 5, byrow = TRUE)
knn_zones(nn[, 1:3])
Convert a matrix to a data frame.
Description
Convert a matrix to a data frame with columns time, location, and one more containing the elements of the input matrix.
Usage
matrix_to_df(mat, name, locations = NULL, times = NULL)
Arguments
| mat | A matrix. | 
| name | The name of the third column in the output matrix. | 
| locations | If not  | 
| times | If not  | 
Value
A matrix with columns time, location, name, where name
is specified in the input.
Calculate the Monte Carlo p-value for a scan statistic.
Description
Given an observed scan statistic \lambda^* and a vector of replicate 
scan statistics \lambda_i, i=1,\ldots,R, calculate the Monte 
Carlo p-value as
 \frac{1 + \sum_{i=1}^R \mathrm{I}(\lambda_i > \lambda^*)}{1 + R}
The function is vectorized, so multiple p-values can be calculated if
several scan statistics (e.g. statistics from secondary clusters) are 
supplied.
Usage
mc_pvalue(observed, replicates)
Arguments
| observed | A scalar containing the observed value of the scan statistic, or a vector of observed values from secondary clusters. | 
| replicates | A vector of Monte Carlo replicates of the scan statistic. | 
Value
The p-value or p-values corresponding to the observed 
scan statistic(s).
Permute the entries of the matrix, preserving row and column marginals.
Description
Permute the entries of the matrix, preserving row and column marginals.
Usage
permute_matrix(A)
Arguments
| A | An integer matrix. | 
Value
An integer matrix.
Creates a set of all non-empty subsets of the integers from 1 to n.
Description
Creates a list of all 2^(n-1) non-empty subsets of the integers from 1 
to n.
Usage
powerset_zones(n)
Arguments
| n | An integer larger than 0. | 
Value
A list of integer vectors.
Print a scanstatistic object.
Description
Prints a scanstatistic object and returns it invisibly.
Usage
## S3 method for class 'scanstatistic'
print(x, ...)
Arguments
| x | A an object of class  | 
| ... | Further arguments passed to or from other methods. | 
Value
x, invisibly
Run a scan statistic analysis.
Description
Run a scan statistic analysis with the given scan statistic and arguments.
Usage
run_scan(scanstat, args, gumbel = FALSE)
Arguments
| scanstat | A scan statistic function. | 
| args | A named list of arguments to be passed to  | 
| gumbel | Logical: should a Gumbel P-value be calculated? Default is
 | 
Value
A list with components
- observed
- The table of observed statistics. 
- simulated
- The table of simulated statistics. 
- MC_pvalue
- The Monte Carlo P-value of the scan statistic. 
- Gumbel_pvalue
- The Gumbel P-value of the scan statistic. 
Calculate the negative binomial bayesian scan statistic..
Description
Calculate the "Bayesian Spatial Scan Statistic" by Neill et al. (2006), adapted to a spatio-temporal setting. The scan statistic assumes that, given the relative risk, the data follows a Poisson distribution. The relative risk is in turn assigned a Gamma distribution prior, yielding a negative binomial marginal distribution for the counts under the null hypothesis. Under the alternative hypothesis, the
Usage
scan_bayes_negbin(
  counts,
  zones,
  baselines = NULL,
  population = NULL,
  outbreak_prob = 0.05,
  alpha_null = 1,
  beta_null = 1,
  alpha_alt = alpha_null,
  beta_alt = beta_null,
  inc_values = seq(1, 3, by = 0.1),
  inc_probs = 1
)
Arguments
| counts | Either: 
 | 
| zones | A list of integer vectors. Each vector corresponds to a single zone; its elements are the numbers of the locations in that zone. | 
| baselines | Optional. A matrix of the same dimensions as  | 
| population | Optional. A matrix or vector of populations for each 
location. Not needed if  | 
| outbreak_prob | A scalar; the probability of an outbreak (at any time, any place). Defaults to 0.05. | 
| alpha_null | A scalar; the shape parameter for the gamma distribution under the null hypothesis of no anomaly. Defaults to 1. | 
| beta_null | A scalar; the scale parameter for the gamma distribution under the null hypothesis of no anomaly. Defaults to 1. | 
| alpha_alt | A scalar; the shape parameter for the gamma distribution
under the alternative hypothesis of an anomaly. Defaults to the same value
as  | 
| beta_alt | A scalar; the scale parameter for the gamma distribution
under the alternative hypothesis of an anomaly. Defaults to the same value
as  | 
| inc_values | A vector of possible values for the increase in the mean (and variance) of an anomalous count. Defaults to evenly spaced values between 1 and 3, with a difference of 0.1 between consecutive values. | 
| inc_probs | A vector of the prior probabilities of each value in 
 | 
Value
A list which, in addition to the information about the type of scan
statistic, has the following components: priors (list), 
posteriors (list), MLC (list) and marginal_data_prob 
(scalar). The list MLC has elements
- zone
- The number of the spatial zone of the most likely cluster (MLC). 
- duration
- The most likely event duration. 
- log_posterior
- The posterior log probability that an event is ongoing in the MLC. 
- log_bayes_factor
- The logarithm of the Bayes factor for the MLC. 
- posterior
- The posterior probability that an event is ongoing in the MLC. 
- locations
- The locations involved in the MLC. 
The list priors has elements
- null_prior
- The prior probability of no anomaly. 
- alt_prior
- The prior probability of an anomaly. 
- inc_prior
- A vectorof prior probabilities of each value in the argument - inc_values.
- window_prior
- The prior probability of an outbreak in any of the space-time windows. 
The list posteriors has elements
- null_posterior
- The posterior probability of no anomaly. 
- alt_posterior
- The posterior probability of an anomaly. 
- inc_posterior
- A data frame with columns - inc_valuesand- inc_posterior.
- window_posteriors
- A data frame with columns - zone,- duration,- log_posteriorand- log_bayes_factor, each row corresponding to a space-time window.
- space_time_posteriors
- A matrix with the posterior anomaly probability of each location-time combination. 
- location_posteriors
- A vector with the posterior probability of an anomaly at each location. 
References
Neill, D. B., Moore, A. W., Cooper, G. F. (2006). A Bayesian Spatial Scan Statistic. Advances in Neural Information Processing Systems 18.
Examples
set.seed(1)
# Create location coordinates, calculate nearest neighbors, and create zones
n_locs <- 50
max_duration <- 5
n_total <- n_locs * max_duration
geo <- matrix(rnorm(n_locs * 2), n_locs, 2)
knn_mat <- coords_to_knn(geo, 15)
zones <- knn_zones(knn_mat)
# Simulate data
baselines <- matrix(rexp(n_total, 1/5), max_duration, n_locs)
counts <- matrix(rpois(n_total, as.vector(baselines)), max_duration, n_locs)
# Inject outbreak/event/anomaly
ob_dur <- 3
ob_cols <- zones[[10]]
ob_rows <- max_duration + 1 - seq_len(ob_dur)
counts[ob_rows, ob_cols] <- matrix(
  rpois(ob_dur * length(ob_cols), 2 * baselines[ob_rows, ob_cols]), 
  length(ob_rows), length(ob_cols))
res <- scan_bayes_negbin(counts = counts,
                         zones = zones,
                         baselines = baselines)
Calculate the "Bayesian Spatial Scan Statistic" by Neill et al. (2006).
Description
Calculate the "Bayesian Spatial Scan Statistic" by Neill et al. (2006), adapted to a spatio-temporal setting. The scan statistic assumes that, given the relative risk, the data follows a Poisson distribution. The relative risk is in turn assigned a Gamma distribution prior, yielding a negative binomial marginal distribution for the counts.
Usage
scan_bayes_negbin_cpp(
  counts,
  baselines,
  zones,
  zone_lengths,
  outbreak_prob,
  alpha_null,
  beta_null,
  alpha_alt,
  beta_alt,
  inc_values,
  inc_probs
)
Arguments
| counts | An integer matrix (most recent timepoint in first row). | 
| baselines | A matrix with positive entries (most recent timepoint in first row). | 
| zones | An integer vector (all zones concatenated; locations indexed from 0 and up). | 
| zone_lengths | An integer vector. | 
| outbreak_prob | A scalar; the probability of an outbreak (at any time, any place). | 
| alpha_null | A scalar; the shape parameter for the gamma distribution under the null hypothesis of no anomaly. | 
| beta_null | A scalar; the scale parameter for the gamma distribution under the null hypothesis of no anomaly. | 
| alpha_alt | A scalar; the shape parameter for the gamma distribution under the alternative hypothesis of an anomaly. | 
| beta_alt | A scalar; the scale parameter for the gamma distribution under the alternative hypothesis of an anomaly. | 
| inc_values | A vector of possible values for the increase in the mean (and variance) of an anomalous count. | 
| inc_probs | A vector of the prior probabilities of each value in 
 | 
Value
A list with elements priors (list), posteriors (list), 
and marginal_data_prob (scalar). The list priors has 
elements
- null_prior
- The prior probability of no anomaly. 
- alt_prior
- The prior probability of an anomaly. 
- inc_prior
- A vector (matrix with 1 row) of prior probabilities of each value in the argument - m_values.
- window_prior
- The prior probability of an outbreak in any of the space-time windows. 
The list posteriors has elements
- null_posterior
- The posterior probability of no anomaly. 
- alt_posterior
- The posterior probability of an anomaly. 
- inc_posterior
- A data frame with columns - inc_valuesand- inc_posterior.
- window_posteriors
- A data frame with columns - zone,- duration,- log_posteriorand- log_bayes_factor, each row corresponding to a space-time window.
- space_time_posteriors
- A matrix with the posterior anomaly probability of each location-time combination. 
- location_posteriors
- A vector (matrix with 1 row) with the posterior probability of an anomaly at each location. 
Calculate the expectation-based negative binomial scan statistic.
Description
Calculate the expectation-based negative binomial scan statistic devised by Tango et al. (2011).
Usage
scan_eb_negbin(
  counts,
  zones,
  baselines = NULL,
  thetas = 1,
  type = c("hotspot", "emerging"),
  n_mcsim = 0,
  gumbel = FALSE,
  max_only = FALSE
)
Arguments
| counts | Either: 
 | 
| zones | A list of integer vectors. Each vector corresponds to a single zone; its elements are the numbers of the locations in that zone. | 
| baselines | Optional. A matrix of the same dimensions as  | 
| thetas | Optional. A matrix of the same dimensions as  | 
| type | A string, either "hotspot" or "emerging". If "hotspot", the relative risk is assumed to be fixed over time. If "emerging", the relative risk is assumed to increase with the duration of the outbreak. | 
| n_mcsim | A non-negative integer; the number of replicate scan
statistics to generate in order to calculate a  | 
| gumbel | Logical: should a Gumbel P-value be calculated? Default is
 | 
| max_only | Boolean. If  | 
Value
A list which, in addition to the information about the type of scan statistic, has the following components:
- MLC
- A list containing the number of the zone of the most likely cluster (MLC), the locations in that zone, the duration of the MLC, and the calculated score. In order, the elements of this list are named - zone_number, locations, duration, score.
- observed
- A data frame containing, for each combination of zone and duration investigated, the zone number, duration, and score. The table is sorted by score with the top-scoring location on top. If - max_only = TRUE, only contains a single row corresponding to the MLC.
- replicates
- A data frame of the Monte Carlo replicates of the scan statistic (if any), and the corresponding zones and durations. 
- MC_pvalue
- The Monte Carlo - P-value.
- Gumbel_pvalue
- A - P-value obtained by fitting a Gumbel distribution to the replicate scan statistics.
- n_zones
- The number of zones scanned. 
- n_locations
- The number of locations. 
- max_duration
- The maximum duration considered. 
- n_mcsim
- The number of Monte Carlo replicates made. 
References
Tango, T., Takahashi, K. & Kohriyama, K. (2011), A space-time scan statistic for detecting emerging outbreaks, Biometrics 67(1), 106–115.
Examples
set.seed(1)
# Create location coordinates, calculate nearest neighbors, and create zones
n_locs <- 50
max_duration <- 5
n_total <- n_locs * max_duration
geo <- matrix(rnorm(n_locs * 2), n_locs, 2)
knn_mat <- coords_to_knn(geo, 15)
zones <- knn_zones(knn_mat)
# Simulate data
 baselines <- matrix(rexp(n_total, 1/5), max_duration, n_locs)
 thetas <- matrix(runif(n_total, 0.05, 3), max_duration, n_locs)
 counts <- matrix(rnbinom(n_total,  mu = baselines,  size = thetas), 
                  max_duration, n_locs)
# Inject outbreak/event/anomaly
ob_dur <- 3
ob_cols <- zones[[10]]
ob_rows <- max_duration + 1 - seq_len(ob_dur)
counts[ob_rows, ob_cols] <- matrix(
  rnbinom(ob_dur * length(ob_cols), 
          mu = 2 * baselines[ob_rows, ob_cols],
          size = thetas[ob_rows, ob_cols]),
  length(ob_rows), length(ob_cols))
res <- scan_eb_negbin(counts = counts,
                      zones = zones,
                      baselines = baselines,
                      thetas = thetas,
                      type = "hotspot",
                      n_mcsim = 99,
                      max_only = FALSE)
Calculate the expectation-based negative binomial scan statistic.
Description
Calculate the expectation-based negative binomial scan statistic and Monte Carlo replicates.
Usage
scan_eb_negbin_cpp(
  counts,
  baselines,
  overdisp,
  zones,
  zone_lengths,
  store_everything,
  num_mcsim,
  score_hotspot
)
Arguments
| counts | Integer matrix (most recent timepoint in first row) | 
| baselines | Matrix (most recent timepoint in first row) | 
| overdisp | Matrix (most recent timepoint in first row) | 
| zones | Integer vector (all zones concatenated; locations indexed from 0 and up) | 
| zone_lengths | Integer vector | 
| store_everything | Boolean | 
| num_mcsim | Integer | 
| score_hotspot | Boolean | 
Value
A list with elements observed and simulated, each 
being a data frame with columns:
- zone
- The top-scoring zone (spatial component of MLC). 
- duration
- The corresponding duration (time-length of MLC). 
- score
- The value of the loglihood ratio statistic (the scan statistic). 
- relrisk
- The estimated relative risk. 
- n_iter
- The number of iterations performed by the EM algorithm. 
Calculate the expectation-based Poisson scan statistic.
Description
Calculate the expectation-based Poisson scan statistic devised by Neill et al. (2005).
Usage
scan_eb_poisson(
  counts,
  zones,
  baselines = NULL,
  population = NULL,
  n_mcsim = 0,
  gumbel = FALSE,
  max_only = FALSE
)
Arguments
| counts | Either: 
 | 
| zones | A list of integer vectors. Each vector corresponds to a single zone; its elements are the numbers of the locations in that zone. | 
| baselines | Optional. A matrix of the same dimensions as  | 
| population | Optional. A matrix or vector of populations for each 
location. Not needed if  | 
| n_mcsim | A non-negative integer; the number of replicate scan
statistics to generate in order to calculate a  | 
| gumbel | Logical: should a Gumbel P-value be calculated? Default is
 | 
| max_only | Boolean. If  | 
Value
A list which, in addition to the information about the type of scan statistic, has the following components:
- MLC
- A list containing the number of the zone of the most likely cluster (MLC), the locations in that zone, the duration of the MLC, the calculated score, and the relative risk. In order, the elements of this list are named - zone_number, locations, duration, score, relative_risk.
- observed
- A data frame containing, for each combination of zone and duration investigated, the zone number, duration, score, relative risk. The table is sorted by score with the top-scoring location on top. If - max_only = TRUE, only contains a single row corresponding to the MLC.
- replicates
- A data frame of the Monte Carlo replicates of the scan statistic (if any), and the corresponding zones and durations. 
- MC_pvalue
- The Monte Carlo - P-value.
- Gumbel_pvalue
- A - P-value obtained by fitting a Gumbel distribution to the replicate scan statistics.
- n_zones
- The number of zones scanned. 
- n_locations
- The number of locations. 
- max_duration
- The maximum duration considered. 
- n_mcsim
- The number of Monte Carlo replicates made. 
References
Neill, D. B., Moore, A. W., Sabhnani, M. and Daniel, K. (2005). Detection of emerging space-time clusters. Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD ’05, 218.
Examples
set.seed(1)
# Create location coordinates, calculate nearest neighbors, and create zones
n_locs <- 50
max_duration <- 5
n_total <- n_locs * max_duration
geo <- matrix(rnorm(n_locs * 2), n_locs, 2)
knn_mat <- coords_to_knn(geo, 15)
zones <- knn_zones(knn_mat)
# Simulate data
baselines <- matrix(rexp(n_total, 1/5), max_duration, n_locs)
counts <- matrix(rpois(n_total, as.vector(baselines)), max_duration, n_locs)
# Inject outbreak/event/anomaly
ob_dur <- 3
ob_cols <- zones[[10]]
ob_rows <- max_duration + 1 - seq_len(ob_dur)
counts[ob_rows, ob_cols] <- matrix(
  rpois(ob_dur * length(ob_cols), 2 * baselines[ob_rows, ob_cols]), 
  length(ob_rows), length(ob_cols))
res <- scan_eb_poisson(counts = counts,
                       zones = zones,
                       baselines = baselines,
                       n_mcsim = 99,
                       max_only = FALSE)
Calculate the expecation-based Poisson scan statistic.
Description
Calculate the expectation-based Poisson scan statistic and Monte Carlo replicates.
Usage
scan_eb_poisson_cpp(
  counts,
  baselines,
  zones,
  zone_lengths,
  store_everything,
  num_mcsim
)
Arguments
| counts | An integer matrix (most recent timepoint in first row). | 
| baselines | A matrix with positive entries (most recent timepoint in first row). | 
| zones | An integer vector (all zones concatenated; locations indexed from 0 and up). | 
| zone_lengths | An integer vector. | 
| store_everything | A boolean. | 
| num_mcsim | An integer. | 
Value
A list with elements observed and simulated, each 
being a data frame with columns:
- zone
- The top-scoring zone (spatial component of MLC). 
- duration
- The corresponding duration (time-length of MLC). 
- score
- The value of the loglihood ratio statistic (the scan statistic). 
- relrisk_in
- The estimated relative risk inside. 
- relrisk_in
- The estimated relative risk outside. 
Calculate the expectation-based ZIP scan statistic.
Description
Calculates the expectation-based scan statistic. See details below.
Usage
scan_eb_zip(
  counts,
  zones,
  baselines = NULL,
  probs = NULL,
  population = NULL,
  n_mcsim = 0,
  gumbel = FALSE,
  max_only = FALSE,
  rel_tol = 0.001
)
Arguments
| counts | Either: 
 | 
| zones | A list of integer vectors. Each vector corresponds to a single zone; its elements are the numbers of the locations in that zone. | 
| baselines | Optional. A matrix of the same dimensions as  | 
| probs | Optional. A matrix of the same dimensions as  | 
| population | Optional. A matrix or vector of populations for each 
location. Only needed if  | 
| n_mcsim | A non-negative integer; the number of replicate scan
statistics to generate in order to calculate a  | 
| gumbel | Logical: should a Gumbel P-value be calculated? Default is
 | 
| max_only | Boolean. If  | 
| rel_tol | A positive scalar. If the relative change in the incomplete information likelihood is less than this value, then the EM algorithm is deemed to have converged. | 
Details
For the expectation-based zero-inflated Poisson scan statistic
(Allévius & Höhle 2017), the null hypothesis of no anomaly holds that 
the count observed at each location i and duration t (the 
number of time periods before present) has a zero-inflated Poisson 
distribution with expected value parameter \mu_{it} and structural 
zero probability p_{it}:
     H_0 : Y_{it} \sim \textrm{ZIP}(\mu_{it}, p_{it}).
   
This holds for all locations i = 1, \ldots, m and all durations
t = 1, \ldots,T, with T being the maximum duration considered.
Under the alternative hypothesis, there is a space-time window W
consisting of a spatial zone Z \subset \{1, \ldots, m\} and a time
window D \subseteq \{1, \ldots, T\} such that the counts in that
window have their Poisson expected value parameters inflated by a factor
q_W > 1 compared to the null hypothesis:
   H_1 : Y_{it} \sim \textrm{ZIP}(q_W \mu_{it}, p_{it}), ~~(i,t) \in W.
   
For locations and durations outside of this window, counts are assumed to
be distributed as under the null hypothesis. The sets Z considered
are those specified in the argument zones, while the maximum
duration T is taken as the maximum value in the column
duration of the input table.
For each space-time window W considered, (the log of) a likelihood
ratio is computed using the distributions under the alternative and null
hypotheses, and the expectation-based Poisson scan statistic is calculated
as the maximum of these quantities over all space-time windows. The
expectation-maximization (EM) algorithm is used to obtain maximum
likelihood estimates.
Value
A list which, in addition to the information about the type of scan statistic, has the following components:
- MLC
- A list containing the number of the zone of the most likely cluster (MLC), the locations in that zone, the duration of the MLC, the calculated score, the relative risk, and the number of iterations until convergence for the EM algorithm. In order, the elements of this list are named - zone_number, locations, duration, score, relative_risk, n_iter.
- observed
- A data frame containing, for each combination of zone and duration investigated, the zone number, duration, score, relative risk, number of EM iterations. The table is sorted by score with the top-scoring location on top. If - max_only = TRUE, only contains a single row corresponding to the MLC.
- replicates
- A data frame of the Monte Carlo replicates of the scan statistic (if any), and the corresponding zones and durations. 
- MC_pvalue
- The Monte Carlo - P-value.
- Gumbel_pvalue
- A - P-value obtained by fitting a Gumbel distribution to the replicate scan statistics.
- n_zones
- The number of zones scanned. 
- n_locations
- The number of locations. 
- max_duration
- The maximum duration considered. 
- n_mcsim
- The number of Monte Carlo replicates made. 
References
Allévius, B. and Höhle, M, An expectation-based space-time scan statistic for ZIP-distributed data, (Technical report), Link to PDF.
Examples
if (require("gamlss.dist")) {
  set.seed(1)
  # Create location coordinates, calculate nearest neighbors, and create zones
  n_locs <- 50
  max_duration <- 5
  n_total <- n_locs * max_duration
  geo <- matrix(rnorm(n_locs * 2), n_locs, 2)
  knn_mat <- coords_to_knn(geo, 15)
  zones <- knn_zones(knn_mat)
  
  # Simulate data
  baselines <- matrix(rexp(n_total, 1/5), max_duration, n_locs)
  probs <- matrix(runif(n_total) / 4, max_duration, n_locs)
  counts <- matrix(gamlss.dist::rZIP(n_total, baselines, probs), 
                   max_duration, n_locs)
  
  # Inject outbreak/event/anomaly
  ob_dur <- 3
  ob_cols <- zones[[10]]
  ob_rows <- max_duration + 1 - seq_len(ob_dur)
  counts[ob_rows, ob_cols] <- gamlss.dist::rZIP(
    ob_dur * length(ob_cols), 2 * baselines[ob_rows, ob_cols],
    probs[ob_rows, ob_cols])
  res <- scan_eb_zip(counts = counts,
                     zones = zones,
                     baselines = baselines,
                     probs = probs,
                     n_mcsim = 9,
                     max_only = FALSE,
                     rel_tol = 1e-3)
}
Calculate the highest-value EB ZIP loglihood ratio statistic.
Description
Calculate the expectation-based ZIP loglihood ratio statistic for each zone and duration, but only keep the zone and duration with the highest value (the MLC). The estimate of the relative risk is also calculated, along with the number of iterations the EM algorithm performed.
Usage
scan_eb_zip_cpp(
  counts,
  baselines,
  probs,
  zones,
  zone_lengths,
  rel_tol,
  store_everything,
  num_mcsim
)
Arguments
| counts | matrix (most recent timepoint in first row) | 
| baselines | matrix (most recent timepoint in first row) | 
| probs | matrix (most recent timepoint in first row) | 
| zones | integer vector (all zones concatenated; locations indexed from 0 and up) | 
| zone_lengths | integer vector | 
| rel_tol | double | 
| store_everything | boolean | 
| num_mcsim | int | 
Value
A list with elements observed and simulated, each 
being a data frame with columns:
- zone
- The top-scoring zone (spatial component of MLC). 
- duration
- The corresponding duration (time-length of MLC). 
- score
- The value of the loglihood ratio statistic (the scan statistic). 
- relrisk
- The estimated relative risk. 
- n_iter
- The number of iterations performed by the EM algorithm. 
Calculate the space-time permutation scan statistic.
Description
Calculate the space-time permutation scan statistic (Kulldorff 2005) and Monte Carloo replicates.
Usage
scan_pb_perm_cpp(
  counts,
  baselines,
  zones,
  zone_lengths,
  store_everything,
  num_mcsim
)
Arguments
| counts | An integer matrix (most recent timepoint in first row). | 
| baselines | A matrix with positive entries (most recent timepoint in first row). | 
| zones | An integer vector (all zones concatenated; locations indexed from 0 and up) | 
| zone_lengths | An integer vector. | 
| store_everything | A boolean. | 
| num_mcsim | An integer. | 
Value
A list with elements observed and simulated, each 
being a data frame with columns:
- zone
- The top-scoring zone (spatial component of MLC). 
- duration
- The corresponding duration (time-length of MLC). 
- score
- The value of the loglihood ratio statistic (the scan statistic). 
- relrisk_in
- The estimated relative risk inside. 
- relrisk_in
- The estimated relative risk outside. 
Calculate the population-based Poisson scan statistic.
Description
Calculate the population-based Poisson scan statistic devised by Kulldorff (1997, 2001).
Usage
scan_pb_poisson(
  counts,
  zones,
  population = NULL,
  n_mcsim = 0,
  gumbel = FALSE,
  max_only = FALSE
)
Arguments
| counts | Either: 
 | 
| zones | A list of integer vectors. Each vector corresponds to a single zone; its elements are the numbers of the locations in that zone. | 
| population | Optional. A matrix or vector of populations for each 
location and time point. Only needed if  | 
| n_mcsim | A non-negative integer; the number of replicate scan statistics to generate in order to calculate a P-value. | 
| gumbel | Logical: should a Gumbel P-value be calculated? Default is
 | 
| max_only | Boolean. If  | 
Value
A list which, in addition to the information about the type of scan statistic, has the following components:
- MLC
- A list containing the number of the zone of the most likely cluster (MLC), the locations in that zone, the duration of the MLC, the calculated score, and the relative risk inside and outside the cluster. In order, the elements of this list are named - zone_number, locations, duration, score, relrisk_in, relrisk_out.
- observed
- A data frame containing, for each combination of zone and duration investigated, the zone number, duration, score, relative risks. The table is sorted by score with the top-scoring location on top. If - max_only = TRUE, only contains a single row corresponding to the MLC.
- replicates
- A data frame of the Monte Carlo replicates of the scan statistic (if any), and the corresponding zones and durations. 
- MC_pvalue
- The Monte Carlo - P-value.
- Gumbel_pvalue
- A - P-value obtained by fitting a Gumbel distribution to the replicate scan statistics.
- n_zones
- The number of zones scanned. 
- n_locations
- The number of locations. 
- max_duration
- The maximum duration considered. 
- n_mcsim
- The number of Monte Carlo replicates made. 
References
Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics - Theory and Methods, 26, 1481–1496.
Kulldorff, M. (2001). Prospective time periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society, Series A (Statistics in Society), 164, 61–72.
Examples
set.seed(1)
# Create location coordinates, calculate nearest neighbors, and create zones
n_locs <- 50
max_duration <- 5
n_total <- n_locs * max_duration
geo <- matrix(rnorm(n_locs * 2), n_locs, 2)
knn_mat <- coords_to_knn(geo, 15)
zones <- knn_zones(knn_mat)
# Simulate data
population <- matrix(rnorm(n_total, 100, 10), max_duration, n_locs)
counts <- matrix(rpois(n_total, as.vector(population) / 20), 
                 max_duration, n_locs)
# Inject outbreak/event/anomaly
ob_dur <- 3
ob_cols <- zones[[10]]
ob_rows <- max_duration + 1 - seq_len(ob_dur)
counts[ob_rows, ob_cols] <- matrix(
  rpois(ob_dur * length(ob_cols), 2 * population[ob_rows, ob_cols] / 20), 
  length(ob_rows), length(ob_cols))
res <- scan_pb_poisson(counts = counts,
                       zones = zones,
                       population = population,
                       n_mcsim = 99,
                       max_only = FALSE)
Calculate the population-based Poisson scan statistic.
Description
Calculate the population-based Poisson scan statistic and Monte Carlo replicates.
Usage
scan_pb_poisson_cpp(
  counts,
  baselines,
  zones,
  zone_lengths,
  store_everything,
  num_mcsim
)
Arguments
| counts | integer matrix (most recent timepoint in first row) | 
| baselines | matrix (most recent timepoint in first row) | 
| zones | integer vector (all zones concatenated; locations indexed from 0 and up) | 
| zone_lengths | integer vector | 
| store_everything | boolean | 
| num_mcsim | int | 
Value
A list with elements observed and simulated, each 
being a data frame with columns:
- zone
- The top-scoring zone (spatial component of MLC). 
- duration
- The corresponding duration (time-length of MLC). 
- score
- The value of the loglihood ratio statistic (the scan statistic). 
- relrisk_in
- The estimated relative risk inside. 
- relrisk_in
- The estimated relative risk outside. 
Calculate the space-time permutation scan statistic.
Description
Calculate the space-time permutation scan statistic devised by Kulldorff (2005).
Usage
scan_permutation(
  counts,
  zones,
  population = NULL,
  n_mcsim = 0,
  gumbel = FALSE,
  max_only = FALSE
)
Arguments
| counts | Either: 
 | 
| zones | A list of integer vectors. Each vector corresponds to a single zone; its elements are the numbers of the locations in that zone. | 
| population | Optional. A matrix or vector of populations for each 
location and time point. Only needed if  | 
| n_mcsim | A non-negative integer; the number of replicate scan statistics to generate in order to calculate a P-value. | 
| gumbel | Logical: should a Gumbel P-value be calculated? Default is
 | 
| max_only | Boolean. If  | 
Value
A list which, in addition to the information about the type of scan statistic, has the following components:
- MLC
- A list containing the number of the zone of the most likely cluster (MLC), the locations in that zone, the duration of the MLC, the calculated score, and the relative risk inside and outside the cluster. In order, the elements of this list are named - zone_number, locations, duration, score, relrisk_in, relrisk_out.
- observed
- A data frame containing, for each combination of zone and duration investigated, the zone number, duration, score, relative risks. The table is sorted by score with the top-scoring location on top. If - max_only = TRUE, only contains a single row corresponding to the MLC.
- replicates
- A data frame of the Monte Carlo replicates of the scan statistic (if any), and the corresponding zones and durations. 
- MC_pvalue
- The Monte Carlo - P-value.
- Gumbel_pvalue
- A - P-value obtained by fitting a Gumbel distribution to the replicate scan statistics.
- n_zones
- The number of zones scanned. 
- n_locations
- The number of locations. 
- max_duration
- The maximum duration considered. 
- n_mcsim
- The number of Monte Carlo replicates made. 
References
Kulldorff, M., Heffernan, R., Hartman, J., Assunção, R. M., Mostashari, F. (2005). A space-time permutation scan statistic for disease outbreak detection. PLoS Medicine, 2(3), 0216-0224.
Examples
set.seed(1)
# Create location coordinates, calculate nearest neighbors, and create zones
n_locs <- 50
max_duration <- 5
n_total <- n_locs * max_duration
geo <- matrix(rnorm(n_locs * 2), n_locs, 2)
knn_mat <- coords_to_knn(geo, 15)
zones <- knn_zones(knn_mat)
# Simulate data
population <- matrix(rnorm(n_total, 100, 10), max_duration, n_locs)
counts <- matrix(rpois(n_total, as.vector(population) / 20), 
                 max_duration, n_locs)
# Inject outbreak/event/anomaly
ob_dur <- 3
ob_cols <- zones[[10]]
ob_rows <- max_duration + 1 - seq_len(ob_dur)
counts[ob_rows, ob_cols] <- matrix(
  rpois(ob_dur * length(ob_cols), 2 * population[ob_rows, ob_cols] / 20), 
  length(ob_rows), length(ob_cols))
res <- scan_permutation(counts = counts,
                           zones = zones,
                           population = population,
                           n_mcsim = 99,
                           max_only = FALSE)
Score each location over zones and duration.
Description
For each location, compute the average of the statistic calculated for each space-time window that the location is included in, i.e. average the statistic over both zones and the maximum duration.
Usage
score_locations(x, zones)
Arguments
| x | An object of class  | 
| zones | A list of integer vectors. | 
Value
A data.table with the following columns:
- location
- The locations (as integers). 
- total_score
- For each location, the sum of all window statistics that the location appears in. 
- n_zones
- The number of spatial zones that the location appears in. 
- score
- The total score divided by the number of zones and the maximum duration. 
- relative_score
- The score divided by the maximum score. 
Examples
# Simple example
set.seed(1)
table <- data.frame(zone = 1:5, duration = 1, score = 5:1)
zones <- list(1:2, 1:3, 2:5, 4:5, c(1, 5))
x <- list(observed = table, n_locations = 5, max_duration = 1, n_zones = 5)
score_locations(x, zones)
Get the top (non-overlappig) clusters.
Description
Get the top k space-time clusters according to the statistic calculated
for each cluster (the maximum being the scan statistic). The default is to 
return the spatially non-overlapping clusters, i.e. those that do not have 
any locations in common.
Usage
top_clusters(
  x,
  zones,
  k = 5,
  overlapping = FALSE,
  gumbel = FALSE,
  alpha = NULL,
  ...
)
Arguments
| x | An object of class scanstatistics. | 
| zones | A list of integer vectors. | 
| k | An integer, the number of clusters to return. | 
| overlapping | Logical; should the top clusters be allowed to overlap in
the spatial dimension? The default is  | 
| gumbel | Logical; should a Gumbel P-value be calculated? The default is 
 | 
| alpha | A significance level, which if not  | 
| ... | Parameters passed to  | 
Value
A data frame with at most k rows, with columns 
zone, duration, score and possibly MC_pvalue, Gumbel_pvalue
and critical_value.
Examples
set.seed(1)
counts <- matrix(rpois(15, 3), 3, 5)
zones <- list(1:2, 1:3, 2:5, c(1, 3), 4:5, c(1, 5))
scanres <- scan_permutation(counts, zones, n_mcsim = 5)
top_clusters(scanres, zones, k = 4, overlapping = FALSE)