This vignette serves as a quick guide on how to use the ‘offlineChange’ R package to detect multiple change points from offline data, based on the multi-window approach proposed by Ding et al.
Suppose there is a sequence of one-dimentional dependent data, and the true number of change points is unknown, then the following function can be used to return the ranges that are most likely to contain the true change points. A list of window sizes (window_list) is specified by the user. Each window size should be less than the number of observations. The purpose of these windows, as explained in the reference paper, are used to divide the original data into segments so that the data can be transformed. The largest number of changes (point_max) should be specified by the user as well.
library(offlineChange)
# Data
N <- 1000
N1 <- floor(0.1*N)
N2 <- floor(0.3*N)
a1 <- c(0.8, -0.3); c1 <- 0
a2 <- c(-0.5, 0.1); c2 <- 0
a3 <- c(0.5, -0.5); c3 <- 0
y <- rep(0,N)
L<-2
y[1:L] <- rnorm(L)
for (n in (L+1):N){
if (n <= N1) {
y[n] <- y[(n-1):(n-L)] %*% a1 + c1 + rnorm(1)
} else if (n <= (N1+N2)) {
y[n] <- y[(n-1):(n-L)] %*% a2 + c2 + rnorm(1)
}
else {
y[n] <- y[(n-1):(n-L)] %*% a3 + c3 + rnorm(1)
}
}
result <- MultiWindow(y,
window_list = c(100,50,20,10,5),
point_max = 5)
The returned results contain the number of peak ranges and location of peak ranges:
The users can also modify some other default arguments:
result <- MultiWindow(y,
window_list = c(100, 50, 20, 10, 5),
point_max = 5,
prior_range = NULL,
get_mle = GetMle,
penalty = c("bic","aic"),
seg_min = 1,
num_init = NULL,
tolerance = 1,
cpp = TRUE,
ret_score = FALSE
)
Based on the peak ranges returned, users can use the following function to get exact change points:
result <- MultiWindow(y,
window_list = c(100,50,20,10,5),
point_max = 5)
RangeToPoint(y,
n_peak_range = result$n_peak_range,
peak_range = result$peak_range)
Based on the peak ranges returned, users can also plot them:
result <- MultiWindow(y,
window_list = c(100,50,20,10,5),
point_max = 5)
ChangePointsPlot(y,result,main="plot of change points")
Or, users can plot the score that represents how likely the range contains change points.
Suppose there is a sequence of one-dimentional dependent data, and the user already know prior ranges that change points should be within. Then the following function can be used to return the (narrower) ranges that are most likely to contain the true change points, with much less computational cost compared with the previous method.
And the users can also change some default arguments. In particular, ‘get_mle’ is the data transformation method that can be provided by users. Typically, it is chosen to be the function that input data and output an estimate of unknown parameters assuming that the data is generated from a parametric family.
result <- MultiWindow(y,
window_list = c(100, 50, 20, 10, 5),
prior_range = list(c(30,200), c(220,400)),
get_mle = GetMle,
tolerance = 1)
Based on the peak ranges returned, users can use the following function to get exact change points:
Suppose there is a sequence of independent data with dimension L+1, and the true number of change points is unknown, then the following function can be used to return the change points. Like before, the largest number of changes (point_max) should be specified by the user.
# Data
a <- matrix(rnorm(40,mean=-1,sd=1), nrow=20, ncol=2)
b <- matrix(rnorm(120,mean=0,sd=1), nrow=60, ncol=2)
c <- matrix(rnorm(40,mean=1,sd=1), nrow=20, ncol=2)
x <- rbind(a,b,c)
result <- ChangePoints(x, point_max = 5)
And the users can also modify some default arguments:
Suppose there is a sequence of independent multivariate data, and the true number of change points is already known, then the following function can be used to return the change points. Here, K is the number of change points instead of the number of segments.
And the users can also change the suggested setting of arguments:
Suppose there is a sequence of independent multivariate data, and the user already know prior ranges that change points should be within, then the following function can be used to return change points, and the prior knowledge of ranges of change points can make the cumputation more efficient and the return change points more accurate.
l1 <- c(15,25)
l2 <- c(75,100)
prior_range_x <- list(l1, l2)
result <- PriorRangeOrderKmeans(x, prior_range_x = list(l1,l2))
And the users can also modify some default arguments: