Simulation Models

Oluwasegun Ojo

library(fdaoutlier)

The following are the simulation models included in the fdaoutlier package. Some of these models were curated from research work on functional depths and outlier detection for functional data. This document presents the model equations together with their corresponding functions and parameters in fdaoutlier. The parameters of the fdaoutlier functions are set to reasonable default values for ease of use.

Model 1

This is a typical magnitude model in which outliers are shifted from the ‘normal’ non-outlying observations. The main model is of the form:

\[X_i(t) = \mu t + e_i(t),\] and the contamination model is of the form:

\[X_i(t) = \mu t + qk_i + e_i(t)\] where:

This model can be accessed with the simulation_model1() function in fdaoutlier.

library(fdaoutlier)
dtss <- simulation_model1(n = 100, p = 50, outlier_rate = .1,
                          seed = 50, plot = FALSE)

The returned object is a list containing a matrix of the data and a vector of the indices of the true outliers:

dim(dtss$data)
#> [1] 100  50
dtss$true_outliers
#>  [1] 11 14 20 43 53 70 79 81 83 96

The simulated data can be tuned using additional parameters to simulation_model1(). The following parameters modify the data generated by simulation_model1():

Additional plotting parameters allow for modifying the plot title (plot_title), the font size of the title (title_cex), the display of the legend (show_legend), the y-axis label (ylabel) and the x-axis label (xlabel).
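The Model 1 equations can also be sketched directly in base R, which may help when checking what the shift term does. This is a minimal sketch and not the package's implementation: the values of mu and q, the choice of k_i as a random sign, and the exponential covariance used for e_i(t) are all assumptions made for illustration.

```r
# Sketch of Model 1 in base R; all parameter values are illustrative assumptions.
set.seed(50)
n <- 100; p <- 50            # number of curves, number of evaluation points
mu <- 4; q <- 8              # assumed mean slope and shift size
outlier_rate <- 0.1
t_grid <- seq(0, 1, length.out = p)

# Assumed noise: zero-mean Gaussian process with exponential covariance
# gamma(s, t) = exp(-|s - t|)
sigma <- exp(-abs(outer(t_grid, t_grid, "-")))
chol_sigma <- chol(sigma)                       # upper factor R with t(R) %*% R = sigma
e <- matrix(rnorm(n * p), n, p) %*% chol_sigma  # each row is one noise curve

# Main model: X_i(t) = mu * t + e_i(t)
x <- matrix(mu * t_grid, n, p, byrow = TRUE) + e

# Contamination: add q * k_i to a random subset of curves, with k_i in {-1, 1}
n_out <- round(n * outlier_rate)
true_outliers <- sort(sample(n, n_out))
k <- sample(c(-1, 1), n_out, replace = TRUE)
x[true_outliers, ] <- x[true_outliers, ] + q * k  # k recycles down the rows

dim(x)
```

Because the shift q * k_i is constant in t, each contaminated curve is moved up or down over the whole domain, which is what makes these persistent magnitude outliers.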

Model 2

This model generates non-persistent magnitude outliers, i.e., the outliers are magnitude outliers for only a portion of the domain of the functional data. The main model is of the form: \[X_i(t) = \mu t + e_i(t),\] with contamination model of the form: \[X_i(t) = \mu t + qk_iI_{T_i \le t\le T_i+l } + e_i(t)\] where:

A call to simulation_model2() generates data from this model:

dtss <- simulation_model2(n = 100, p = 50, outlier_rate = .1,
                          seed = 50, plot = FALSE)

Additional parameters of simulation_model2() to which arguments can be passed are:

The additional plotting parameters listed for simulation_model1() also apply.
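The indicator in the Model 2 contamination term restricts the shift to a short window of the domain. The base R sketch below builds only that term, q k_i I(T_i <= t <= T_i + l), for a handful of curves. The values of q and l, the uniform draw of the start points T_i, and the random sign for k_i are assumptions chosen for illustration, not values taken from the package.

```r
# Contamination term of Model 2: q * k_i * I(T_i <= t <= T_i + l)
# All values below are illustrative assumptions.
set.seed(1)
p <- 50
t_grid <- seq(0, 1, length.out = p)
q <- 8          # assumed shift size
l <- 0.05       # assumed length of the contaminated window
n_out <- 5      # number of outlying curves in this sketch

k <- sample(c(-1, 1), n_out, replace = TRUE)    # direction of each shift
T_start <- runif(n_out, min = 0, max = 1 - l)   # assumed uniform start points

# Row i is TRUE only where T_i <= t <= T_i + l
in_window <- outer(T_start, t_grid, function(a, t) t >= a & t <= a + l)
shift <- q * k * in_window                      # k recycles down the rows

# Number of grid points affected in each outlying curve
rowSums(in_window)
```

Adding a row of shift to a curve from the main model moves only the grid points inside that curve's window, so the outlier is non-persistent.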

Model 3

This model generates outliers that are magnitude outliers for a part of the domain. The main model is of the form: \[X_i(t) = \mu t + e_i(t),\] with contamination model of the form: \[X_i(t) = \mu t + qk_iI_{T_i \le t } + e_i(t)\] where:

A call to simulation_model3() generates data from this model:

dtss <- simulation_model3(n = 100, p = 50, outlier_rate = .1,
                          seed = 50, plot = FALSE)

Additional parameters of simulation_model3() to which arguments can be passed are:

The additional plotting parameters listed for simulation_model1() also apply.

Model 4

This model generates outliers defined on the reversed interval of the main model. The main model is of the form: \[X_i(t) = \mu t(1 - t)^m + e_i(t),\] with contamination model of the form: \[X_i(t) = \mu(1 - t)t^m + e_i(t)\] where:

A call to simulation_model4() generates data from this model:

dtss <- simulation_model4(n = 100, p = 50, outlier_rate = .1,
                          seed = 50, plot = FALSE)

Additional parameters of simulation_model4() to which arguments can be passed are:

The additional plotting parameters listed for simulation_model1() also apply.
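The phrase "reversed interval" can be checked numerically: substituting 1 - t for t in the main mean function of Model 4 gives the contamination mean function. The sketch below evaluates only the deterministic parts of the two equations. The values of mu and m are assumptions chosen for illustration.

```r
# Mean functions of Model 4; mu and m are illustrative assumptions.
mu <- 30
m <- 3 / 2
t_grid <- seq(0, 1, length.out = 50)

main_mean <- mu * t_grid * (1 - t_grid)^m        # main model
outlier_mean <- mu * (1 - t_grid) * t_grid^m     # contamination model

# Substituting t -> 1 - t in the main mean gives the contamination mean,
# so on a symmetric grid one curve is the other read right to left.
isTRUE(all.equal(rev(main_mean), outlier_mean))

# Setting the derivatives to zero puts the peaks at t = 1 / (1 + m)
# and t = m / (1 + m); the grid maxima fall near those points.
t_grid[which.max(main_mean)]
t_grid[which.max(outlier_mean)]
```

For m > 1 the main curves peak in the first half of the domain and the outliers in the second half, so the two groups have the same range of values and differ in shape.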

Model 5

This model generates shape outliers with a different covariance structure from that of the main model. The main model is of the form: \[X_i(t) = \mu t + e_i(t),\] with contamination model of the form: \[X_i(t) = \mu t + \tilde{e}_i(t),\] where:

A call to simulation_model5() generates data from this model:

dtss <- simulation_model5(n = 100, p = 50, outlier_rate = .1,
                          seed = 50, plot = FALSE)

Additional parameters of simulation_model5() to which arguments can be passed are:

The additional plotting parameters listed for simulation_model1() also apply.
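To illustrate how two curves can share the mean \mu t and still differ in shape, the sketch below draws one curve from each of two Gaussian processes with different covariance functions. The covariance family alpha * exp(-beta * |s - t|^nu) and every parameter value used here are assumptions made for this illustration, and the helper names cov_fun and draw_curve are hypothetical, not part of fdaoutlier.

```r
# Two noise processes with different covariance; all values are assumptions.
set.seed(2)
p <- 50
t_grid <- seq(0, 1, length.out = p)
d <- abs(outer(t_grid, t_grid, "-"))   # pairwise distances |s - t|

# Assumed covariance family: gamma(s, t) = alpha * exp(-beta * |s - t|^nu)
cov_fun <- function(d, alpha, beta, nu) alpha * exp(-beta * d^nu)

sigma_main <- cov_fun(d, alpha = 1, beta = 1, nu = 1)    # for e_i(t)
sigma_out  <- cov_fun(d, alpha = 5, beta = 2, nu = 0.5)  # for the outlier noise

# Draw one zero-mean Gaussian curve via the Cholesky factor of sigma
draw_curve <- function(sigma) as.vector(rnorm(nrow(sigma)) %*% chol(sigma))

mu <- 4
x_main <- mu * t_grid + draw_curve(sigma_main)
x_out  <- mu * t_grid + draw_curve(sigma_out)

# Both curves share the mean mu * t; only the noise process differs.
# Variance of the point-to-point increments as a rough measure of wiggliness:
c(main = var(diff(x_main)), outlier = var(diff(x_out)))
```

With these assumed parameters the second process has a larger pointwise variance and weaker correlation between neighbouring points, so its curves tend to look rougher without being shifted away from the bulk of the data.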

Model 6

This model generates shape outliers that have a different shape for a portion of the domain. The main model is of the form: \[X_i(t) = \mu t + e_i(t),\] with contamination model of the form: \[X_i(t) = \mu t + (-1)^u\cdot q + (-1)^{(1-u)}\left(\frac{1}{\sqrt{r\pi}}\right)\exp{(-z(t-v)^w)} + e_i(t)\] where:

A call to simulation_model6() generates data from this model:

dtss <- simulation_model6(n = 100, p = 50, outlier_rate = .1,
                          seed = 50, plot = FALSE)

Additional parameters of simulation_model6() to which arguments can be passed are:

The additional plotting parameters listed for simulation_model1() also apply.

Model 7

This model generates pure shape outliers that are periodic. The main model is of the form: \[X_i(t) = \mu t + e_i(t),\] with contamination model of the form: \[X_i(t) = \mu t + k\sin(r\pi(t + \theta)) + e_i(t),\] where:

A call to simulation_model7() generates data from this model:

dtss <- simulation_model7(n = 100, p = 50, outlier_rate = .1,
                          seed = 50, plot = FALSE)

Additional parameters of simulation_model7() to which arguments can be passed are:

The additional plotting parameters listed for simulation_model1() also apply.
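The deterministic parts of the two Model 7 equations show why these outliers are pure shape outliers: the sine term oscillates around the linear trend instead of shifting it. The following base R sketch evaluates both mean functions on a grid. The values of mu, k, r and theta are assumptions chosen for illustration.

```r
# Deterministic parts of Model 7; mu, k, r and theta are illustrative assumptions.
mu <- 4
k <- 2
r <- 10
theta <- 0.25    # fixed phase offset for this sketch
t_grid <- seq(0, 1, length.out = 50)

main_mean <- mu * t_grid
outlier_mean <- mu * t_grid + k * sin(r * pi * (t_grid + theta))

# The outlier deviates from the main trend by at most k at any point
max(abs(outlier_mean - main_mean))

# The deviation changes sign across the domain, so the outlier weaves
# around the trend instead of sitting above or below it
table(sign(outlier_mean - main_mean))
```

When k is small relative to the spread of e_i(t), such a curve stays inside the range of the non-outlying curves at every t and differs only in its oscillating pattern.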

Model 8

This model generates pure shape outliers that are periodic. The main model is of the form: \[X_i(t) = k\sin(r\pi t) + e_i(t),\] with contamination model of the form: \[X_i(t) = k\sin(r\pi t + v) + e_i(t),\] where:

A call to simulation_model8() generates data from this model:

dtss <- simulation_model8(n = 100, p = 50, outlier_rate = .1,
                          seed = 50, plot = FALSE)

Additional parameters of simulation_model8() to which arguments can be passed are:

The additional plotting parameters listed for simulation_model1() also apply.

Model 9

This model generates periodic functions with outliers of different amplitude. The main model is of the form: \[X_i(t) = a_{1i}\sin \pi + a_{2i}\cos\pi + e_i(t),\] with contamination model of the form: \[X_i(t) = (b_{1i}\sin\pi + b_{2i}\cos\pi)(1-u_i) + (c_{1i}\sin\pi + c_{2i}\cos\pi)u_i + e_i(t),\] where:

A call to simulation_model9() generates data from this model:

dtss <- simulation_model9(n = 100, p = 50, outlier_rate = .1,
                          seed = 50, plot = FALSE)

Additional parameters of simulation_model9() to which arguments can be passed are:

The additional plotting parameters listed for simulation_model1() also apply.