Type: | Package |
Title: | LIC for Distributed Elliptical Model |
Version: | 0.1.0 |
Date: | 2025-08-16 |
Description: | A toolkit for the distributed elliptical model, named "ELIC" (the LIC for Distributed Elliptical Model Analysis). It assumes that the error term follows an elliptical distribution. The philosophy of the package is described in Guo G. et al. (2022) <doi:10.1080/02664763.2022.2053949>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Author: | Guangbao Guo [aut, cre], Xiyu Zhao [aut] |
Maintainer: | Guangbao Guo <ggb11111111@163.com> |
Repository: | CRAN |
Config/testthat/edition: | 3 |
Imports: | distr, distrEllipse, MASS |
Suggests: | testthat (≥ 3.0.0), sn |
Depends: | R (≥ 4.4.0) |
Packaged: | 2025-08-27 01:37:26 UTC; 13269 |
Date/Publication: | 2025-09-04 14:20:33 UTC |
A General Length and Information Criterion (LIC) Function
Description
This function applies the LIC method to find an optimal data subset. It supports several error-term distributions, such as the t-distribution and skewed distributions.
Usage
ELIC(X, Y, alpha = 0.05, K = 10, nk = NULL, dist_type = "student_t")
Arguments
X |
A numeric design matrix. |
Y |
A numeric response vector. |
alpha |
The significance level for criterion calculation, default is 0.05. |
K |
The number of subsets to sample, default is 10. |
nk |
The sample size of each subset. If NULL (default), it's calculated as n/K. |
dist_type |
A character string specifying the assumed error distribution. Accepts T-distribution types (e.g., "student_t") from the original TLIC, and skewed types ("skew_normal", "skew_t", "skew_laplace") from SLIC. Note: In this implementation, the core calculation is robust and does not change based on dist_type. The parameter is kept for consistency with the original functions. |
Details
The function iteratively samples subsets from the data, calculates a length criterion (L1) and an information criterion (N), and finds an optimal subset based on the intersection of the best samples from both criteria. It is a general implementation combining the logic of TLIC and SLIC.
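The sampling-and-intersection procedure described above can be sketched in base R. This is only an illustration under stated assumptions: the function name `lic_sketch`, the t-based interval length used as the length criterion, and `det(t(X) %*% X)` used as the information criterion are choices made for this example and are not taken from the ELIC source.

```r
# Hypothetical sketch of the subset-selection idea; not the ELIC implementation.
lic_sketch <- function(X, Y, alpha = 0.05, K = 10, nk = NULL) {
  n <- nrow(X)
  p <- ncol(X)
  if (is.null(nk)) nk <- floor(n / K)   # default subset size n/K
  len  <- numeric(K)                    # length criterion per subset (assumed form)
  info <- numeric(K)                    # information criterion per subset (assumed form)
  idx  <- matrix(0L, nrow = nk, ncol = K)
  for (i in seq_len(K)) {
    idx[, i] <- sample.int(n, nk)       # sample one subset without replacement
    Xi <- X[idx[, i], , drop = FALSE]
    Yi <- Y[idx[, i]]
    fit <- stats::lm.fit(Xi, Yi)
    s <- sqrt(sum(fit$residuals^2) / (nk - p))           # residual standard deviation
    len[i]  <- 2 * stats::qt(1 - alpha / 2, nk - p) * s  # assumed interval length
    info[i] <- det(crossprod(Xi))                        # assumed information measure
  }
  # Keep the observations shared by the shortest-interval and most-informative subsets
  opt <- intersect(idx[, which.min(len)], idx[, which.max(info)])
  list(opt = opt, len = len, info = info)
}

set.seed(1)
n <- 200; p <- 5
X <- matrix(stats::runif(n * p), ncol = p)
Y <- X %*% sort(stats::runif(p, 1, 5)) + stats::rt(n, df = 5)
res <- lic_sketch(X, Y, K = 5, nk = 100)
length(res$opt)   # number of observations in the selected subset
```

When `nk` is small relative to `n`, the intersection in this sketch can be very small or empty, so a practical implementation would need to handle that case.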
Value
A list containing the optimal model components:
MUopt |
The predicted values for the optimal subset. |
Bopt |
The estimated coefficients for the optimal model. |
MAEMUopt |
The Mean Absolute Error of the optimal model. |
MSEMUopt |
The Mean Squared Error of the optimal model. |
opt |
The indices of the optimal data subset. |
Yopt |
The response values of the optimal subset. |
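For orientation, the two error summaries can be computed as follows, assuming the standard definitions of mean absolute error and mean squared error between a reference vector and its prediction. The vectors `mu` and `mu_hat` are made-up values, and the exact quantities ELIC compares internally are not stated in this documentation.

```r
# Illustrative values only; not output from ELIC.
mu     <- c(1, 2, 3)      # hypothetical reference values
mu_hat <- c(1.5, 2, 2)    # hypothetical predictions
mae <- mean(abs(mu - mu_hat))    # mean absolute error
mse <- mean((mu - mu_hat)^2)     # mean squared error
c(MAE = mae, MSE = mse)
```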
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
Guo, G., Sun, Y., Qian, G., & Wang, Q. (2022). LIC criterion for optimal subset selection in distributed interval estimation. Journal of Applied Statistics, 50(9), 1900-1920. doi:10.1080/02664763.2022.2053949.
Chang, D., Guo, G. (2024). LIC: An R package for optimal subset selection for distributed data. SoftwareX, 28, 101909.
Jing, G., & Guo, G. (2025). TLIC: An R package for the LIC for T distribution regression analysis. SoftwareX, 30, 102132.
Chang, D., & Guo, G. Research on Distributed Redundant Data Estimation Based on LIC. IAENG International Journal of Applied Mathematics, 55(1), 1-6 (2025).
Gao, H., & Guo, G. LIC for Distributed Skewed Regression. IAENG International Journal of Applied Mathematics, 55(9), 2925-2930 (2025).
Zhang, C., & Guo, G. (2025). The optimal subset estimation of distributed redundant data. IAENG International Journal of Applied Mathematics, 55(2), 270-277.
Jing, G., & Guo, G. (2025). Student LIC for distributed estimation. IAENG International Journal of Applied Mathematics, 55(3), 575-581.
Liu, Q., & Guo, G. (2025). Distributed estimation of redundant data. IAENG International Journal of Applied Mathematics, 55(2), 332-337.
Examples
# Example with T-distributed error data (like TLIC)
set.seed(12)
n <- 200
p <- 5
X_t <- matrix(stats::runif(n * p), ncol = p)
beta_t <- sort(stats::runif(p, 1, 5))
e_t <- stats::rt(n, df = 5)
Y_t <- X_t %*% beta_t + e_t
result_t <- ELIC(X_t, Y_t, dist_type = "student_t")
str(result_t)
# Example with Skew-Normal error data (like SLIC)
if (requireNamespace("sn", quietly = TRUE)) {
  set.seed(123)
  n <- 200
  p <- 5
  X_s <- matrix(stats::rnorm(n * p), ncol = p)
  beta_s <- stats::runif(p, 1, 2)
  e_s <- sn::rsn(n = n, xi = 0, omega = 1, alpha = 5)
  Y_s <- X_s %*% beta_s + e_s
  result_s <- ELIC(X_s, Y_s, K = 5, dist_type = "skew_normal")
  str(result_s)
}
Calculate the LIC estimator based on the A-optimal and D-optimal criteria
Description
Calculate the LIC estimator based on the A-optimal and D-optimal criteria.
Usage
LICnew(X, Y, alpha, K, nk)
Arguments
X |
A matrix of observations (design matrix) with size n x p |
Y |
A vector of responses with length n |
alpha |
The significance level for confidence intervals |
K |
The number of subsets to consider |
nk |
The size of each subset |
Value
A list containing:
E5 |
The LIC estimator based on the A-optimal and D-optimal criteria. |
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
Guo, G., Sun, Y., Qian, G., & Wang, Q. (2022). LIC criterion for optimal subset selection in distributed interval estimation. Journal of Applied Statistics, 50(9), 1900-1920. doi:10.1080/02664763.2022.2053949.
Chang, D., Guo, G. (2024). LIC: An R package for optimal subset selection for distributed data. SoftwareX, 28, 101909.
Jing, G., & Guo, G. (2025). TLIC: An R package for the LIC for T distribution regression analysis. SoftwareX, 30, 102132.
Chang, D., & Guo, G. Research on Distributed Redundant Data Estimation Based on LIC. IAENG International Journal of Applied Mathematics, 55(1), 1-6 (2025).
Gao, H., & Guo, G. LIC for Distributed Skewed Regression. IAENG International Journal of Applied Mathematics, 55(9), 2925-2930 (2025).
Zhang, C., & Guo, G. (2025). The optimal subset estimation of distributed redundant data. IAENG International Journal of Applied Mathematics, 55(2), 270-277.
Jing, G., & Guo, G. (2025). Student LIC for distributed estimation. IAENG International Journal of Applied Mathematics, 55(3), 575-581.
Liu, Q., & Guo, G. (2025). Distributed estimation of redundant data. IAENG International Journal of Applied Mathematics, 55(2), 332-337.
Examples
p <- 6; n <- 1000; K <- 2; nk <- 200; alpha <- 0.05; sigma <- 1
e <- rnorm(n, 0, sigma)
beta <- sort(runif(p, 0, 1))
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
Y <- X %*% beta + e
LICnew(X = X, Y = Y, alpha = alpha, K = K, nk = nk)
Calculate the estimators of beta under the A-opt and D-opt criteria
Description
Calculate the estimators of beta under the A-opt and D-opt criteria.
Usage
beta_AD(K = K, nk = nk, alpha = alpha, X = X, y = y)
Arguments
K |
is the number of subsets |
nk |
is the length of subsets |
alpha |
is the significance level |
X |
is the observation matrix |
y |
is the response vector |
Value
A list containing:
betaA |
The estimator of beta on the A-opt. |
betaD |
The estimator of beta on the D-opt. |
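As a rough illustration of what estimating beta on A-optimal and D-optimal subsets can look like, the sketch below draws K random subsets, scores each with the textbook A-criterion (trace of the inverse information matrix, smaller is better) and D-criterion (determinant of the information matrix, larger is better), and fits least squares on the winners. The function name `ad_sketch` and the random-subset scheme are assumptions; the procedure inside `beta_AD` may differ, and `alpha` is not used here.

```r
# Hypothetical sketch; not the beta_AD implementation.
ad_sketch <- function(K, nk, X, y) {
  n <- nrow(X)
  a_score <- numeric(K)
  d_score <- numeric(K)
  idx <- matrix(0L, nrow = nk, ncol = K)
  for (i in seq_len(K)) {
    idx[, i] <- sample.int(n, nk)                # random subset of size nk
    M <- crossprod(X[idx[, i], , drop = FALSE])  # information matrix t(Xi) %*% Xi
    a_score[i] <- sum(diag(solve(M)))            # A-criterion: trace of the inverse
    d_score[i] <- det(M)                         # D-criterion: determinant
  }
  # Ordinary least squares on the chosen rows
  ols <- function(rows) {
    Xi <- X[rows, , drop = FALSE]
    solve(crossprod(Xi), crossprod(Xi, y[rows]))
  }
  list(betaA = ols(idx[, which.min(a_score)]),
       betaD = ols(idx[, which.max(d_score)]))
}

set.seed(1)
p <- 6; n <- 1000
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
beta <- sort(runif(p, 0, 1))
y <- X %*% beta + rnorm(n)
est <- ad_sketch(K = 2, nk = 200, X = X, y = y)
cbind(truth = beta, A = drop(est$betaA), D = drop(est$betaD))
```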
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
Guo, G., Sun, Y., Qian, G., & Wang, Q. (2022). LIC criterion for optimal subset selection in distributed interval estimation. Journal of Applied Statistics, 50(9), 1900-1920. doi:10.1080/02664763.2022.2053949.
Chang, D., Guo, G. (2024). LIC: An R package for optimal subset selection for distributed data. SoftwareX, 28, 101909.
Jing, G., & Guo, G. (2025). TLIC: An R package for the LIC for T distribution regression analysis. SoftwareX, 30, 102132.
Chang, D., & Guo, G. Research on Distributed Redundant Data Estimation Based on LIC. IAENG International Journal of Applied Mathematics, 55(1), 1-6 (2025).
Gao, H., & Guo, G. LIC for Distributed Skewed Regression. IAENG International Journal of Applied Mathematics, 55(9), 2925-2930 (2025).
Zhang, C., & Guo, G. (2025). The optimal subset estimation of distributed redundant data. IAENG International Journal of Applied Mathematics, 55(2), 270-277.
Jing, G., & Guo, G. (2025). Student LIC for distributed estimation. IAENG International Journal of Applied Mathematics, 55(3), 575-581.
Liu, Q., & Guo, G. (2025). Distributed estimation of redundant data. IAENG International Journal of Applied Mathematics, 55(2), 332-337.
Examples
p <- 6; n <- 1000; K <- 2; nk <- 200; alpha <- 0.05; sigma <- 1
e <- rnorm(n, 0, sigma)
beta <- sort(runif(p, 0, 1))
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
y <- X %*% beta + e
beta_AD(K = K, nk = nk, alpha = alpha, X = X, y = y)
Calculate the estimator of beta under the COR criterion
Description
Calculate the estimator of beta under the COR criterion.
Usage
beta_cor(K = K, nk = nk, alpha = alpha, X = X, y = y)
Arguments
K |
is the number of subsets |
nk |
is the length of subsets |
alpha |
is the significance level |
X |
is the observation matrix |
y |
is the response vector |
Value
A list containing:
betaC |
The estimator of beta on the COR. |
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
Guo, G., Sun, Y., Qian, G., & Wang, Q. (2022). LIC criterion for optimal subset selection in distributed interval estimation. Journal of Applied Statistics, 50(9), 1900-1920. doi:10.1080/02664763.2022.2053949.
Chang, D., Guo, G. (2024). LIC: An R package for optimal subset selection for distributed data. SoftwareX, 28, 101909.
Jing, G., & Guo, G. (2025). TLIC: An R package for the LIC for T distribution regression analysis. SoftwareX, 30, 102132.
Chang, D., & Guo, G. Research on Distributed Redundant Data Estimation Based on LIC. IAENG International Journal of Applied Mathematics, 55(1), 1-6 (2025).
Gao, H., & Guo, G. LIC for Distributed Skewed Regression. IAENG International Journal of Applied Mathematics, 55(9), 2925-2930 (2025).
Zhang, C., & Guo, G. (2025). The optimal subset estimation of distributed redundant data. IAENG International Journal of Applied Mathematics, 55(2), 270-277.
Jing, G., & Guo, G. (2025). Student LIC for distributed estimation. IAENG International Journal of Applied Mathematics, 55(3), 575-581.
Liu, Q., & Guo, G. (2025). Distributed estimation of redundant data. IAENG International Journal of Applied Mathematics, 55(2), 332-337.
Examples
p <- 6; n <- 1000; K <- 2; nk <- 200; alpha <- 0.05; sigma <- 1
e <- rnorm(n, 0, sigma)
beta <- sort(runif(p, 0, 1))
X <- matrix(rnorm(n * p, 5, 10), ncol = p)
y <- X %*% beta + e
beta_cor(K = K, nk = nk, alpha = alpha, X = X, y = y)
Generate Data with Elliptically Distributed Covariates
Description
This function generates a dataset for a linear model where the covariate matrix X follows an elliptical distribution.
Usage
eerr(n, p, dist_type)
Arguments
n |
The number of observations (rows) to generate. |
p |
The number of predictors/dimensions (columns) for the covariate matrix X. |
dist_type |
A character string specifying the type of elliptical distribution for X. Must be one of "Elliptical-Normal", "Elliptical-t", or "Elliptical-cov". |
Details
The function generates a response vector Y based on the linear model Y = Xβ + e. The covariate matrix X is generated from one of three types of elliptical distributions:
1. 'Elliptical-Normal': based on a multivariate normal distribution structure.
2. 'Elliptical-t': based on a multivariate t-distribution structure.
3. 'Elliptical-cov': based on a custom covariance matrix adjusted via its eigenvalues.
The error term 'e' is drawn from a standard normal distribution.
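The data-generating model in this section can be sketched in base R. The sketch covers only a normal-type and a t-type case, using two standard constructions: a multivariate normal draw via a Cholesky factor, and a multivariate t draw obtained by dividing each normal row by sqrt(chi-square / df). The function name `sim_elliptical`, the AR(1)-style scatter matrix, `df = 5`, and the choice of `beta` are assumptions for illustration and are not taken from `eerr`.

```r
# Hypothetical sketch; not the eerr implementation.
sim_elliptical <- function(n, p, type = c("normal", "t"), df = 5) {
  type <- match.arg(type)
  Sigma <- 0.5^abs(outer(seq_len(p), seq_len(p), "-"))  # assumed scatter matrix
  Z <- matrix(stats::rnorm(n * p), nrow = n)
  X <- Z %*% chol(Sigma)                  # rows ~ N(0, Sigma)
  if (type == "t") {
    w <- stats::rchisq(n, df) / df        # one mixing variable per row
    X <- X / sqrt(w)                      # rows ~ multivariate t with df degrees of freedom
  }
  beta <- rep(1, p)                       # assumed coefficient vector
  e <- stats::rnorm(n)                    # standard normal errors
  Y <- drop(X %*% beta + e)               # linear model Y = X beta + e
  list(X = X, Y = Y, e = e)
}

set.seed(1)
d <- sim_elliptical(n = 100, p = 3, type = "t")
str(d)
```

Dividing the matrix by a length-n vector recycles down the columns, so every entry in row i is scaled by the same mixing variable, which is what keeps the rows elliptically distributed.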
Value
A list containing the following components:
X |
An n x p matrix of covariates from the specified elliptical distribution. |
Y |
A numeric vector of n responses. |
e |
A numeric vector of n error terms from a standard normal distribution. |
References
Guo, G., Song, H. & Zhu, L. The COR criterion for optimal subset selection in distributed estimation. Statistics and Computing, 34, 163 (2024). doi:10.1007/s11222-024-10471-z
Guo, G., Sun, Y., Qian, G., & Wang, Q. (2022). LIC criterion for optimal subset selection in distributed interval estimation. Journal of Applied Statistics, 50(9), 1900-1920. doi:10.1080/02664763.2022.2053949.
Chang, D., Guo, G. (2024). LIC: An R package for optimal subset selection for distributed data. SoftwareX, 28, 101909.
Jing, G., & Guo, G. (2025). TLIC: An R package for the LIC for T distribution regression analysis. SoftwareX, 30, 102132.
Chang, D., & Guo, G. Research on Distributed Redundant Data Estimation Based on LIC. IAENG International Journal of Applied Mathematics, 55(1), 1-6 (2025).
Gao, H., & Guo, G. LIC for Distributed Skewed Regression. IAENG International Journal of Applied Mathematics, 55(9), 2925-2930 (2025).
Zhang, C., & Guo, G. (2025). The optimal subset estimation of distributed redundant data. IAENG International Journal of Applied Mathematics, 55(2), 270-277.
Jing, G., & Guo, G. (2025). Student LIC for distributed estimation. IAENG International Journal of Applied Mathematics, 55(3), 575-581.
Liu, Q., & Guo, G. (2025). Distributed estimation of redundant data. IAENG International Journal of Applied Mathematics, 55(2), 332-337.
Examples
# Generate 100 observations with 5 predictors from an Elliptical-Normal distribution
data_normal <- eerr(n = 100, p = 5, dist_type = "Elliptical-Normal")
str(data_normal)
# Generate 100 observations with 3 predictors from an Elliptical-cov distribution
data_cov <- eerr(n = 100, p = 3, dist_type = "Elliptical-cov")
pairs(data_cov$X) # Visualize the relationships between covariates