| Title: | Bayesian MI-LASSO for Variable Selection on Multiply-Imputed Datasets | 
| Version: | 1.0.3 | 
| Author: | Jungang Zou [aut, cre], Sijian Wang [aut], Qixuan Chen [aut] | 
| Maintainer: | Jungang Zou <jungang.zou@gmail.com> | 
| Description: | Provides a suite of Bayesian MI-LASSO for variable selection methods for multiply-imputed datasets. The package includes four Bayesian MI-LASSO models using shrinkage (Multi-Laplace, Horseshoe, ARD) and Spike-and-Slab (Spike-and-Laplace) priors, along with tools for model fitting via MCMC, four-step projection predictive variable selection, and hyperparameter calibration. Methods are suitable for both continuous and binary covariates under missing-at-random or missing-completely-at-random assumptions. See Zou, J., Wang, S. and Chen, Q. (2025), Bayesian MI-LASSO for Variable Selection on Multiply-Imputed Data. ArXiv, 2211.00114. <doi:10.48550/arXiv.2211.00114> for more details. We also provide the frequentist's MI-LASSO function. | 
| License: | Apache License (≥ 2) | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| Depends: | R (≥ 3.5.0) | 
| Imports: | MCMCpack, mvnfast, GIGrvg, MASS, Rfast, foreach, doParallel, arm, mice, abind, stringr, stats, posterior | 
| Suggests: | testthat, knitr, rmarkdown | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2025-08-25 12:48:22 UTC; jz3183 | 
| Repository: | CRAN | 
| Date/Publication: | 2025-08-25 13:20:13 UTC | 
ARD MCMC Sampler for Multiply-Imputed Regression
Description
Implements Bayesian variable selection using the Automatic Relevance Determination (ARD) prior across multiply-imputed datasets. The ARD prior imposes feature-specific shrinkage by placing a prior proportional to inverse of precision of each coefficient.
Usage
ARD_mcmc(
  X,
  Y,
  intercept = TRUE,
  nburn = 4000,
  npost = 4000,
  seed = NULL,
  verbose = TRUE,
  printevery = 1000,
  chain_index = 1
)
Arguments
| X | A 3-D array of predictors with dimensions  | 
| Y | A matrix of outcomes with dimensions  | 
| intercept | Logical; include an intercept? Default  | 
| nburn | Integer; number of burn-in MCMC iterations. Default  | 
| npost | Integer; number of post-burn-in samples to retain. Default  | 
| seed | Integer or  | 
| verbose | Logical; print progress messages? Default  | 
| printevery | Integer; print progress every this many iterations. Default  | 
| chain_index | Integer; index of this MCMC chain (for labeling messages). Default  | 
Value
A named list with components:
- post_beta
- Array - npost × D × pof sampled regression coefficients.
- post_alpha
- Matrix - npost × Dof sampled intercepts (if used).
- post_sigma2
- Numeric vector length - npost, sampled residual variances.
- post_psi2
- Matrix - npost × pof sampled precision parameters for each coefficient.
- post_fitted_Y
- Array - npost × D × nof posterior predictive draws (with noise).
- post_pool_beta
- Matrix - (npost * D) × pof pooled coefficient draws.
- post_pool_fitted_Y
- Matrix - (npost * D) × nof pooled predictive draws (with noise).
- hat_matrix_proj
- Matrix - D × n × nof averaged projection hat-matrices. To avoid recalculate for estimating degree of freedom.
Examples
sim <- sim_B(n = 100, p = 20, type = "MAR", SNP = 1.5, corr = 0.5,
low_missing = TRUE, n_imp = 5, seed = 123)
X <- sim$data_MI$X
Y <- sim$data_MI$Y
fit <- ARD_mcmc(X, Y, nburn = 100, npost = 100)
Bayesian MI-LASSO for Multiply-Imputed Regression
Description
Fit a Bayesian multiple-imputation LASSO (BMI-LASSO) model across multiply-imputed datasets, using one of four priors: Multi-Laplace, Horseshoe, ARD, or Spike-Laplace. Automatically standardizes data, runs MCMC in parallel, performs variable selection via four-step projection predictive variable selection, and selects a final submodel by BIC.
Usage
BMI_LASSO(
  X,
  Y,
  model,
  standardize = TRUE,
  SNC = TRUE,
  grid = seq(0, 1, 0.01),
  orthogonal = FALSE,
  nburn = 4000,
  npost = 4000,
  seed = NULL,
  nchains = 1,
  ncores = 1,
  output_verbose = TRUE,
  printevery = 1000,
  ...
)
Arguments
| X | A numeric matrix or array of predictors.  If a matrix  | 
| Y | A numeric vector or matrix of outcomes.  If a vector of length  | 
| model | Character; which prior to use.  One of  | 
| standardize | Logical; whether to normalize each  | 
| SNC | Logical; if  | 
| grid | Numeric vector; grid of scaled neighborhood criterion (or thresholding) to explore.
Default  | 
| orthogonal | Logical; if  | 
| nburn | Integer; number of burn-in MCMC iterations per chain. Default  | 
| npost | Integer; number of post-burn-in samples to retain per chain. Default  | 
| seed | Optional integer; base random seed. Each chain adds its index. | 
| nchains | Integer; number of MCMC chains to run in parallel. Default  | 
| ncores | Integer; number of parallel cores to use. Default  | 
| output_verbose | Logical; print progress messages. Default  | 
| printevery | Integer; print status every so many iterations. Default  | 
| ... | Additional model-specific hyperparameters: 
 | 
Value
A named list with elements:
- posterior
- List of length - nchainsof MCMC outputs (posterior draws).
- select
- List of length - nchainsof logical matrices showing which variables are selected at each grid value.
- best_select
- List of length - nchainsof the single best selection (by BIC) for each chain.
- posterior_best_models
- List of length - nchainsof projected posterior draws for the best submodel.
- bic_models
- List of length - nchainsof BIC values and degrees-of-freedom for each candidate submodel.
- summary_table_full
- A data frame summarizing rank-normalized split-Rhat and other diagnostics for the full model. 
- summary_table_selected
- A data frame summarizing diagnostics for the selected submodel after projection. 
Examples
sim <- sim_A(n = 100, p = 20, type = "MAR", SNP = 1.5, low_missing = TRUE, n_imp = 5, seed = 123)
X <- sim$data_MI$X
Y <- sim$data_MI$Y
fit <- BMI_LASSO(X, Y, model = "Horseshoe",
                 nburn = 100, npost = 100,
                 nchains = 1, ncores = 1)
str(fit$best_select)
Multiple-Imputation LASSO (MI-LASSO)
Description
Fit a LASSO-like penalty across D multiply-imputed datasets by
iteratively reweighted ridge regressions (Equation (4) of the manuscript).
For each tuning parameter in lamvec, it returns the pooled
coefficient estimates, the BIC, and the selected variables.
Usage
MI_LASSO(
  X,
  Y,
  lamvec = (2^(seq(-1, 4, by = 0.05)))^2/2,
  maxiter = 200,
  eps = 1e-20,
  ncores = 1
)
Arguments
| X | A matrix  | 
| Y | A vector length  | 
| lamvec | Numeric vector of penalty parameters  | 
| maxiter | Integer; maximum number of ridge–update iterations per  | 
| eps | Numeric; convergence tolerance on coefficient change. Default  | 
| ncores | Integer; number of cores for parallelizing over  | 
Value
If length(lamvec) > 1, a list with elements:
- best
- List for the - lambdawith minimal BIC containing:- coefficients(- (p+1)×Dintercept + slopes),- bic(BIC scalar),- varsel(logical length-- pvector of selected predictors),- lambda(the chosen penalty).
- lambda_path
- length(lamvec)×2matrix of each- lambdaand its corresponding BIC.
If length(lamvec) == 1, returns a single list (as above) for that
penalty.
Examples
sim <- sim_A(n = 100, p = 20, type = "MAR", SNP = 1.5, low_missing = TRUE, n_imp = 5, seed = 123)
X <- sim$data_MI$X
Y <- sim$data_MI$Y
fit <- MI_LASSO(X, Y, lamvec = c(0.1))
Horseshoe MCMC Sampler for Multiply-Imputed Regression
Description
Implements Bayesian variable selection using the hierarchical Horseshoe prior
across multiply-imputed datasets.  This model applies global–local shrinkage
to regression coefficients via a global scale (tau2), local scales
(lambda2), and auxiliary hyperpriors (kappa, eta).
Usage
horseshoe_mcmc(
  X,
  Y,
  intercept = TRUE,
  nburn = 4000,
  npost = 4000,
  seed = NULL,
  verbose = TRUE,
  printevery = 1000,
  chain_index = 1
)
Arguments
| X | A 3-D array of predictors with dimensions  | 
| Y | A matrix of outcomes with dimensions  | 
| intercept | Logical; include an intercept term? Default  | 
| nburn | Integer; number of burn-in MCMC iterations. Default  | 
| npost | Integer; number of post-burn-in samples to retain. Default  | 
| seed | Integer or  | 
| verbose | Logical; print progress messages? Default  | 
| printevery | Integer; print progress every this many iterations. Default  | 
| chain_index | Integer; index of this MCMC chain (for labeling prints). Default  | 
Value
A named list with components:
- post_beta
- Array - npost × D × pof sampled regression coefficients.
- post_alpha
- Matrix - npost × Dof sampled intercepts (if used).
- post_sigma2
- Numeric vector of length - npost, sampled residual variances.
- post_lambda2
- Matrix - npost × pof local shrinkage parameters- \lambda_j^2.
- post_kappa
- Matrix - npost × pof auxiliary local hyperparameters- \kappa_j.
- post_tau2
- Numeric vector of length - npost, sampled global scale- \tau^2.
- post_eta
- Numeric vector of length - npost, sampled auxiliary global hyperparameter- \eta.
- post_fitted_Y
- Array - npost × D × nof posterior predictive draws (with noise).
- post_pool_beta
- Matrix - (npost * D) × pof pooled coefficient draws.
- post_pool_fitted_Y
- Matrix - (npost * D) × nof pooled predictive draws (with noise).
- hat_matrix_proj
- Matrix - D × n × nof averaged projection hat-matrices. To avoid recalculate for estimating degree of freedom.
Examples
sim <- sim_B(n = 100, p = 20, type = "MAR", SNP = 1.5, corr = 0.5,
low_missing = TRUE, n_imp = 5, seed = 123)
X <- sim$data_MI$X
Y <- sim$data_MI$Y
fit <- horseshoe_mcmc(X, Y, nburn = 100, npost = 100)
Multi-Laplace MCMC Sampler for Multiply-Imputed Regression
Description
Implements Bayesian variable selection under the Multi-Laplace prior on
regression coefficients across multiply-imputed datasets.  The prior shares
local shrinkage parameters (lambda2) across imputations and places
a Gamma(h, v) hyperprior on the global parameter rho.
Usage
multi_laplace_mcmc(
  X,
  Y,
  intercept = TRUE,
  h = 2,
  v = NULL,
  nburn = 4000,
  npost = 4000,
  seed = NULL,
  verbose = TRUE,
  printevery = 1000,
  chain_index = 1
)
Arguments
| X | A 3-D array of predictors with dimensions  | 
| Y | A matrix of outcomes with dimensions  | 
| intercept | Logical; include an intercept? Default  | 
| h | Numeric; shape parameter of the Gamma prior on  | 
| v | Numeric or  | 
| nburn | Integer; number of burn-in iterations. Default  | 
| npost | Integer; number of post-burn-in samples to store. Default  | 
| seed | Integer or  | 
| verbose | Logical; print progress messages? Default  | 
| printevery | Integer; print progress every this many iterations. Default  | 
| chain_index | Integer; index of this MCMC chain (for messages). Default  | 
Value
A named list with elements:
- post_beta
- Array - npost × D × pof sampled regression coefficients.
- post_alpha
- Matrix - npost × Dof sampled intercepts (if used).
- post_sigma2
- Numeric vector of length - npost, sampled residual variances.
- post_lambda2
- Matrix - npost × pof sampled local shrinkage parameters.
- post_rho
- Numeric vector of length - npost, sampled global parameters.
- post_fitted_Y
- Array - npost × D × nof posterior predictive draws (with noise).
- post_pool_beta
- Matrix - (npost * D) × pof pooled coefficient draws.
- post_pool_fitted_Y
- Matrix - (npost * D) × nof pooled predictive draws (with noise).
- hat_matrix_proj
- Matrix - D × n × nof averaged projection hat-matrices. To avoid recalculate for estimating degree of freedom.
- h,- v
- Numeric; the shape and scale hyperparameters used. 
Examples
sim <- sim_B(n = 100, p = 20, type = "MAR", SNP = 1.5, corr = 0.5, low_missing = TRUE,
n_imp = 5, seed = 123)
X <- sim$data_MI$X
Y <- sim$data_MI$Y
fit <- multi_laplace_mcmc(X, Y, intercept = TRUE, nburn = 100, npost = 100)
Projecting Posterior Means of Full-Model Coefficients onto a Reduced Subset Model
Description
Given posterior means of beta1_mat (and optional intercepts
alpha1_vec) from a full model fitted on D imputed
datasets, compute the predictive projection onto the submodel defined by
xs_vec.  Returns the projected coefficients (and intercepts, if requested).
Usage
projection_mean(X_arr, beta1_mat, xs_vec, sigma2, alpha1_vec = NULL)
Arguments
| X_arr | A 3-D array of predictors, of dimension  | 
| beta1_mat | A  | 
| xs_vec | Logical vector of length  | 
| sigma2 | Numeric scalar; the residual variance from the full model (pooled across imputations). | 
| alpha1_vec | Optional numeric vector of length  | 
Value
A list with components:
- beta2_mat
- A - D * pmatrix of projected submodel coefficients.
- alpha2_vec
- (If - alpha1_vecprovided) numeric vector length- Dof projected intercepts.
Examples
# Simulate a single imputation with n=50, p=5:
D <- 3; n <- 50; p <- 5
X_arr <- array(rnorm(D * n * p), c(D, n, p))
beta1_mat <- matrix(rnorm(D * p), nrow = D)
# Suppose full-model sigma2 pooled is 1.2
sigma2 <- 1.2
# Project onto predictors 1 and 4 only:
xs_vec <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
proj <- projection_mean(X_arr, beta1_mat, xs_vec, sigma2)
str(proj)
# With intercept:
alpha1_vec <- rnorm(D)
proj2 <- projection_mean(X_arr, beta1_mat, xs_vec, sigma2, alpha1_vec)
str(proj2)
Projection of Full-Posterior Draws onto a Reduced-Subset Model
Description
Given posterior draws beta1_arr (and optional intercepts alpha1_arr)
from a full model fitted on D imputed datasets, compute
the predictive projection of each draw onto the submodel defined by xs_vec.
Returns the projected coefficients (and intercepts, if requested) plus the projected
residual variance for each posterior draw.
Usage
projection_posterior(X_arr, beta1_arr, sigma1_vec, xs_vec, alpha1_arr = NULL)
Arguments
| X_arr | A 3-D array of predictors, of dimension  | 
| beta1_arr | A  | 
| sigma1_vec | Numeric vector of length  | 
| xs_vec | Logical vector of length  | 
| alpha1_arr | Optional  | 
Value
A list with components:
- beta2_arr
- Array - npost * D * pof projected submodel coefficients.
- alpha2_arr
- (If - alpha1_arrprovided) matrix- npost * Dof projected intercepts.
- sigma2_opt
- Numeric vector length - npostof projected residual variances.
Examples
D <- 3; n <- 50; p <- 5; npost <- 100
X_arr      <- array(rnorm(D*n*p), c(D, n, p))
beta1_arr  <- array(rnorm(npost*D*p), c(npost, D, p))
sigma1_vec <- runif(npost, 0.5, 2)
xs_vec     <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
# Without intercept
proj <- projection_posterior(X_arr, beta1_arr, sigma1_vec, xs_vec)
str(proj)
# With intercept draws
alpha1_arr <- matrix(rnorm(npost*D), nrow = npost, ncol = D)
proj2 <- projection_posterior(X_arr, beta1_arr, sigma1_vec, xs_vec, alpha1_arr)
str(proj2)
Simulate dataset A: Independent continuous covariates with MCAR/MAR missingness
Description
Generates a dataset for Scenario A used in Bayesian MI-LASSO benchmarking. Covariates are iid standard normal, with a fixed true coefficient vector, linear outcome, missingness imposed on specified columns under MCAR or MAR, and multiple imputations via predictive mean matching.
Usage
sim_A(
  n = 100,
  p = 20,
  type = "MAR",
  SNP = 1.5,
  low_missing = TRUE,
  n_imp = 5,
  seed = NULL
)
Arguments
| n | Integer. Number of observations. | 
| p | Integer. Number of covariates (columns). Takes values in {20, 40}. | 
| type | Character. Missingness mechanism: "MCAR" or "MAR". | 
| SNP | Numeric. Signal-to-noise ratio controlling error variance. | 
| low_missing | Logical. If TRUE, use low missingness rates; if FALSE, higher missingness. | 
| n_imp | Integer. Number of multiple imputations to generate. | 
| seed | Integer or NULL. Random seed for reproducibility. | 
Value
A list with components:
- data_O
- A list of complete covariate matrix and outcomes before missingness. 
- data_mis
- A list of covariate matrix and outcomes with missing values. 
- data_MI
- A list of array of imputed covariates (n_imp × n × p) and a matrix of imputed outcomes (n_imp × n). 
- data_CC
- A list of complete-case covariate matrix and outcomes. 
- important
- Logical vector of true nonzero coefficient indices. 
- covmat
- True covariance matrix used for X. 
- beta
- True coefficient vector. 
Examples
sim <- sim_A(n = 100, p = 20, type = "MAR", SNP = 1.5,
             low_missing = TRUE, n_imp = 5, seed = 123)
str(sim)
Simulate dataset B: AR(1)-correlated continuous covariates with MCAR/MAR missingness
Description
Generates a dataset for Scenario B used in Bayesian MI-LASSO benchmarking. Covariates are multivariate normal with AR(1) covariance, with a fixed true coefficient vector, linear outcome, missingness imposed on specified columns under MCAR or MAR, and multiple imputations via predictive mean matching.
Usage
sim_B(
  n = 100,
  p = 20,
  low_missing = TRUE,
  type = "MAR",
  SNP = 1.5,
  corr = 0.5,
  n_imp = 5,
  seed = NULL
)
Arguments
| n | Integer. Number of observations. | 
| p | Integer. Number of covariates (columns). Takes values in {20, 40}. | 
| low_missing | Logical. If TRUE, use low missingness rates; if FALSE, higher missingness. | 
| type | Character. Missingness mechanism: "MCAR" or "MAR". | 
| SNP | Numeric. Signal-to-noise ratio controlling error variance. | 
| corr | Numeric. AR(1) correlation parameter | 
| n_imp | Integer. Number of multiple imputations to generate. | 
| seed | Integer or NULL. Random seed for reproducibility. | 
Value
A list with components:
- data_O
- A list of complete covariate matrix and outcomes before missingness. 
- data_mis
- A list of covariate matrix and outcomes with missing values. 
- data_MI
- A list of array of imputed covariates (n_imp × n × p) and a matrix of imputed outcomes (n_imp × n). 
- data_CC
- A list of complete-case covariate matrix and outcomes. 
- important
- Logical vector of true nonzero coefficient indices. 
- covmat
- True covariance matrix used for X. 
- beta
- True coefficient vector. 
Examples
sim <- sim_B(n = 100, p = 20, type = "MAR", SNP = 1.5, corr = 0.5,
             low_missing = TRUE, n_imp = 5, seed = 123)
str(sim)
Simulate dataset C: AR(1)-latent Gaussian dichotomized to binary covariates with MCAR/MAR missingness
Description
Generates binary covariates by thresholding an AR(1) latent Gaussian, then proceeds as in sim_B.
Usage
sim_C(
  n = 100,
  p = 20,
  low_missing = TRUE,
  type = "MAR",
  SNP = 1.5,
  corr = 0.5,
  n_imp = 5,
  seed = NULL
)
Arguments
| n | Integer. Number of observations. | 
| p | Integer. Number of covariates (columns). Takes values in {20, 40}. | 
| low_missing | Logical. If TRUE, use low missingness rates; if FALSE, higher missingness. | 
| type | Character. Missingness mechanism: "MCAR" or "MAR". | 
| SNP | Numeric. Signal-to-noise ratio controlling error variance. | 
| corr | Numeric. AR(1) correlation parameter | 
| n_imp | Integer. Number of multiple imputations to generate. | 
| seed | Integer or NULL. Random seed for reproducibility. | 
Value
A list with components:
- data_O
- A list of complete covariate matrix and outcomes before missingness. 
- data_mis
- A list of covariate matrix and outcomes with missing values. 
- data_MI
- A list of array of imputed covariates (n_imp × n × p) and a matrix of imputed outcomes (n_imp × n). 
- data_CC
- A list of complete-case covariate matrix and outcomes. 
- important
- Logical vector of true nonzero coefficient indices. 
- covmat
- True covariance matrix used for X. 
- beta
- True coefficient vector. 
Examples
sim <- sim_C(n = 100, p = 20, type = "MAR", SNP = 1.5, corr = 0.5,
             low_missing = TRUE, n_imp = 5, seed = 123)
str(sim)
Spike-and-Laplace MCMC Sampler for Multiply-Imputed Regression
Description
Implements Bayesian variable selection using a spike-and-slab prior with a Laplace (double-exponential) slab
on nonzero coefficients. Latent inclusion indicators gamma follow Bernoulli(theta), and their probabilities
follow independent Beta(a, b) priors.
Usage
spike_laplace_partially_mcmc(
  X,
  Y,
  intercept = TRUE,
  a = 2,
  b = NULL,
  nburn = 4000,
  npost = 4000,
  seed = NULL,
  verbose = TRUE,
  printevery = 1000,
  chain_index = 1
)
Arguments
| X | A 3-D array of predictors with dimensions  | 
| Y | A matrix of outcomes with dimensions  | 
| intercept | Logical; include an intercept term? Default  | 
| a | Numeric; shape parameter of the Gamma prior. Default  | 
| b | Numeric or  | 
| nburn | Integer; number of burn-in MCMC iterations. Default  | 
| npost | Integer; number of post-burn-in samples to retain. Default  | 
| seed | Integer or  | 
| verbose | Logical; print progress messages? Default  | 
| printevery | Integer; print progress every this many iterations. Default  | 
| chain_index | Integer; index of this MCMC chain (for labeling messages). Default  | 
Value
A named list with components:
- post_rho
- Numeric vector length - npost, sampled global scale- \rho.
- post_gamma
- Matrix - npost * pof sampled inclusion indicators.
- post_theta
- Matrix - npost * pof sampled Beta parameters- \theta_j.
- post_alpha
- Matrix - npost * Dof sampled intercepts (if used).
- post_lambda2
- Matrix - npost * pof sampled local scale parameters- \lambda_j^2.
- post_sigma2
- Numeric vector length - npost, sampled residual variances.
- post_beta
- Array - npost * D * pof sampled regression coefficients.
- post_fitted_Y
- Array - npost * D * nof posterior predictive draws (including noise).
- post_pool_beta
- Matrix - (npost * D) * pof pooled coefficient draws.
- post_pool_fitted_Y
- Matrix - (npost * D) * nof pooled predictive draws (with noise).
- hat_matrix_proj
- Matrix - D * n * nof averaged projection hat-matrices. To avoid recalculate for estimating degree of freedom.
- a,- b
- Numeric values of the rho hyperparameters used. 
Examples
sim <- sim_B(n = 100, p = 20, type = "MAR", SNP = 1.5, corr = 0.5,
low_missing = TRUE, n_imp = 5, seed = 123)
X <- sim$data_MI$X
Y <- sim$data_MI$Y
fit <- spike_laplace_partially_mcmc(X, Y, nburn = 10, npost = 10)