| Type: | Package | 
| Title: | Scalable Spike-and-Slab | 
| Version: | 1.0 | 
| Date: | 2022-05-13 | 
| Description: | A scalable Gibbs sampling implementation for high dimensional Bayesian regression with the continuous spike-and-slab prior. Niloy Biswas, Lester Mackey and Xiao-Li Meng, "Scalable Spike-and-Slab" (2022) <doi:10.48550/arXiv.2204.01668>. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| Imports: | Rcpp, stats, TruncatedNormal | 
| LinkingTo: | Rcpp, RcppEigen | 
| RoxygenNote: | 7.1.2 | 
| NeedsCompilation: | yes | 
| Packaged: | 2022-05-17 20:40:23 UTC; niloybiswas | 
| Author: | Niloy Biswas | 
| Maintainer: | Niloy Biswas <niloy_biswas@g.harvard.edu> | 
| Depends: | R (≥ 3.5.0) | 
| Repository: | CRAN | 
| Date/Publication: | 2022-05-18 17:00:07 UTC | 
Riboflavin GWAS dataset
Description
Dataset of riboflavin production by Bacillus subtilis containing n = 71 observations of a one-dimensional response (riboflavin production) and p = 4088 predictors (gene expressions). The one-dimensional response corresponds to riboflavin production.
Usage
data(riboflavin)
Format
A data frame containing a vector y of length 71 (responses) and a matrix X of dimension 71 by 4088 (gene expressions)
Details
The processed dataset is the same as in the R packages qut and hdi.
References
Buhlmann, P., Kalisch, M. and Meier, L. (2014) High-dimensional statistics with a view towards applications in biology. Annual Review of Statistics and its Applications 1, 255–278
Examples
data(riboflavin)
y <- as.vector(riboflavin$y)
X <- as.matrix(riboflavin$x)
spike_slab_linear
Description
Generates Markov chain targeting the posterior corresponding to Bayesian linear regression with spike and slab priors
Usage
spike_slab_linear(
  chain_length,
  X,
  y,
  tau0,
  tau1,
  q,
  a0 = 1,
  b0 = 1,
  rinit = NULL,
  verbose = FALSE,
  burnin = 0,
  store = TRUE,
  Xt = NULL,
  XXt = NULL,
  tau0_inverse = NULL,
  tau1_inverse = NULL
)
Arguments
| chain_length | Markov chain length | 
| X | matrix of length n by p | 
| y | Response | 
| tau0 | prior hyperparameter (non-negative real) | 
| tau1 | prior hyperparameter (non-negative real) | 
| q | prior hyperparameter (strictly between 0 and 1) | 
| a0 | prior hyperparameter (non-negative real) | 
| b0 | prior hyperparameter (non-negative real) | 
| rinit | initial distribution of Markov chain (default samples from the prior) | 
| verbose | print iteration of the Markov chain (boolean) | 
| burnin | chain burnin (non-negative integer) | 
| store | store chain trajectory (boolean) | 
| Xt | Pre-calculated transpose of X | 
| XXt | Pre-calculated matrix X*transpose(X) (n by n matrix) | 
| tau0_inverse | Pre-calculated matrix inverse(I + tau0^2*XXt) (n by n matrix) | 
| tau1_inverse | Pre-calculated matrix inverse(I + tau1^2*XXt) (n by n matrix) | 
Value
Output from Markov chain targeting the posterior corresponding to Bayesian linear regression with spike and slab priors
Examples
# Synthetic dataset
syn_data <- synthetic_data(n=100,p=200,s0=5,error_std=2,type='linear')
X <- syn_data$X
y <- syn_data$y
# Hyperparamters
params <- spike_slab_params(n=nrow(X),p=ncol(X))
# Run S^3
sss_chain <- spike_slab_linear(chain_length=4e3,burnin=1e3,X=X,y=y,
tau0=params$tau0,tau1=params$tau1,q=params$q,a0=params$a0,b0=params$b0,
verbose=FALSE,store=FALSE)
# Use posterior probabilities for variable selection
sss_chain$z_ergodic_avg[1:10]
spike_slab_logistic
Description
Generates Markov chain targeting the posterior corresponding to Bayesian logistic regression with spike and slab priors
Usage
spike_slab_logistic(
  chain_length,
  X,
  y,
  tau0,
  tau1,
  q,
  rinit = NULL,
  verbose = FALSE,
  burnin = 0,
  store = TRUE,
  Xt = NULL,
  XXt = NULL
)
Arguments
| chain_length | Markov chain length | 
| X | matrix of length n by p | 
| y | Response | 
| tau0 | prior hyperparameter (non-negative real) | 
| tau1 | prior hyperparameter (non-negative real) | 
| q | prior hyperparameter (strictly between 0 and 1) | 
| rinit | initial distribution of Markov chain (default samples from the prior) | 
| verbose | print iteration of the Markov chain (boolean) | 
| burnin | chain burnin (non-negative integer) | 
| store | store chain trajectory (boolean) | 
| Xt | Pre-calculated transpose of X | 
| XXt | Pre-calculated matrix X*transpose(X) (n by n matrix) | 
Value
Output from Markov chain targeting the posterior corresponding to Bayesian logistic regression with spike and slab priors
Examples
# Synthetic dataset
syn_data <- synthetic_data(n=100,p=200,s0=5,error_std=2,type='logistic')
X <- syn_data$X
y <- syn_data$y
# Hyperparamters
params <- spike_slab_params(n=nrow(X),p=ncol(X))
# Run S^3
sss_chain <- spike_slab_logistic(chain_length=4e3,burnin=1e3,X=X,y=y,
tau0=params$tau0,tau1=params$tau1,q=params$q,verbose=FALSE,store=FALSE)
# Use posterior probabilities for variable selection
sss_chain$z_ergodic_avg[1:10]
spike_slab_params
Description
Generates hyperparameters for spike-and-slab
Usage
spike_slab_params(n, p)
Arguments
| n | number of observations | 
| p | number of covariates | 
Value
spike-and-slab hyperparameters q, tau0, tau1, a0, b0
Examples
hyper_params <- spike_slab_params(n=100,p=200)
print(hyper_params)
spike_slab_probit
Description
Generates Markov chain targeting the posterior corresponding to Bayesian probit regression with spike and slab priors
Usage
spike_slab_probit(
  chain_length,
  X,
  y,
  tau0,
  tau1,
  q,
  rinit = NULL,
  verbose = FALSE,
  burnin = 0,
  store = TRUE,
  Xt = NULL,
  XXt = NULL,
  tau0_inverse = NULL,
  tau1_inverse = NULL
)
Arguments
| chain_length | Markov chain length | 
| X | matrix of length n by p | 
| y | Response | 
| tau0 | prior hyperparameter (non-negative real) | 
| tau1 | prior hyperparameter (non-negative real) | 
| q | prior hyperparameter (strictly between 0 and 1) | 
| rinit | initial distribution of Markov chain (default samples from the prior) | 
| verbose | print iteration of the Markov chain (boolean) | 
| burnin | chain burnin (non-negative integer) | 
| store | store chain trajectory (boolean) | 
| Xt | Pre-calculated transpose of X | 
| XXt | Pre-calculated matrix X*transpose(X) (n by n matrix) | 
| tau0_inverse | Pre-calculated matrix inverse(I + tau0^2*XXt) (n by n matrix) | 
| tau1_inverse | Pre-calculated matrix inverse(I + tau1^2*XXt) (n by n matrix) | 
Value
Output from Markov chain targeting the posterior corresponding to Bayesian logistic regression with spike and slab priors
Examples
# Synthetic dataset
syn_data <- synthetic_data(n=100,p=200,s0=5,error_std=2,type='probit')
X <- syn_data$X
Xt <- t(X)
y <- syn_data$y
# Hyperparamters
params <- spike_slab_params(n=nrow(X),p=ncol(X))
# Run S^3
sss_chain <- spike_slab_probit(chain_length=4e3,burnin=1e3,X=X,y=y,
tau0=params$tau0,tau1=params$tau1,q=params$q,verbose=FALSE,store=FALSE)
# Use posterior probabilities for variable selection
sss_chain$z_ergodic_avg[1:10]
synthetic_data
Description
Generates synthetic linear and logistic regression data
Usage
synthetic_data(
  n,
  p,
  s0,
  error_std,
  type = "linear",
  scale = TRUE,
  signal = "constant"
)
Arguments
| n | number of observations | 
| p | number of covariates | 
| s0 | sparsity (number of non-zero components of the true signal) | 
| error_std | Standard deviation of the Gaussian noise (linear regression only) | 
| type | dataset type ('linear' or 'logistic') | 
| scale | design matrix X has columns mean zero and standard deviation 1 (TRUE or FALSE) | 
| signal | non-zero components of the true signal ('constant' or 'deacy') | 
Value
Design matrix, response and true signal vector for linear and logistic regression
Examples
syn_data <- synthetic_data(n=100,p=200,s0=5,error_std=2)
# syn_data$X is an n by p design matrix
dim(syn_data$X)
# syn_data$y is a length n response vector
length(syn_data$y) 
# syn_data$true_beta is a length n response vector with only the first s0 entries non-zero
all(syn_data$true_beta[1:5]!=0)
all(syn_data$true_beta[-c(1:5)]==0)