| Type: | Package | 
| Title: | Spatial Bayesian Factor Analysis | 
| Version: | 1.4.0 | 
| Date: | 2025-09-30 | 
| Description: | Implements a spatial Bayesian non-parametric factor analysis model with inference in a Bayesian setting using Markov chain Monte Carlo (MCMC). Spatial correlation is introduced in the columns of the factor loadings matrix using a Bayesian non-parametric prior, the probit stick-breaking process. Areal spatial data is modeled using a conditional autoregressive (CAR) prior and point-referenced spatial data is treated using a Gaussian process. The response variable can be modeled as Gaussian, probit, Tobit, or Binomial (using Polya-Gamma augmentation). Temporal correlation is introduced for the latent factors through a hierarchical structure and can be specified as exponential or first-order autoregressive. Full details of the package can be found in the accompanying vignette. Furthermore, the details of the package can be found in "Bayesian Non-Parametric Factor Analysis for Longitudinal Spatial Surfaces", by Berchuck et al (2019), <doi:10.1214/20-BA1253> in Bayesian Analysis. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| LazyDataCompression: | xz | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | yes | 
| Depends: | R (≥ 3.0.2) | 
| Imports: | graphics, grDevices, msm (≥ 1.0.0), mvtnorm (≥ 1.0-0), pgdraw (≥ 1.0), Rcpp (≥ 0.12.9), stats, utils | 
| Suggests: | coda, classInt, knitr, rmarkdown, womblR (≥ 1.0.3) | 
| LinkingTo: | Rcpp, RcppArmadillo (≥ 0.7.500.0.0) | 
| VignetteBuilder: | knitr | 
| Language: | en-US | 
| Author: | Samuel I. Berchuck [aut, cre] | 
| Maintainer: | Samuel I. Berchuck <sib2@duke.edu> | 
| Packaged: | 2025-09-30 12:39:45 UTC; sib2 | 
| Repository: | CRAN | 
| Date/Publication: | 2025-09-30 13:00:02 UTC | 
spBFA
Description
spBFA is a package for Bayesian spatial factor analysis. A corresponding manuscript is forthcoming.
Author(s)
Samuel I. Berchuck sib2@duke.edu
Spatial factor analysis using a Bayesian hierarchical model.
Description
bfa_sp is a Markov chain Monte Carlo (MCMC) sampler for a Bayesian spatial factor analysis model. The spatial component is 
introduced using a Probit stick-breaking process prior on the factor loadings. The model is implemented using a Bayesian hierarchical framework.
Usage
bfa_sp(
  formula,
  data,
  dist,
  time,
  K,
  L = Inf,
  trials = NULL,
  family = "normal",
  temporal.structure = "exponential",
  spatial.structure = "discrete",
  starting = NULL,
  hypers = NULL,
  tuning = NULL,
  mcmc = NULL,
  seed = 54,
  gamma.shrinkage = TRUE,
  include.space = TRUE,
  clustering = TRUE
)
Arguments
| formula | A  | 
| data | A required  | 
| dist | A  | 
| time | A  | 
| K | A scalar that indicates the dimension (i.e., quantity) of latent factors. | 
| L | The number of latent clusters. If finite, a scalar indicating the number of clusters for each column of the factor loadings matrix. By default  | 
| trials | A variable in  | 
| family | Character string indicating the distribution of the observed data. Options
include:  | 
| temporal.structure | Character string indicating the temporal kernel. Options include:
 | 
| spatial.structure | Character string indicating the type of spatial process. Options include:
 | 
| starting | Either  When  | 
| hypers | Either  When  
 
 
 
 
 
 
 | 
| tuning | Either  When  | 
| mcmc | Either  
 
 
 
 | 
| seed | An integer value used to set the seed for the random number generator (default = 54). | 
| gamma.shrinkage | A logical indicating whether a gamma shrinkage process prior is used for the variances of the factor loadings columns. If FALSE, the hyperparameters (A1 and A2) indicate the shape and rate for a gamma prior on the precisions. Default is TRUE. | 
| include.space | A logical indicating whether a spatial process should be included. Default is TRUE, however if FALSE the spatial correlation matrix 
is fixed as an identity matrix. This specification overrides the  | 
| clustering | A logical indicating whether the Bayesian non-parametric process should be used, default is TRUE. If FALSE is specified each column is instead modeled with an independent spatial process. | 
Details
Details of the underlying statistical model proposed by Berchuck et al. 2019. are forthcoming.
Value
bfa_sp returns a list containing the following objects
- lambda
- NKeep x (M x O x K)- matrixof posterior samples for factor loadings matrix- lambda. The labels for each column are Lambda_O_M_K.
- eta
- NKeep x (Nu x K)- matrixof posterior samples for the latent factors- eta. The labels for each column are Eta_Nu_K.
- beta
- NKeep x P- matrixof posterior samples for- beta.
- sigma2
- NKeep x (M * (O - C))- matrixof posterior samples for the variances- sigma2. The labels for each column are Sigma2_O_M.
- kappa
- NKeep x ((O * (O + 1)) / 2)- matrixof posterior samples for- kappa. The columns have names that describe the samples within them. The row is listed first, e.g.,- Kappa3_2refers to the entry in row- 3, column- 2.
- delta
- NKeep x K- matrixof posterior samples for- delta.
- tau
- NKeep x K- matrixof posterior samples for- tau.
- upsilon
- NKeep x ((K * (K + 1)) / 2)- matrixof posterior samples for- Upsilon. The columns have names that describe the samples within them. The row is listed first, e.g.,- Upsilon3_2refers to the entry in row- 3, column- 2.
- psi
- NKeep x 1- matrixof posterior samples for- psi.
- xi
- NKeep x (M x O x K)- matrixof posterior samples for factor loadings cluster labels- xi. The labels for each column are Xi_O_M_K.
- rho
- NKeep x 1- matrixof posterior samples for- rho.
- metropolis
- 2 (or 1) x 3- matrixof metropolis acceptance rates, updated tuners, and original tuners that result from the pilot adaptation.
- runtime
- A - characterstring giving the runtime of the MCMC sampler.
- datobj
- A - listof data objects that are used in future- bfa_spfunctions and should be ignored by the user.
- dataug
- A - listof data augmentation objects that are used in future- bfa_spfunctions and should be ignored by the user.
References
Reference for Berchuck et al. 2019 is forthcoming.
Examples
###Load womblR for example visual field data
library(womblR)
###Format data for MCMC sampler
blind_spot <- c(26, 35) # define blind spot
VFSeries <- VFSeries[order(VFSeries$Location), ] # sort by location
VFSeries <- VFSeries[order(VFSeries$Visit), ] # sort by visit
VFSeries <- VFSeries[!VFSeries$Location %in% blind_spot, ] # remove blind spot locations
dat <- data.frame(Y = VFSeries$DLS / 10) # create data frame with scaled data
Time <- unique(VFSeries$Time) / 365 # years since baseline visit
W <- HFAII_Queen[-blind_spot, -blind_spot] # visual field adjacency matrix (data object from womblR)
M <- dim(W)[1] # number of locations
###Prior bounds for psi
TimeDist <- as.matrix(dist(Time))
BPsi <- log(0.025) / -min(TimeDist[TimeDist > 0])
APsi <- log(0.975) / -max(TimeDist)
###MCMC options
K <- 10 # number of latent factors
O <- 1 # number of spatial observation types
Hypers <- list(Sigma2 = list(A = 0.001, B = 0.001),
               Kappa = list(SmallUpsilon = O + 1, BigTheta = diag(O)),
               Delta = list(A1 = 1, A2 = 20),
               Psi = list(APsi = APsi, BPsi = BPsi),
               Upsilon = list(Zeta = K + 1, Omega = diag(K)))
Starting <- list(Sigma2 = 1,
                 Kappa = diag(O),
                 Delta = 2 * (1:K),
                 Psi = (APsi + BPsi) / 2,
                 Upsilon = diag(K))
Tuning <- list(Psi = 1)
MCMC <- list(NBurn = 1000, NSims = 1000, NThin = 2, NPilot = 5)
###Fit MCMC Sampler
reg.bfa_sp <- bfa_sp(Y ~ 0, data = dat, dist = W, time = Time,  K = 10, 
                     starting = Starting, hypers = Hypers, tuning = Tuning, mcmc = MCMC,
                     L = Inf,
                     family = "tobit",
                     trials = NULL,
                     temporal.structure = "exponential",
                     spatial.structure = "discrete",
                     seed = 54, 
                     gamma.shrinkage = TRUE,
                     include.space = TRUE,
                     clustering = TRUE)
###Note that this code produces the pre-computed data object "reg.bfa_sp"
###More details can be found in the vignette.
diagnostics
Description
Calculates diagnostic metrics using output from the spBFA model.
Usage
diagnostics(
  object,
  diags = c("dic", "dinf", "waic"),
  keepDeviance = FALSE,
  keepPPD = FALSE,
  Verbose = TRUE,
  seed = 54
)
Arguments
| object | A  | 
| diags | A vector of character strings indicating the diagnostics to compute. Options include: Deviance Information Criterion ("dic"), d-infinity ("dinf") and Watanabe-Akaike information criterion ("waic"). At least one option must be included. Note: The probit model cannot compute the DIC or WAIC diagnostics due to computational issues with computing the multivariate normal CDF. | 
| keepDeviance | A logical indicating whether the posterior deviance distribution is returned (default = FALSE). | 
| keepPPD | A logical indicating whether the posterior predictive distribution at each observed location is returned (default = FALSE). | 
| Verbose | A boolean logical indicating whether progress should be output (default = TRUE). | 
| seed | An integer value used to set the seed for the random number generator (default = 54). | 
Details
To assess model fit, DIC, d-infinity and WAIC are used. DIC is based on the deviance statistic and penalizes for the complexity of a model with an effective number of parameters estimate pD (Spiegelhalter et al 2002). The d-infinity posterior predictive measure is an alternative diagnostic tool to DIC, where d-infinity=P+G. The G term decreases as goodness of fit increases, and P, the penalty term, inflates as the model becomes over-fit, so small values of both of these terms and, thus, small values of d-infinity are desirable (Gelfand and Ghosh 1998). WAIC is invariant to parametrization and is asymptotically equal to Bayesian cross-validation (Watanabe 2010). WAIC = -2 * (lppd - p_waic_2). Where lppd is the log pointwise predictive density and p_waic_2 is the estimated effective number of parameters based on the variance estimator from Vehtari et al. 2016. (p_waic_1 is the mean estimator).
Value
diagnostics returns a list containing the diagnostics requested and
possibly the deviance and/or posterior predictive distribution objects.
Author(s)
Samuel I. Berchuck
References
Gelfand, A. E., & Ghosh, S. K. (1998). Model choice: a minimum posterior predictive loss approach. Biometrika, 1-11.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 583-639.
Vehtari, A., Gelman, A., & Gabry, J. (2016). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 1-20.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11(Dec), 3571-3594.
Examples
###Load pre-computed regression results
data(reg.bfa_sp)
###Compute and print diagnostics
diags <- diagnostics(reg.bfa_sp)
print(unlist(diags))
is.spBFA
Description
is.spBFA is a general test of an object being interpretable as a
spBFA object.
Usage
is.spBFA(x)
Arguments
| x | object to be tested. | 
Details
The spBFA class is defined as the regression object that
results from the spBFA regression function.
Value
is.spBFA returns a logical, depending on whether the object is of class spBFA.
Examples
###Load pre-computed results
data(reg.bfa_sp)
###Test function
is.spBFA(reg.bfa_sp)
predict.spBFA
Description
Predicts future observations from the spBFA model.
Usage
## S3 method for class 'spBFA'
predict(
  object,
  NewTimes,
  NewX = NULL,
  NewTrials = NULL,
  type = "temporal",
  Verbose = TRUE,
  seed = 54,
  ...
)
Arguments
| object | A  | 
| NewTimes | A numeric vector including desired time(s) points for prediction. | 
| NewX | A matrix including covariates at times  | 
| NewTrials | An array indicating the trials for categorical predictions. The array must have dimension  | 
| type | A character string indicating the type of prediction, choices include "temporal" and "spatial". Spatial prediction has not been implemented yet. | 
| Verbose | A boolean logical indicating whether progress should be output. | 
| seed | An integer value used to set the seed for the random number generator (default = 54). | 
| ... | other arguments. | 
Details
predict.spBFA uses Bayesian krigging to predict vectors at future
time points. The function returns the krigged factors (Eta) and also the observed outcomes (Y).
Value
predict.spBFA returns a list containing the following objects.
- Eta
- A - listcontaining- NNewVistismatrices, one for each new time prediction. Each matrix is dimension- NKeep x K, where- Kis the number of latent factors Each matrix contains posterior samples obtained by Bayesian krigging.
- Y
- A - listcontaining- NNewVistisposterior predictive distribution matrices. Each matrix is dimension- NKeep x (M * O), where- Mis the number of spatial locations and- Othe number of observation types. Each matrix is obtained through Bayesian krigging.
Author(s)
Samuel I. Berchuck
Examples
###Load pre-computed regression results
data(reg.bfa_sp)
###Compute predictions
pred <- predict(reg.bfa_sp, NewTimes = 3)
pred.observations <- pred$Y$Y10 # observed data predictions
pred.krig <- pred$Eta$Eta10 # krigged parameters
Pre-computed regression results from bfa_sp
Description
The data object reg.bfa_sp is a pre-computed spBFA data object for use in the package vignette and function examples.
Usage
data(reg.bfa_sp)
Format
The data object reg.bfa_sp is a spBFA data object that is the result of implementing the MCMC code in the vignette.
It is presented here because the run-time would be unecessarily long when compiling the R package.