Bayesian Model Averaging with ‘LatentBMA’

This vignette provides an overview of the R package LatentBMA, which implements Bayesian model averaging (BMA) algorithms for univariate link latent Gaussian models (ULLGMs). For detailed information, refer to “Steel M.F.J. & Zens G. (2024). Model Uncertainty in Latent Gaussian Models with Univariate Link Function”. The package supports various g-priors and a beta-binomial prior on the model space. It also includes auxiliary functions for visualizing and tabulating BMA results. Currently, it offers an easy ‘out-of-the-box’ solution for model averaging of Poisson log-normal (PLN) and binomial logistic-normal (BiL) models. The codebase is designed to be easily extendable to other likelihoods, priors, and link functions.

Model Uncertainty in Poisson Log-Normal Regression Models

Consider a Poisson log-normal regression model of the form \(y_i \sim \mathcal{P}(e^{z_i})\), where \(z_i = \alpha + x_i'\beta + \epsilon_i\) and \(\epsilon_i \sim \mathcal{N}(0, \sigma^2)\). We simulate data with \(n=100\) observations and \(p=20\) covariates, of which only the first two are relevant to the outcome (with coefficients 0.5 and -0.5), setting \(\alpha=2\) and \(\sigma^2=0.25\).

set.seed(123) # Ensure reproducibility
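X  <- matrix(rnorm(100*20), 100, 20) # n = 100 observations, p = 20 covariates
z  <- 2 + X %*% c(0.5, -0.5, rep(0, 18)) + rnorm(100, 0, sqrt(0.25)) # latent Gaussian predictor
y  <- rpois(100, exp(z)) # Poisson counts with log link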
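
As a quick sanity check on the simulated data (not part of the package workflow), the sketch below compares the sample mean and variance of \(y\) with the marginal moments implied by the model. It assumes the standard log-normal moment formulas and uses the fact that the covariates are independent standard normal draws, so that marginally \(z_i \sim \mathcal{N}(\alpha, \beta'\beta + \sigma^2)\). All variable names other than X, z and y are illustrative.

```r
# Re-create the simulated PLN data from the text
set.seed(123)
n <- 100; p <- 20
alpha <- 2; sigma2 <- 0.25
beta <- c(0.5, -0.5, rep(0, p - 2))
X <- matrix(rnorm(n * p), n, p)
z <- alpha + X %*% beta + rnorm(n, 0, sqrt(sigma2))
y <- rpois(n, exp(z))

# Marginal variance of z when the covariates are independent N(0, 1)
var_z <- sum(beta^2) + sigma2

# Log-normal moments give the implied marginal mean and variance of y
mean_theory <- exp(alpha + var_z / 2)
var_theory  <- mean_theory + mean_theory^2 * (exp(var_z) - 1)

c(sample_mean = mean(y), implied_mean = mean_theory)
c(sample_var  = var(y),  implied_var  = var_theory)
```

The implied variance is far larger than the implied mean, which is the overdispersion that the latent Gaussian error and the covariates add on top of a plain Poisson model.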

The following code loads the package and runs the BMA MCMC algorithm with the default prior setup, a BRIC prior with \(m=p/2\). See the package manual for details and for alternative prior choices.

library(LatentBMA)
results <- ULLGM_BMA(X = X, y = y, model = "PLN")
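
To make the default prior setup more concrete, the following sketch computes the two quantities it involves for \(n=100\) and \(p=20\). It assumes the usual benchmark (BRIC) choice \(g = \max(n, p^2)\), which agrees with the value \(g = 400\) reported in the summary table further below, and a beta-binomial prior on model size with parameters \(a = 1\) and \(b = (p-m)/m\), under which the prior expected model size is \(m\). The exact internal parameterization used by LatentBMA is an assumption here, so this is an illustration rather than the package's own code.

```r
n <- 100    # observations
p <- 20     # candidate covariates
m <- p / 2  # prior expected model size (default described in the text)

# Benchmark (BRIC) choice of g; assumed form, see the note above
g <- max(n, p^2)

# Beta-binomial prior on model size k = 0, ..., p (assumed parameterization)
a <- 1
b <- (p - m) / m
k <- 0:p
prior_size <- choose(p, k) * beta(k + a, p - k + b) / beta(a, b)

g                    # 400
sum(prior_size)      # 1, up to floating point error
sum(k * prior_size)  # 10, i.e. the prior mean model size m
```

With \(m = p/2\) the parameters reduce to \(a = b = 1\), so every model size from 0 to \(p\) receives the same prior probability \(1/(p+1)\).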

Estimating a Binomial Logistic-Normal Regression Model

A binomial logistic-normal (BiL) model of the form \(y_i \sim \mathrm{Bin}(N_i, 1/(1+e^{-z_i}))\), where \(z_i = \alpha + x_i'\beta + \epsilon_i\), \(\epsilon_i \sim \mathcal{N}(0, \sigma^2)\), and \(N_i\) is the number of trials for observation \(i\), can be estimated with similar syntax. The following code simulates data with \(n=100\) observations and \(p=20\) covariates, assuming \(N_i = 50\) trials for each observation, with only the first two covariates relevant to the outcome, setting \(\alpha=1\) and \(\sigma^2=0.25\):

set.seed(123) # Ensure reproducibility
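X  <- matrix(rnorm(100*20), 100, 20) # n = 100 observations, p = 20 covariates
Ni <- rep(50, 100) # number of trials per observation
z  <- 1 + X %*% c(0.5, -0.5, rep(0, 18)) + rnorm(100, 0, sqrt(0.25)) # latent Gaussian predictor
y  <- rbinom(100, Ni, 1/(1+exp(-z))) # binomial counts with logistic link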
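
For comparison, a plain binomial GLM fitted to data simulated in the same way gives a simple baseline without latent noise or model averaging. The sketch below uses only base R; `plogis()` is the built-in inverse logit and is numerically equivalent to the expression \(1/(1+e^{-z})\) used above. The object names `prob` and `fit` are illustrative.

```r
set.seed(123)
X  <- matrix(rnorm(100 * 20), 100, 20)
Ni <- rep(50, 100)
z  <- 1 + X %*% c(0.5, -0.5, rep(0, 18)) + rnorm(100, 0, sqrt(0.25))

# plogis() is base R's inverse logit, equivalent to 1 / (1 + exp(-z))
prob <- as.numeric(plogis(z))
stopifnot(isTRUE(all.equal(prob, as.numeric(1 / (1 + exp(-z))))))
y <- rbinom(100, Ni, prob)

# Baseline: binomial GLM with all 20 covariates, successes and failures as response
fit <- glm(cbind(y, Ni - y) ~ X, family = binomial())
round(coef(fit)[1:3], 2)  # intercept and the two relevant slopes
```

Because this baseline omits the error term \(\epsilon_i\) and keeps all covariates in the model, its estimates need not coincide with the model-averaged ULLGM results.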

The corresponding ULLGM-BiL model in LatentBMA is fitted with a command similar to the one above, changing only the model argument and specifying the number of trials \(N_i\):

results <- ULLGM_BMA(X = X, y = y, Ni = Ni, model = "BiL")

Summarizing the Estimation Output

To summarize the posterior output in a table, one can use LatentBMA::summaryBMA(). Note that all functions in LatentBMA that generate tables support LaTeX and HTML output. summaryBMA() outputs a knitr::kable object which can be fully customized. The algorithm correctly identifies the first two predictors as the most relevant, as can be seen from the column with posterior inclusion probabilities (PIP).

summaryBMA(results)

Variable	Posterior Mean	Posterior SD	PIP
Intercept	1.066	0.051	-
x1	0.410	0.058	1.000
x2	-0.521	0.052	1.000
x3	0.001	0.008	0.014
x4	-0.002	0.014	0.039
x5	0.000	0.007	0.016
x6	-0.001	0.008	0.017
x7	0.000	0.006	0.014
x8	-0.004	0.022	0.046
x9	0.000	0.006	0.011
x10	0.001	0.012	0.026
x11	0.004	0.021	0.051
x12	0.000	0.004	0.009
x13	-0.002	0.015	0.028
x14	0.001	0.012	0.025
x15	-0.001	0.009	0.014
x16	-0.002	0.014	0.029
x17	0.000	0.007	0.019
x18	0.000	0.006	0.015
x19	0.003	0.018	0.042
x20	-0.001	0.011	0.026
sigma^2	0.138	0.037	-
g	400.000	0.000	-
Model Size	2.442	0.692	-
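
The PIP column and the Model Size row of this table are linked: by linearity of expectation, the posterior mean model size equals the sum of the posterior inclusion probabilities. The short check below copies the PIP values from the table by hand (the vector `pip` is illustrative and not extracted from the results object) and also lists the covariates with a PIP above 0.5, a common threshold known as the median probability model.

```r
# Posterior inclusion probabilities for x1, ..., x20, copied from the table above
pip <- c(1.000, 1.000, 0.014, 0.039, 0.016, 0.017, 0.014, 0.046, 0.011, 0.026,
         0.051, 0.009, 0.028, 0.025, 0.014, 0.029, 0.019, 0.015, 0.042, 0.026)

# By linearity of expectation, E[model size] = sum of the PIPs
sum(pip)  # 2.441, matching the reported 2.442 up to rounding

# Covariates with PIP > 0.5 (median probability model)
which(pip > 0.5)  # 1 2
```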

To extract the top models and the corresponding posterior model probabilities (PMPs) from the regression output, LatentBMA::topModels() can be used. In this simple setting, the algorithm strongly concentrates on the true model with two included predictors.

topModels(results)

model	Model #1	Model #2	Model #3	Model #4	Model #5
x1	x	x	x	x	x
x2	x	x	x	x	x
x4					x
x8			x
x11		x
x19				x
PMP	0.656	0.031	0.028	0.026	0.024
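
To illustrate how quantities such as PIPs and PMPs are commonly obtained from MCMC output, the sketch below works on a small hypothetical matrix of inclusion indicators, with one row per MCMC iteration and one column per covariate. The matrix `gamma_draws` and its layout are assumptions made for this example and do not describe how LatentBMA stores its results.

```r
# Hypothetical inclusion-indicator draws: 6 MCMC iterations, 3 covariates
gamma_draws <- rbind(
  c(1, 1, 0),
  c(1, 1, 0),
  c(1, 1, 1),
  c(1, 1, 0),
  c(1, 0, 0),
  c(1, 1, 0)
)
colnames(gamma_draws) <- c("x1", "x2", "x3")

# PIP: share of draws in which each covariate is included
pip <- colMeans(gamma_draws)

# PMP: relative frequency of each distinct model among the draws
model_id <- apply(gamma_draws, 1, paste, collapse = "")
pmp <- sort(table(model_id) / nrow(gamma_draws), decreasing = TRUE)

pip
pmp
```

In this toy example the model containing x1 and x2 is visited in four of six draws and therefore ranks first. In the table above, the five listed models together account for about 0.765 of the posterior mass.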

Several commands are available to visually summarize the results. To view the posterior distribution of model size, one can use LatentBMA::plotModelSize(). All plotting functions in LatentBMA output a ggplot2::ggplot object, which can be fully customized.

plotModelSize(results)

The estimated posterior means and posterior inclusion probabilities can be visualized using LatentBMA::plotBeta() and LatentBMA::plotPIP(), respectively:

plotBeta(results)

plotPIP(results)

In order to assess the convergence of the algorithm, it can be useful to examine posterior traceplots. The function LatentBMA::tracePlot() provides functionality to generate traceplots for the parameters and the size of the visited models. For example, to look at the traceplots of \(\alpha\) and \(\sigma^2\), one can use the following code.

tracePlot(results, parameter = "alpha")

tracePlot(results, parameter = "sigma2")
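
Traceplots can be complemented with simple numerical checks. The sketch below uses base R only and runs on a simulated autocorrelated chain, `draws`, which stands in for the stored posterior draws of a single parameter. How to extract such draws from the results object is not covered here, and the effective sample size formula is a rough AR(1) approximation rather than a diagnostic provided by the package.

```r
# Stand-in for stored posterior draws of one parameter (hypothetical AR(1) chain)
set.seed(1)
n_draws <- 5000
draws <- as.numeric(arima.sim(list(ar = 0.8), n = n_draws)) + 1

# Compare the means of the first and second half of the chain
half <- n_draws / 2
c(first_half  = mean(draws[1:half]),
  second_half = mean(draws[(half + 1):n_draws]))

# Lag-1 autocorrelation; values near 1 indicate slow mixing
rho1 <- acf(draws, lag.max = 1, plot = FALSE)$acf[2]

# Rough effective sample size under an AR(1) approximation
ess <- n_draws * (1 - rho1) / (1 + rho1)
c(lag1_autocorr = rho1, ess = ess)
```

Clearly different half-chain means or a very small effective sample size suggest running the sampler for longer.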

G. Zens

2025-04-08
