| Title: | Mixture-of-Experts Wishart Models for Covariance Data |
| Version: | 1.0 |
| Description: | Methods for maximum likelihood and Bayesian estimation of the Wishart mixture model and the mixture-of-experts Wishart (MoE-Wishart) model. The package provides four inference algorithms: each model can be fitted by maximum likelihood via the expectation–maximization (EM) algorithm or by a fully Bayesian approach via Metropolis–Hastings-within-Gibbs sampling. |
| Maintainer: | Zhi Zhao <zhi.zhao@medisin.uio.no> |
| URL: | https://github.com/zhizuio/moewishart |
| BugReports: | https://github.com/zhizuio/moewishart/issues |
| License: | GPL-3 |
| VignetteBuilder: | knitr |
| Depends: | R (≥ 4.1.0) |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| Imports: | utils, stats, loo |
| Suggests: | rmarkdown, knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-02-16 10:10:46 UTC; zhiz |
| Author: | The Tien Mai [aut], Zhi Zhao [aut, cre] |
| Repository: | CRAN |
| Date/Publication: | 2026-02-19 19:50:02 UTC |
Information criteria for Wishart mixtures and MoE models
Description
Compute AIC, BIC, and ICL for EM fits; and PSIS-LOO expected
log predictive density (elpd_loo) for Bayesian fits. Supports
mixturewishart (finite mixture) and moewishart (MoE with
covariates in gating).
Usage
computeIC(fit)
Arguments
fit |
A fitted object returned by mixturewishart() or moewishart(). |
Details
For EM fits:
AIC: \mathrm{AIC} = 2k - 2\ell, where k is the number of free parameters and \ell is the maximized log-likelihood (last EM iteration).
BIC: \mathrm{BIC} = k \log n - 2\ell.
ICL: \mathrm{ICL} = \mathrm{BIC} + \sum_{i=1}^n \sum_{k=1}^K \tau_{ik}\log \tau_{ik}, i.e., BIC plus the entropy term (classification likelihood approximation).
Parameter counting k:
For mixturewishart: k = (K-1) + K \cdot \frac{p(p+1)}{2} + K \cdot \mathbb{I}[\text{estimate }\nu], where (K-1) are the mixture weights, each \Sigma_k has \frac{p(p+1)}{2} free parameters, and \nu_k adds 1 per component when estimated.
For moewishart: k = q\,(K-1) + K \cdot \frac{p(p+1)}{2} + K \cdot \mathbb{I}[\text{estimate }\nu], where q\,(K-1) are the gating regression coefficients (the last class is the reference with a zero column).
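As a worked illustration of these counts and of the AIC/BIC formulas above, the quantities can be reproduced by hand (a minimal sketch; countParams() and the log-likelihood value are hypothetical, not package objects):
# Hypothetical helper reproducing the parameter counts above (not exported by the package)
countParams <- function(K, p, q = NULL, estimate_nu = TRUE) {
  k_gate <- if (is.null(q)) K - 1 else q * (K - 1)   # mixture weights or gating coefficients
  k_sigma <- K * p * (p + 1) / 2                     # scale matrices
  k_nu <- if (estimate_nu) K else 0                  # degrees of freedom
  k_gate + k_sigma + k_nu
}
k <- countParams(K = 3, p = 2, q = 4)   # moewishart with q = 4 gating covariates
ell <- -1234.5                          # hypothetical maximized log-likelihood
n <- 500
c(AIC = 2 * k - 2 * ell, BIC = k * log(n) - 2 * ell)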
For Bayesian fits:
elpd_loo: computed via loo::loo() using per-observation log-likelihood draws. Requires fit$loglik_individual as a matrix of size S \times n (draws by observations), as produced by the provided samplers. Returns elpd_loo, se_elpd_loo, p_loo, and looic.
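For reference, the same PSIS-LOO quantities can be obtained directly from that matrix (a brief sketch, assuming a Bayesian fit object with the loglik_individual field described above):
# Sketch: PSIS-LOO from the S x n per-observation log-likelihood matrix
library(loo)
ll <- fit$loglik_individual   # draws in rows, observations in columns
loo_res <- loo(ll)            # matrix method of loo::loo()
loo_res$estimates             # elpd_loo, p_loo, looic with standard errors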
Notes:
AIC/BIC/ICL are defined for MLE (EM) fits. For Bayesian fits, prefer elpd_loo (or WAIC) rather than AIC/BIC.
The entropy term in ICL uses the EM responsibilities \tau_{ik} (field tau for mixturewishart, gamma for moewishart).
Value
- For method="em": a list with fields AIC, BIC, ICL.
- For method="bayes": a list with fields ICL and elpd of class
'"loo"' as returned by [loo::loo()] that contains fields 'estimates',
'pointwise', 'diagnostics'
Examples
# Bayesian example (MoE)
# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
# True gating coefficients (last column zero)
set.seed(123)
Xq <- 3
K <- 3
betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K)
betas[, K] <- 0
dat <- simData(n, p,
Xq = 3, K = 3, betas = betas,
pis = c(0.35, 0.40, 0.25),
nus = c(8, 16, 3)
)
set.seed(123)
fit <- moewishart(
dat$S,
X = cbind(1, dat$X), K = 3,
mh_sigma = c(0.2, 0.1, 0.1), # RW-MH proposal scales for log(nu_k) (length K)
mh_beta = c(0.2, 0.2), # RW-MH proposal scales for gating coefficients (length K-1)
niter = 100, burnin = 50
)
computeIC(fit)
Density of the Wishart distribution
Description
Compute the (log) density of a p-dimensional Wishart distribution
W_p(\nu, \Sigma) at an SPD matrix S. Returns either the
log-density or the density depending on logarithm.
Usage
dWishart(S, nu, Sigma, detS_val = NULL, logarithm = TRUE)
Arguments
S |
Numeric p \times p SPD matrix at which the density is evaluated. |
nu |
Numeric. Degrees of freedom \nu; must satisfy \nu > p - 1. |
Sigma |
Numeric p \times p SPD scale matrix \Sigma. |
detS_val |
Optional numeric. Precomputed \log|S| (see Details), supplied to avoid recomputation inside EM/MCMC loops. |
logarithm |
Logical. If TRUE (default), return the log-density; otherwise return the density. |
Details
Let S \sim W_p(\nu, \Sigma) with degrees of freedom \nu
and scale matrix \Sigma (SPD). The density is:
f(S \mid \nu, \Sigma) =
\frac{|S|^{(\nu - p - 1)/2}\,
\exp\{-\tfrac{1}{2}\,\mathrm{tr}(\Sigma^{-1}S)\}}
{2^{\nu p/2}\,|\Sigma|^{\nu/2}\,\Gamma_p(\nu/2)},
where \Gamma_p(\cdot) is the multivariate gamma function and
p is the dimension.
Note that
(i) detS_val can be supplied to avoid recomputing \log|S|,
which is useful inside EM/MCMC loops, and
(ii) small diagonal jitter is added internally to S and \Sigma
when computing determinants or solves for numerical stability.
Constraints: (i) S and \Sigma must be SPD, and (ii) the Wishart
requires \nu > p - 1.
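As an illustration only (not part of the package examples), the log-density can be reproduced directly from the formula above using lmvgamma(); a minimal sketch assuming the package is loaded and S, Sigma are SPD:
# Sketch: Wishart log-density evaluated directly from the formula above
logdWishart_manual <- function(S, nu, Sigma) {
  p <- nrow(S)
  logdetS <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
  logdetSigma <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  0.5 * (nu - p - 1) * logdetS - 0.5 * sum(diag(solve(Sigma, S))) -
    0.5 * nu * p * log(2) - 0.5 * nu * logdetSigma - lmvgamma(nu / 2, p)
}
# Should agree with dWishart(S, nu, Sigma, logarithm = TRUE) up to internal jitter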
Value
A numeric scalar: the log-density if logarithm = TRUE,
otherwise the density.
Examples
set.seed(123)
p <- 3
# Construct an SPD Sigma
A <- matrix(rnorm(p * p), p, p)
Sigma <- crossprod(A) + diag(p) * 0.5
# Draw a Wishart matrix using base stats::rWishart()
W <- drop(rWishart(1, df = p + 5, Sigma = Sigma))
# Evaluate log-density at W
dWishart(W, nu = p + 5, Sigma = Sigma, logarithm = TRUE)
Log multivariate gamma function
Description
Compute the log of the multivariate gamma function \log \Gamma_p(a)
for dimension p and parameter a.
Usage
lmvgamma(a, p)
Arguments
a |
Numeric. Argument of the multivariate gamma function; must satisfy a > (p - 1)/2. |
p |
Integer. Dimension p (a positive integer). |
Details
The multivariate gamma function \Gamma_p(a) is defined by:
\Gamma_p(a)
= \pi^{\,p(p-1)/4} \prod_{j=1}^{p} \Gamma\!\left(a + \frac{1-j}{2}\right).
Constraints: (i) p \in \{1,2,\dots\} (positive integer), and
(ii) a > (p-1)/2 to keep all gamma terms finite (as in the Wishart
normalization constant).
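As a quick check (illustration only), the definition can be evaluated term by term with base R's lgamma():
# Sketch: log multivariate gamma from the product definition above
lmvgamma_manual <- function(a, p) {
  p * (p - 1) / 4 * log(pi) + sum(lgamma(a + (1 - seq_len(p)) / 2))
}
lmvgamma_manual(a = 4, p = 3)   # should match lmvgamma(4, 3)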
Value
A numeric scalar equal to \log \Gamma_p(a).
Examples
# Dimension
p <- 3
# Evaluate log multivariate gamma at a = nu/2
nu <- p + 5
lmvgamma(a = nu / 2, p = p)
EM/Bayesian estimation for Wishart mixture model
Description
Fit finite mixtures of Wishart-distributed SPD matrices using either a
Bayesian sampler or the EM algorithm. The input S_list is a list
of p \times p SPD matrices. Under component k,
S_i \mid z_i=k \sim W_p(\nu_k, \Sigma_k) with degrees of freedom
\nu_k and SPD scale matrix \Sigma_k. Mixture weights
\pi_k sum to 1.
Usage
mixturewishart(
S_list,
K,
niter = 3000,
burnin = 1000,
method = "bayes",
thin = 1,
alpha = NULL,
nu0 = NULL,
Psi0 = NULL,
init_pi = NULL,
init_nu = NULL,
init_Sigma = NULL,
marginal.z = TRUE,
estimate_nu = TRUE,
nu_prior_a = 2,
nu_prior_b = 0.1,
mh_sigma = 1,
n_restarts = 3,
restart_iters = 20,
tol = 1e-06,
verbose = TRUE
)
Arguments
S_list |
List of length n of p \times p SPD matrices. |
K |
Integer. Number of mixture components. |
niter |
Integer. Total iterations. Bayesian mode: total MCMC iterations (including burn-in). EM mode: maximum number of EM iterations. |
burnin |
Integer. Number of burn-in iterations (Bayesian mode). |
method |
Character; one of "bayes" or "em". |
thin |
Integer. Thinning interval for saving draws (Bayesian). |
alpha |
Numeric vector of length K: Dirichlet prior concentration parameters for the mixture weights \pi (Bayesian mode). |
nu0 |
Numeric. Inverse-Wishart prior degrees of freedom for \Sigma_k (Bayesian mode). |
Psi0 |
Numeric p \times p SPD matrix: inverse-Wishart prior scale for \Sigma_k (Bayesian mode). |
init_pi |
Optional numeric vector of length K: initial mixture proportions. |
init_nu |
Optional numeric vector of length K: initial degrees of freedom \nu_k. |
init_Sigma |
Optional list of K initial p \times p SPD scale matrices. |
marginal.z |
Logical. If TRUE (default), integrate out \pi when sampling the labels z (Bayesian mode); see Details. |
estimate_nu |
Logical. If TRUE (default), the degrees of freedom \nu_k are estimated; otherwise they are held fixed. |
nu_prior_a |
Numeric. Prior hyperparameter a of the prior on \nu_k (used when estimate_nu = TRUE). |
nu_prior_b |
Numeric. Prior hyperparameter b of the prior on \nu_k (used when estimate_nu = TRUE). |
mh_sigma |
Numeric scalar or length-K vector: random-walk MH proposal sd(s) for \log(\nu_k) (Bayesian mode). |
n_restarts |
Integer. Number of random restarts for EM. Ignored in Bayesian mode. |
restart_iters |
Integer. Number of short EM iterations per restart used to select a good initialization. Ignored in Bayesian mode. |
tol |
Numeric. Convergence tolerance on absolute change of log-likelihood (EM), also used internally elsewhere. |
verbose |
Logical. If TRUE, print progress messages. |
Details
Mixture model:
p(S_i) = \sum_{k=1}^K \pi_k \, f_W(S_i \mid \nu_k, \Sigma_k).
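For concreteness, this per-observation density can be evaluated with dWishart() (a small sketch; pis, nus, and Sigmas denote hypothetical parameter values, with Sigmas a list of K SPD matrices):
# Sketch: log of p(S_i) for one SPD matrix S_i under hypothetical parameters
mix_logdens <- function(S_i, pis, nus, Sigmas) {
  logf <- mapply(function(nu_k, Sigma_k) dWishart(S_i, nu_k, Sigma_k, logarithm = TRUE),
                 nus, Sigmas)
  lw <- log(pis) + logf          # log(pi_k) + log f_W(S_i | nu_k, Sigma_k)
  m <- max(lw)
  m + log(sum(exp(lw - m)))      # log-sum-exp for numerical stability
}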
Algorithms:
- method = "bayes": samples latent labels z, weights \pi, component scales \Sigma_k, and optionally \nu_k. Uses a Dirichlet prior for \pi, an inverse-Wishart prior for \Sigma_k, and a prior on \nu_k when estimate_nu = TRUE. Degrees of freedom are updated via MH on \log(\nu_k) with proposal sd mh_sigma. Can integrate out \pi when sampling z if marginal.z = TRUE.
- method = "em": maximizes the observed-data log-likelihood via EM. The E-step computes responsibilities via Wishart log-densities. The M-step updates \pi_k and \Sigma_k, and optionally \nu_k when estimate_nu = TRUE. Supports multiple random restarts.
Note that
(i) All matrices in S_list must be SPD. Small ridge terms may be
added internally for stability, and
(ii) Multiple EM restarts are recommended for robustness on difficult datasets.
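For example, a maximum-likelihood fit with restarts could look as follows (a brief sketch; the data object dat is assumed to come from simData() as in the Examples below):
# Sketch: EM fit with random restarts (maximum likelihood)
fit_em <- mixturewishart(
  dat$S, K = 3,
  method = "em",
  n_restarts = 3, restart_iters = 20,
  niter = 200, tol = 1e-6
)
fit_em$pi   # estimated mixture proportions
fit_em$nu   # estimated degrees of freedom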
Value
A list whose structure depends on method:
For method = "bayes":
- pi_ik: array (nsave \times n \times K), saved per-observation weights.
- pi: matrix (nsave \times K), saved mixture proportions.
- nu: matrix (nsave \times K), saved degrees of freedom.
- Sigma: list of length nsave; each element is an array (p \times p \times K) of \Sigma_k draws.
- z: matrix (nsave \times n), saved allocations.
- sigma_posterior_mean: array (p \times p \times K), posterior mean of \Sigma_k.
- loglik: numeric vector (length niter), log-likelihood trace.
- loglik_individual: matrix (niter \times n), per-observation log-likelihood.
For method = "em":
- pi: numeric vector of length K, mixture proportions.
- Sigma: list of length K, each a p \times p SPD matrix.
- nu: numeric vector of length K, degrees of freedom.
- tau: matrix (n \times K), responsibilities.
- loglik: numeric vector, log-likelihood per EM iteration.
- iterations: integer, number of EM iterations performed.
Examples
# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
dat <- simData(n, p,
K = 3,
pis = c(0.35, 0.40, 0.25),
nus = c(8, 16, 3)
)
set.seed(123)
fit <- mixturewishart(
dat$S,
K = 3,
mh_sigma = c(0.2, 0.1, 0.1), # tune this for MH acceptance 20-40%
niter = 100, burnin = 50
)
# Posterior means for degrees of freedom of Wishart distributions:
nu_mcmc <- fit$nu[-c(1:fit$burnin), ]
colMeans(nu_mcmc)
EM/Bayesian estimation for Wishart MoE model
Description
Fit a mixture-of-experts model for symmetric positive-definite (SPD) matrices with covariate-dependent mixing proportions (gating network). Components are Wishart-distributed. Supports Bayesian sampling and EM-based maximum-likelihood estimation.
Usage
moewishart(
S_list,
X,
K,
niter = 3000,
burnin = 1000,
method = "bayes",
thin = 1,
nu0 = NULL,
Psi0 = NULL,
init_nu = NULL,
estimate_nu = TRUE,
nu_prior_a = 2,
nu_prior_b = 0.1,
mh_sigma = 0.1,
mh_beta = 0.05,
sigma_beta = 10,
init = NULL,
tol = 1e-06,
ridge = 1e-08,
verbose = TRUE
)
Arguments
S_list |
List of length n of p \times p SPD matrices. |
X |
Numeric matrix (n \times q) of gating covariates; include an intercept column if desired (none is added automatically). |
K |
Integer. Number of mixture components (experts). |
niter |
Integer. Total iterations. Bayesian mode: total MCMC iterations (including burn-in). EM mode: maximum EM iterations. |
burnin |
Integer. Number of burn-in iterations (Bayesian mode). |
method |
Character; one of "bayes" or "em". |
thin |
Integer. Thinning interval for saving draws (Bayesian). |
nu0 |
Numeric. Inverse-Wishart prior degrees of freedom for \Sigma_k (Bayesian mode). |
Psi0 |
Numeric p \times p SPD matrix: inverse-Wishart prior scale for \Sigma_k (Bayesian mode). |
init_nu |
Optional numeric vector of length K: initial degrees of freedom \nu_k. |
estimate_nu |
Logical. If TRUE (default), the degrees of freedom \nu_k are estimated; otherwise they are held fixed. |
nu_prior_a |
Numeric. Prior hyperparameter a of the prior on \nu_k (used when estimate_nu = TRUE). |
nu_prior_b |
Numeric. Prior hyperparameter b of the prior on \nu_k (used when estimate_nu = TRUE). |
mh_sigma |
Numeric scalar or length-K vector: random-walk MH proposal sd(s) for \log(\nu_k) (Bayesian mode). |
mh_beta |
Numeric scalar or length-(K-1) vector: random-walk MH proposal sd(s) for the gating coefficients B (Bayesian mode). |
sigma_beta |
Numeric. Prior sd of the Gaussian prior on the gating coefficients B. |
init |
Optional list of EM initialization values (e.g., starting values for the gating coefficients and component parameters). |
tol |
Numeric. Convergence tolerance on absolute change of log-likelihood (EM), also used internally. |
ridge |
Numeric. Small diagonal ridge added to matrices internally for numerical stability. |
verbose |
Logical. If TRUE, print progress messages. |
Details
MoE-Wishart Model:
Observation: S_i is a p \times p SPD matrix. Given allocation z_i = k, S_i \mid z_i \sim W_p(\nu_k, \Sigma_k) with df \nu_k and scale \Sigma_k.
Gating (MoE): let X_i be q-dimensional covariates. Mixing weights \pi_{ik} = \Pr(z_i = k \mid X_i) follow a softmax regression: \pi_{ik} = \exp(\eta_{ik}) / \sum_{j=1}^K \exp(\eta_{ij}), where \eta_i = X_i^\top B and B is q \times K. Identifiability: the last column of B is fixed to zero.
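To make the gating model concrete, the softmax weights can be computed as follows (an illustrative sketch; X and B here are placeholder objects, not package outputs):
# Sketch: softmax gating probabilities from covariates X (n x q) and coefficients B (q x K)
softmax_gating <- function(X, B) {
  eta <- X %*% B                   # n x K linear predictors
  eta <- eta - apply(eta, 1, max)  # stabilize before exponentiating
  w <- exp(eta)
  w / rowSums(w)                   # each row sums to 1
}
X <- cbind(1, matrix(rnorm(10 * 2), 10, 2))       # intercept plus two covariates (q = 3)
B <- cbind(matrix(runif(3 * 2, -1, 1), 3, 2), 0)  # last column fixed to zero
softmax_gating(X, B)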
Algorithms:
Bayesian (method = "bayes"): Metropolis-within-Gibbs sampler for z, \Sigma_k, optionally \nu_k, and B. Gaussian priors on B with sd sigma_beta. Proposals use mh_sigma for \log(\nu_k) and mh_beta for B.
EM (method = "em"): E-step computes responsibilities using Wishart log-densities and softmax gating. M-step updates \Sigma_k, optionally \nu_k, and B via weighted multinomial logistic regression (BFGS).
Note that:
(i) include an intercept column in X; none is added by default, and
(ii) all S_list elements must be SPD. A small ridge may be
added for stability.
Value
A list whose fields depend on method:
For method = "bayes":
- Beta_samples: array (nsave \times q \times K), saved draws of B (last column zero).
- nu_samples: matrix (nsave \times K), draws of \nu_k.
- Sigma_samples: list of length nsave; each element is an array (p \times p \times K) of \Sigma_k draws.
- z_samples: matrix (nsave \times n), draws of allocations.
- pi_ik: array (nsave \times n \times K), per-observation gating probabilities.
- pi_mean: matrix (n \times K), posterior mean of gating probabilities.
- loglik: numeric vector (length niter), log-likelihood trace.
- loglik_individual: matrix (niter \times n), per-observation log-likelihood.
For method = "em":
- K, p, q, n: problem dimensions.
- Beta: matrix (q \times K), gating coefficients with last column zero (reference class).
- Sigma: list of length K, each a p \times p SPD matrix (scale).
- nu: numeric vector of length K, degrees of freedom.
- gamma: matrix (n \times K), final responsibilities.
- loglik: numeric vector, log-likelihood by EM iteration.
- iter: integer, number of EM iterations performed.
Examples
# simulate data
set.seed(123)
n <- 500 # subjects
p <- 2
# True gating coefficients (last column zero)
set.seed(123)
Xq <- 3
K <- 3
betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K)
betas[, K] <- 0
dat <- simData(n, p,
Xq = 3, K = 3, betas = betas,
pis = c(0.35, 0.40, 0.25),
nus = c(8, 16, 3)
)
set.seed(123)
fit <- moewishart(
dat$S,
X = cbind(1, dat$X), K = 3,
mh_sigma = c(0.2, 0.1, 0.1), # RW-MH proposal scales for log(nu_k) (length K)
mh_beta = c(0.2, 0.2), # RW-MH proposal scales for gating coefficients (length K-1)
niter = 500, burnin = 200
)
# Posterior means for degrees of freedom of Wishart distributions:
nu_mcmc <- fit$nu[-c(1:fit$burnin), ]
colMeans(nu_mcmc)
Dirichlet random sampling
Description
Generate random draws from a Dirichlet distribution with parameter
vector \alpha \in \mathbb{R}_+^K. Each draw is a length-K
probability vector on the simplex.
Usage
rdirichlet(n, alpha)
Arguments
n |
Integer. Number of independent Dirichlet draws to generate. |
alpha |
Numeric vector of positive concentration parameters \alpha = (\alpha_1, \dots, \alpha_K). |
Details
Definition:
If Y_k \sim \mathrm{Gamma}(\alpha_k, 1) independently for
k=1,\dots,K (shape \alpha_k, rate 1), then the
normalized vector X_k = Y_k / \sum_{j=1}^K Y_j follows
\mathrm{Dirichlet}(\alpha).
Note that
alpha must be a numeric vector with strictly positive entries.
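This construction is easy to verify directly with base R's rgamma() (illustration only, not package code):
# Sketch: one Dirichlet draw via normalized independent Gamma variables
alpha <- c(2, 5, 3)
y <- rgamma(length(alpha), shape = alpha, rate = 1)
y / sum(y)   # lies on the simplex; same law as a single row of rdirichlet(1, alpha)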
Value
A numeric matrix of size n \times K, where each row is
an independent Dirichlet draw that sums to 1.
Examples
set.seed(123)
# 3-dimensional Dirichlet with asymmetric concentration
alpha <- c(2, 5, 3)
rdirichlet(5, alpha)
Fast sampler for the inverse-Wishart distribution
Description
Draw a random sample from an inverse-Wishart distribution
\mathcal{IW}_p(\nu, \Psi) using the identity
S \sim \mathcal{IW}_p(\nu, \Psi) iff
S^{-1} \sim W_p(\nu, \Psi^{-1}). This implementation accepts
\Psi^{-1} directly for speed.
Usage
sampleIW(nu, Psi_inv)
Arguments
nu |
Numeric. Degrees of freedom \nu; must satisfy \nu > p - 1. |
Psi_inv |
Numeric p \times p SPD matrix: the inverse \Psi^{-1} of the inverse-Wishart scale matrix \Psi. |
Details
Sampling scheme:
Sample W \sim W_p(\nu, \Psi^{-1}) using rWishart, then return S = W^{-1}, which follows \mathcal{IW}_p(\nu, \Psi).
Parameterization:
- \nu is the degrees of freedom; must satisfy \nu > p - 1.
- \Psi is the SPD scale matrix of the inverse-Wishart. This function expects its inverse \Psi^{-1} as input for efficiency (avoids repeated matrix inversions).
Note that:
(i) internally calls rWishart(1, df = nu, Sigma = Psi_inv), and
(ii) returns solve(W); if numerical issues arise, consider
adding a small ridge to \Psi^{-1} prior to sampling.
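As an informal check (illustration only), averaged draws can be compared against the inverse-Wishart mean \Psi / (\nu - p - 1), which is finite for \nu > p + 1:
# Sketch: Monte Carlo check of the inverse-Wishart mean
set.seed(1)
p <- 3; nu <- p + 5
A <- matrix(rnorm(p * p), p, p)
Psi <- crossprod(A) + diag(p) * 0.5
draws <- replicate(2000, sampleIW(nu, solve(Psi)))
apply(draws, c(1, 2), mean)   # approximately equal to the next line
Psi / (nu - p - 1)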
Value
A p \times p SPD matrix S distributed as
\mathcal{IW}_p(\nu, \Psi).
Examples
set.seed(123)
p <- 3
# Construct an SPD scale matrix Psi
A <- matrix(rnorm(p * p), p, p)
Psi <- crossprod(A) + diag(p) * 0.5
Psi_inv <- solve(Psi)
# Draw one inverse-Wishart sample
S <- sampleIW(nu = p + 5, Psi_inv = Psi_inv)
S
Simulate data from a Wishart mixture or mixture-of-experts model
Description
Generate synthetic SPD matrices from either: (i) a finite mixture of Wishart components with fixed mixing proportions, or (ii) a mixture-of-experts (MoE) where mixing proportions depend on covariates via a softmax gating model.
Usage
simData(
n = 200,
p = 2,
Xq = 0,
K = NA,
betas = NULL,
pis = c(0.4, 0.6),
nus = c(8, 12),
Sigma = NULL
)
Arguments
n |
Integer. Number of observations to simulate. |
p |
Integer. Dimension of the Wishart distribution (matrix size p \times p). |
Xq |
Integer. Number of covariates for the gating network (MoE case). If Xq = 0, a fixed mixture without covariates is simulated; if Xq > 0, covariates are generated and the softmax gating model is used. |
K |
Integer. Number of latent components. |
betas |
Numeric matrix (Xq \times K) of gating coefficients (MoE case); the last column should be zero for identifiability. If NULL, coefficients are generated randomly with betas[, K] = 0. |
pis |
Numeric vector of length K of mixture proportions (fixed-mixture case); must sum to 1. |
nus |
Numeric vector of length K of degrees of freedom; each must exceed p - 1. |
Sigma |
Optional list of length K of p \times p SPD scale matrices; defaults are constructed if NULL. |
Details
Models:
Fixed mixture (no covariates, Xq = 0): z_i \sim \mathrm{Categorical}(\pi), and S_i \mid z_i = k \sim W_p(\nu_k, \Sigma_k).
Mixture-of-experts (covariates, Xq > 0): let X_i \in \mathbb{R}^{Xq}. The mixing weights are \pi_{ik} = \Pr(z_i = k \mid X_i), given by the softmax regression \pi_{ik} = \exp(X_i^\top B_k) / \sum_{j=1}^K \exp(X_i^\top B_j). Labels z_i are drawn from \mathrm{Categorical}(\pi_i) and S_i \mid z_i = k \sim W_p(\nu_k, \Sigma_k).
Simulation steps:
Construct the per-observation mixing weights: if Xq = 0, replicate the provided pis over n rows; if Xq > 0, generate X ~ N(0, I) and compute softmax probabilities using betas (last column set to zero by default for identifiability).
If Sigma is not provided, create default SPD \Sigma_k matrices depending on K and p.
Sample labels z_i \sim \mathrm{Categorical}(\pi_i).
Draw S_i from W_p(\nu_{z_i}, \Sigma_{z_i}) via rWishart.
Note that:
(i) in the MoE case, no intercept is automatically added to X.
Use Xq to include desired covariates; the default
betas is randomly generated with betas[, K] = 0, and
(ii) provided Sigma must be a list of SPD p \times p
matrices. Provided nus must exceed p - 1.
Value
A list with the following elements:
-
S: list of lengthnof simulated SPD matricesS_i. -
z: integer vector lengthnof component labels. -
nu: numeric vector lengthKof degrees of freedom. -
pi: matrixn \times Kof mixing probabilities (rows sum to1). -
Sigma_list: list lengthKof the scale matrices used for simulation. -
X: matrixn \times Xqof covariates ifXq > 0, otherwiseNULL. -
betas: the gating coefficient matrix used whenXq > 0, otherwiseNULL.
Examples
# simulate data from mixture model (no covariates)
set.seed(123)
n <- 200 # subjects
p <- 10
dat <- simData(n, p,
K = 3,
pis = c(0.35, 0.40, 0.25),
nus = c(8, 12, 3)
)
str(dat)
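A mixture-of-experts simulation with covariates (Xq > 0) follows the same pattern; a brief sketch mirroring the moewishart() example above:
# simulate data from the MoE model (covariate-dependent mixing weights)
set.seed(123)
Xq <- 3
K <- 3
betas <- matrix(runif(Xq * K, -2, 2), nrow = Xq, ncol = K)
betas[, K] <- 0   # reference class
dat_moe <- simData(n = 200, p = 2,
  Xq = Xq, K = K, betas = betas,
  pis = c(0.35, 0.40, 0.25),
  nus = c(8, 16, 3)
)
str(dat_moe$X)    # covariates used by the gating model
table(dat_moe$z)  # simulated component labels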