The ksm package provides functionalities for the kernel
density estimation for random symmetric positive definite matrices. It
provides C++ wrappers for estimating the density, with additional
functionalities for estimating the optimal bandwidth according to the
least square cross validation (LSCV) and the leave-one-out cross (LCV)
validation criteria. Three kernels are implemented: the Gaussian
(smnorm), log-Gaussian (smlnorm) and Wishart
(Wishart).
set.seed(2025)
library(ksm)
#>
#> Attaching package: 'ksm'
#> The following object is masked from 'package:stats':
#>
#> rWishart
d <- 4L
# Equicorrelation matrix
S <- diag(rep(0.5, d)) + matrix(0.5, d, d)
# Generate simulated data from an inverse Wishart (d by d by n array).
samp <- rinvWishart(n = 100, df = 10, S)
dim(samp)
#> [1] 4 4 100
# Optimize the bandwidth for kernel using leave-one-out cross validation
b <- bandwidth_optim(
x = samp,
kernel = "smlnorm",
criterion = "lcv")
# Evaluate kernel with a new psd matrix data point
newsamp <- rinvWishart(n = 2, df = 10, S = S)
# Kernel density (default to log scale)
kdens_symmat(
x = newsamp,
xs = samp,
kernel = "smlnorm",
b = b)
#> [1] 15.710544 3.460046We can also consider stationary data and exclude observations at lag \(h\) to account for serial dependence. This is illustrated below with a time series of realized variance covariance matrix between two assets over 250 days.
data(realvar, package = "ksm")
dim(realvar)
#> [1] 2 2 250
# Select suitable lag for least square cross validation
h <- ceiling(dim(realvar)[3]^0.25)
# Calculate optimal bandwidth
bopt <- bandwidth_optim(
x = realvar,
kernel = "Wishart",
criterion = "lscv",
h = h)
# Evaluate at multiple values of bandwidth
bseq <- seq(0.5 * bopt, 2 * bopt, length.out = 101)
lscv <- ksm::lscv_kdens_symmat(
x = realvar,
b = bseq,
h = h,
kernel = "Wishart")
with(lscv,
plot(x = b,
y = lscv,
type = "l",
panel.first = {
abline(v = bopt, lty = 2, col = "grey")
},
xlab = "bandwidth",
ylab = "least square cross validation"
)
)We can also make integration in dimensions \(d \in \{2, 3\}\) to evaluate numerically
the performance of kernels on simulated data. The function
integrate_spd performs a transformation of inputs for
numerical integration with routines from the cubature
package. This is useful for checking correct implementation: below, we
check that the density of a Wishart integrates to unity over the space
of positive definite matrices.
Different methods for integration are provided: we recommend in
particular cuhre, suave or
hcubature. The lb argument controls the
smaller value for the eigenvalues, which sometimes returns matrices that
are close to being non-positive definite. If such an error mistake
arises, consider setting lb to a small value.
# Check that Wishart density integrates to unity
(int <- integrate_spd(
dim = 2L,
neval = 1e5L,
f = function(x, S){
dWishart(x, df = 10, S = S, log = FALSE)},
S = diag(2),lb = 0, method = "hcubature"))
#> $integral
#> [1] 0.9999433
#>
#> $error
#> [1] 0.0009986894
#>
#> $neval
#> [1] 30723
#>
#> $returnCode
#> [1] 0
isTRUE(all.equal(int$integral, 1, tol = 2*int$error))
#> [1] TRUEThe main use for the integrate_spd function is to
calculate the integrated squared error for data simulated from some
density. We do this here for different models that generate independent
and identically distributed matrices from a mixture of Wishart
densities. The function here relies on predefined models from functions
simu_rdens and simu_fdens used in the
simulation studies of the paper for the independent and identically
distributed scenario.
# Integrated squared error
ISE <- function(
S,
x,
bandwidth,
model = 1:6,
kernel = c("Wishart", "smlnorm", "smnorm"),
...
) {
model <- match.arg(as.character(model), choices = 1:6)
kernel <- match.arg(kernel)
dim <- ncol(x)
# Compute the squared difference between the kernel and the target density
(
ksm::kdens_symmat(
x = S,
xs = x,
b = bandwidth,
kernel = kernel,
log = FALSE
) -
ksm::simu_fdens(S, model = model, d = dim)
)^2
}
# Calculate numerical integral
d <- 2L
xs <- simu_rdens(n = 100, model = 2, d = d)
b <- bandwidth_optim(
x = xs,
criterion = "lcv",
kernel = "smnorm")
# Computing ISE
ISE(
S = xs[,,1, drop = FALSE],
x = xs,
bandwidth = b,
model = 2,
kernel = "smnorm")
#> [1] 8.525459e-05
# Next integrate over the space of psd matrices
ISE_int <- try(
ksm::integrate_spd(
f = function(S) {
ISE(
S,
x = xs,
kernel = "smnorm", # (only for lcv)
bandwidth = b,
model = 2
)
},
dim = d,
method = "hcubature",
lb = 0,
ub = Inf,
neval = 1e6L
),
silent = TRUE
)
ISE_int
#> $integral
#> [1] 0.002425163
#>
#> $error
#> [1] 2.42419e-06
#>
#> $neval
#> [1] 66561
#>
#> $returnCode
#> [1] 0