Help for package SLGP

Type:

Package

Title:

Spatial Logistic Gaussian Process for Field Density Estimation

Version:

1.0.0

Maintainer:

Athénaïs Gautier <athenais.gautier@onera.fr>

Description:

Provides tools for conditional and spatially dependent density estimation using Spatial Logistic Gaussian Processes (SLGPs). The approach represents probability densities through finite-rank Gaussian process priors transformed via a spatial logistic density transformation, enabling flexible non-parametric modeling of heterogeneous data. Functionality includes density prediction, quantile and moment estimation, sampling methods, and preprocessing routines for basis functions. Applications arise in spatial statistics, machine learning, and uncertainty quantification. The methodology builds on the framework of Leonard (1978) <doi:10.1111/j.2517-6161.1978.tb01655.x>, Lenk (1988) <doi:10.1080/01621459.1988.10478625>, Tokdar (2007) <doi:10.1198/106186007X210206>, Tokdar (2010) <doi:10.1214/10-BA605>, and is further aligned with recent developments in Bayesian non-parametric modelling: see Gautier (2023) <https://boristheses.unibe.ch/4377/> and Gautier (2025) <doi:10.48550/arXiv.2110.02876>.

License:

GPL (≥ 3)

Encoding:

UTF-8

RoxygenNote:

7.3.2

Biarch:

true

Depends:

R (≥ 3.5.0), stats

Imports:

DiceDesign, methods, mvnfast, Rcpp (≥ 0.12.0), RcppParallel (≥ 5.0.1), rstan (≥ 2.18.1), GoFKernel, rstantools

LinkingTo:

BH (≥ 1.66.0), Rcpp (≥ 0.12.0), RcppEigen (≥ 0.3.3.3.0), rstan (≥ 2.18.1), StanHeaders (≥ 2.21.0)

SystemRequirements:

GNU make

Suggests:

knitr, rmarkdown, tidyr, dplyr, ggplot2, ggpubr, viridis, MASS

VignetteBuilder:

knitr

NeedsCompilation:

yes

Packaged:

2025-09-01 13:38:28 UTC; agautier

Author:

Athénaïs Gautier [aut, cre]

Repository:

CRAN

Date/Publication:

2025-09-05 20:50:02 UTC

SLGP: A package for spatially dependent probability distributions

Description

The SLGP package implements Spatial Logistic Gaussian Processes (SLGP) for the flexible modeling of conditional and spatially dependent probability distributions. The SLGP framework leverages basis-function expansions and sample-based inference (e.g., MAP, Laplace, MCMC) for efficient density estimation and uncertainty quantification. This package includes functionality to define, train, and sample from SLGP models, as well as visualization and diagnostic tools.

SLGP functions

The core functions in the package include:

slgp: trains an SLGP model from formula, data, and hyperparameters.
predictSLGP_moments: computes posterior predictive means and variances.
predictSLGP_quantiles: computes posterior predictive quantiles.
sampleSLGP: draws samples from the posterior predictive SLGP.
retrainSLGP: retrains a fitted SLGP object with new parameters or method.

Author(s)

Maintainer: Athénaïs Gautier athenais.gautier@onera.fr

References

Gautier, Athénaïs (2023). "Modelling and Predicting Distribution-Valued Fields with Applications to Inversion Under Uncertainty." Thesis, Universität Bern, Bern. See the thesis online at https://boristheses.unibe.ch/4377/

The SLGP S4 Class: Spatial Logistic Gaussian Process Model

Description

This S4 class represents a Spatial Logistic Gaussian Process (SLGP) model, designed for modeling conditional or spatially dependent probability distributions. It encapsulates all necessary components for training, sampling, and prediction, including the basis function setup, learned coefficients, and fitted hyperparameters.

Slots

formula

A formula specifying the model structure and covariates.

data

A data.frame containing the observations used to train the model.

responseName

A character string specifying the name of the response variable.

covariateName

A character vector specifying the names of the covariates.

responseRange

A numeric vector of length 2 indicating the lower and upper bounds of the response.

predictorsRange

A list containing:

predictorsLower: lower bounds of the covariates;
predictorsUpper: upper bounds of the covariates.

method

A character string indicating the training method used: one of {"MCMC", "MAP", "Laplace", "none"}.

p

An integer indicating the number of basis functions used.

basisFunctionsUsed

A character string specifying the type of basis functions used: "inducing points", "RFF", "Discrete FF", "filling FF", or "custom cosines".

opts_BasisFun

A list of additional options used to configure the basis functions.

BasisFunParam

A list containing the computed parameters of the basis functions, e.g., Fourier frequencies or interpolation weights.

coefficients

A matrix of coefficients for the finite-rank Gaussian process. Each row holds the coefficients \epsilon_1, \dots, \epsilon_p of one realization of the latent field Z(x, t) = \sum_{i=1}^p \epsilon_i f_i(x, t).

hyperparams

A list of hyperparameters, including:

sigma: numeric signal standard deviation;
lengthscale: a vector of lengthscales for each input dimension.

logPost

A numeric value representing the (unnormalized) log-posterior of the model. Currently available only for MAP and Laplace-trained models.
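
As an illustration of how one row of coefficients yields a density, here is a minimal base-R sketch of the logistic density transformation at a single covariate value. It does not use the package's internals: the cosine basis f_i(t) = cos(i \pi t), the coefficient values, and the trapezoidal normalization are assumptions made for the example.

```r
# Illustrative sketch only: hypothetical basis and coefficients, base R.
t_grid <- seq(0, 1, length.out = 101)            # normalized response grid
p      <- 5                                      # number of basis functions
basis  <- sapply(seq_len(p), function(i) cos(pi * i * t_grid))  # 101 x p matrix
eps    <- c(1.0, -0.5, 0.3, 0.2, -0.1)           # one row of a coefficient matrix

# Latent field Z(t) = sum_i eps_i * f_i(t) at a fixed covariate value
Z <- as.vector(basis %*% eps)

# Trapezoidal quadrature weights on the response grid
w <- rep(t_grid[2] - t_grid[1], length(t_grid))
w[c(1, length(w))] <- w[1] / 2

# Logistic density transformation: exp(Z) divided by its integral over t
dens <- exp(Z) / sum(w * exp(Z))
```

The resulting `dens` is positive everywhere and integrates to one on the grid, which is what makes an unconstrained Gaussian process usable as a prior over densities.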

Check basis function parameters

Description

Checks and completes the parameter list for a given basis function type.

Usage

check_basisfun_opts(basisFunctionsUsed, dimension, opts_BasisFun = list())

Arguments

basisFunctionsUsed

Character. Type of basis function to use. One of: "inducing points", "RFF", "Discrete FF", "filling FF", "custom cosines".

dimension

Integer. The dimension of the input space (typically [\mathbf{x}, t]).

opts_BasisFun

List. Options specific to the chosen basis function. Users can refer to the documentation of specific basis function initialization functions (e.g., initialize_basisfun_inducingpt, initialize_basisfun_RFF, initialize_basisfun_fillingRFF, initialize_basisfun_discreteFF, etc.) for details on the available options.

Value

A completed list of options specific to the chosen basis function.

Computes the Euclidean distance between rows of two matrices

Description

Computes the Euclidean distance between rows of two matrices

Usage

crossdist(x, y)

Arguments

x

First matrix

y

Second matrix

Value

Euclidean distance between rows of x and y
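
For readers who want to check the expected output shape, the following base-R sketch computes the same kind of quantity: a matrix whose entry (i, j) is the Euclidean distance between row i of x and row j of y. The helper name cross_dist_sketch is hypothetical and is not the package's implementation.

```r
# Illustrative base-R equivalent (hypothetical helper, not the package code).
cross_dist_sketch <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a'b, clipped at 0 before the square root
  d2 <- outer(rowSums(x^2), rowSums(y^2), "+") - 2 * tcrossprod(x, y)
  sqrt(pmax(d2, 0))
}

x <- matrix(c(0, 0,
              3, 4), ncol = 2, byrow = TRUE)   # 2 points in the plane
y <- matrix(c(0, 0,
              0, 4,
              6, 8), ncol = 2, byrow = TRUE)   # 3 points in the plane
D <- cross_dist_sketch(x, y)                   # 2 x 3 distance matrix
```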

Evaluate basis functions at given locations.

Description

Evaluates all basis functions defined by a parameter list at new locations.

Usage

evaluate_basis_functions(parameters, X, lengthscale)

Arguments

parameters

List of basis function parameters.

X

Matrix or dataframe of evaluation locations.

lengthscale

Numeric vector. Lengthscales used for scaling the input space.

Value

A matrix of basis function values.

Initialize basis function parameters

Description

Initializes the parameter list needed for a basis function.

Usage

initialize_basisfun(
  basisFunctionsUsed,
  dimension,
  lengthscale,
  opts_BasisFun = list()
)

Arguments

basisFunctionsUsed

Character. The type of basis function to use. One of: "inducing points", "RFF", "Discrete FF", "filling FF", "custom cosines".

dimension

Integer. Dimension of the input space [\mathbf{x},\,t].

lengthscale

Numeric vector. Lengthscales used for scaling the input space.

opts_BasisFun

List. Optional. Additional options specific to the chosen basis function. If the type is "custom cosines", the basis functions considered are coef \cdot \cos(freq^\top [x, t] + offset), and the user must provide three vectors: opts_BasisFun$freq, opts_BasisFun$offset and opts_BasisFun$coef. Users can refer to the documentation of specific basis function initialization functions (e.g., initialize_basisfun_inducingpt, initialize_basisfun_RFF, initialize_basisfun_fillingRFF, initialize_basisfun_discreteFF, etc.) for details on the available options.

Value

A list of initialized basis function parameters.
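
The "custom cosines" form can be evaluated by hand in base R, as in the sketch below. The layout of freq (one frequency vector per row), the values drawn, and the name opts_sketch are assumptions for illustration; the layout the package expects should be checked against its own documentation.

```r
# Illustrative sketch: evaluating coef * cos(freq' [x, t] + offset) in base R.
set.seed(42)
p <- 4   # number of basis functions
d <- 2   # input dimension: one covariate x plus the response t

opts_sketch <- list(
  freq   = matrix(rnorm(p * d), nrow = p),   # assumed layout: one row per basis function
  offset = runif(p, 0, 2 * pi),
  coef   = rep(sqrt(2 / p), p)
)

# Three evaluation locations [x, t] in the normalized domain
X <- cbind(x = c(0.1, 0.5, 0.9), t = c(0.2, 0.4, 0.6))

# phase[n, i] = freq_i' X_n + offset_i, then scale column i by coef_i
phase <- sweep(X %*% t(opts_sketch$freq), 2, opts_sketch$offset, "+")
B     <- sweep(cos(phase), 2, opts_sketch$coef, "*")   # 3 x p matrix of basis values
```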

Initialize basis function parameters based on Random Fourier Features

Description

Draws parameters for standard RFF approximating a Matérn kernel.

Usage

initialize_basisfun_RFF(dimension, nFreq, MatParam = 5/2, lengthscale)

Arguments

dimension

Integer. Input ([\mathbf{x},\,t]) dimension.

nFreq

Integer. Number of frequency vectors to be considered.

MatParam

Numeric. Matérn smoothness parameter (default = 5/2).

lengthscale

Numeric vector. Lengthscales used for scaling the input space.

Value

List with frequency, offset, and coefficient parameters.
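
To make the idea concrete, here is a self-contained base-R sketch of Random Fourier Features for a Matérn 5/2 kernel. It relies on the standard fact that the Matérn spectral density with smoothness \nu is a multivariate Student distribution with 2\nu degrees of freedom; the isotropic lengthscale, the sqrt(2/m) scaling, and all variable names are assumptions of the sketch and need not match what initialize_basisfun_RFF returns.

```r
# Illustrative sketch: RFF approximation of an isotropic Matern 5/2 kernel.
set.seed(123)
nu  <- 5 / 2    # Matern smoothness
ell <- 0.3      # single lengthscale (assumed isotropic here)
d   <- 2        # input dimension
m   <- 20000    # number of random frequencies

# Frequencies from a multivariate Student-t with 2 * nu degrees of freedom:
# a Gaussian vector divided by sqrt(chi-square / df), then scaled by 1 / ell
z      <- matrix(rnorm(m * d), nrow = m)
g      <- rchisq(m, df = 2 * nu)
omega  <- z / sqrt(g / (2 * nu)) / ell
offset <- runif(m, 0, 2 * pi)

# Feature map: sqrt(2 / m) * cos(omega' x + offset)
rff <- function(X) {
  X <- matrix(X, ncol = d)
  sqrt(2 / m) * cos(sweep(X %*% t(omega), 2, offset, "+"))
}

x1 <- c(0.2, 0.4)
x2 <- c(0.35, 0.5)
k_rff <- sum(rff(x1) * rff(x2))   # inner product of features approximates k(x1, x2)

# Closed-form Matern 5/2 value for comparison
r <- sqrt(sum((x1 - x2)^2))
k_exact <- (1 + sqrt(5) * r / ell + 5 * r^2 / (3 * ell^2)) * exp(-sqrt(5) * r / ell)
```

The Monte Carlo error of `k_rff` shrinks at the usual rate of one over the square root of the number of frequencies, which is why nFreq trades accuracy against the rank p of the model.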

Initialize discrete Fourier features

Description

Generates a basis from discrete cosine/sine terms for each input dimension.

Usage

initialize_basisfun_discreteFF(dimension, maxOrdert, maxOrderx)

Arguments

dimension

Integer. Input ([\mathbf{x},\,t]) dimension.

maxOrdert

Integer. Maximum frequency in t.

maxOrderx

Integer. Maximum frequency in each x.

Value

List with frequency, offset, and coefficient parameters.

Initialize space-filling Random Fourier Features

Description

Initializes RFF parameters with LHS-optimized frequency directions.

Usage

initialize_basisfun_fillingRFF(
  dimension,
  nFreq,
  MatParam = 5/2,
  lengthscale,
  seed = 0
)

Arguments

dimension

Integer. Input ([\mathbf{x},\,t]) dimension.

nFreq

Integer. Number of frequency vectors to be considered.

MatParam

Numeric. Matérn smoothness parameter (default = 5/2).

lengthscale

Numeric vector. Lengthscales used for scaling the input space.

seed

Integer. Random seed.

Value

List with frequency, offset, and coefficient parameters.

Initialize parameters for inducing-point basis functions

Description

Computes kernel matrix and its decompositions for use in inducing-point basis functions.

Usage

initialize_basisfun_inducingpt(
  dimension,
  kernel = "Mat52",
  lengthscale,
  pointscoord = NULL,
  numberPoints = NULL
)

Arguments

dimension

Integer. Input ([\mathbf{x},\,t]) dimension.

kernel

Character. Kernel type ("Exp", "Mat32", "Mat52", "Gaussian").

lengthscale

Numeric vector. Lengthscales used for scaling the input space.

pointscoord

Optional matrix of inducing point coordinates. If none is provided, we sample them uniformly in the unit hypercube.

numberPoints

Integer. Number of inducing points (used if pointscoord is NULL).

Value

List with kernel square root and inverse root matrices, and scaled coordinates.
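
One way to obtain a kernel square root and its inverse is a symmetric eigendecomposition, sketched below in base R for a Matérn 5/2 kernel on ten random inducing points. Whether the package uses this decomposition or another (for example Cholesky) is not stated here, so treat the choice, the isotropic lengthscale, and the variable names as assumptions.

```r
# Illustrative sketch: kernel matrix, its square root and inverse square root.
set.seed(7)
ell <- 0.3
pts <- matrix(runif(10 * 2), ncol = 2)        # 10 inducing points in [0, 1]^2

r <- as.matrix(dist(pts)) / ell               # scaled pairwise distances
K <- (1 + sqrt(5) * r + 5 * r^2 / 3) * exp(-sqrt(5) * r)   # Matern 5/2 kernel

eig <- eigen(K, symmetric = TRUE)
K_sqrt     <- eig$vectors %*% diag(sqrt(eig$values))     %*% t(eig$vectors)
K_inv_sqrt <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)
```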

normalize_data: Normalize data to the range [0, 1]

Description

Scales the response and covariates of a dataset to the unit interval [0,1]. This normalization is required before applying SLGP methods. If range bounds are not provided, they are computed from the data.

Usage

normalize_data(
  data,
  predictorNames,
  responseName,
  predictorsUpper = NULL,
  predictorsLower = NULL,
  responseRange = NULL
)

Arguments

data

A data frame containing the dataset.

predictorNames

A character vector of covariate column names.

responseName

A character string specifying the response variable name.

predictorsUpper

Optional numeric vector of upper bounds for covariates.

predictorsLower

Optional numeric vector of lower bounds for covariates.

responseRange

Optional numeric vector of length 2 giving lower and upper bounds for the response.

Value

A normalized data frame with the same column structure as data, with values scaled to [0,1].
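
The scaling itself is a plain min-max transformation. The base-R sketch below shows it on a toy data frame; normalize_sketch is a hypothetical helper and the bounds mirror the ones used in the Boston examples of this manual.

```r
# Illustrative sketch of min-max scaling to [0, 1] (hypothetical helper).
normalize_sketch <- function(v, lower = min(v), upper = max(v)) {
  (v - lower) / (upper - lower)
}

df <- data.frame(age = c(10, 55, 100), medv = c(5, 20, 50))

df_norm <- data.frame(
  age  = normalize_sketch(df$age,  lower = 0, upper = 100),  # explicit covariate bounds
  medv = normalize_sketch(df$medv, lower = 0, upper = 50)    # explicit response range
)
```

Passing explicit bounds rather than relying on the observed minimum and maximum keeps the scaling stable when new data fall outside the training range.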

pre_comput_NN: Precompute quantities for SLGP basis evaluation with nearest-neighbor interpolation

Description

Computes intermediate quantities for evaluating SLGP basis functions using Nearest Neighbor (NN) interpolation over a regular grid in the normalized domain.

Usage

pre_comput_NN(
  normalizedData,
  predictorNames,
  responseName,
  nIntegral = 101,
  nDiscret = 51
)

Arguments

normalizedData

A normalized data frame (values in [0,1]).

predictorNames

Character vector of covariate names.

responseName

Name of the response variable.

nIntegral

Number of grid points for discretizing the response domain.

nDiscret

Number of grid points for discretizing the covariate domain.

Value

A list of intermediate quantities used in SLGP evaluation:

nodes: grid of response × covariates,
indNodesToIntegral: response bin indices,
indSamplesToNodes: sample-to-node index mapping,
weightSamplesToNodes: equal weights for NN interpolation.

pre_comput_WNN: Precompute quantities for SLGP basis evaluation with weighted nearest-neighbors

Description

Computes intermediate quantities for evaluating basis functions via weighted nearest-neighbor (WNN) interpolation on a discretized grid.

Usage

pre_comput_WNN(
  normalizedData,
  predictorNames,
  responseName,
  nIntegral = 101,
  nDiscret = 51
)

Arguments

normalizedData

Normalized data frame ([0,1]-scaled).

predictorNames

Character vector of covariate names.

responseName

Name of the response variable.

nIntegral

Number of quadrature points for response domain.

nDiscret

Number of discretization steps for covariates.

Value

A list of intermediate quantities:

nodes: all evaluation points in response × covariates grid,
indNodesToIntegral: indices to map nodes to response bins,
indSamplesToNodes: index mapping from samples to grid nodes,
weightSamplesToNodes: interpolation weights using inverse distance.
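
The difference between nearest-neighbor and inverse-distance weighting can be seen on a one-dimensional grid. This base-R sketch assumes two neighbouring nodes per sample; the number of neighbours and the exact weighting the package uses are not specified on this page, so this is only an illustration of the principle.

```r
# Illustrative sketch: NN versus inverse-distance weights on a 1-D grid.
grid <- seq(0, 1, length.out = 6)   # hypothetical covariate grid: 0, 0.2, ..., 1
x    <- 0.33                        # one normalized sample location

left  <- findInterval(x, grid, rightmost.closed = TRUE)
idx   <- c(left, left + 1)          # the two grid nodes surrounding x
dists <- abs(grid[idx] - x)

# Weighted nearest neighbors: weights proportional to 1 / distance, summing to 1
# (if x sits exactly on a node, that node gets all the weight)
w_wnn <- if (any(dists == 0)) as.numeric(dists == 0) else (1 / dists) / sum(1 / dists)

# Nearest neighbor: a single node with weight 1
idx_nn <- idx[which.min(dists)]
```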

pre_comput_nothing: Precompute quantities for SLGP basis evaluation without interpolation

Description

Computes intermediate quantities for evaluating basis functions when no interpolation is used. Basis functions are evaluated at the exact covariate and response grid locations.

Usage

pre_comput_nothing(
  normalizedData,
  predictorNames,
  responseName,
  nIntegral = 51
)

Arguments

normalizedData

A data frame with values already normalized to [0,1].

predictorNames

Character vector of covariate column names.

responseName

Name of the response variable.

nIntegral

Integer, number of points used to discretize the response domain.

Value

A list of intermediate quantities used in SLGP basis function computation:

nodes: all points where basis functions are evaluated,
indNodesToIntegral: index mapping nodes to response bins,
indSamplesToNodes: index mapping observations to nodes,
indSamplesToPredictor: index mapping observations to unique predictors,
weightSamplesToNodes: interpolation weights (equal to 1 here).

Predict cumulative distribution values at new locations using an SLGP model

Description

Computes the posterior cumulative distribution function (CDF) values at specified covariate values using a fitted SLGP model.

Usage

predictSLGP_cdf(
  SLGPmodel,
  newNodes,
  interpolateBasisFun = "WNN",
  nIntegral = 101,
  nDiscret = 101
)

Arguments

SLGPmodel

An object of class SLGP-class.

newNodes

A data frame with covariate values where the SLGP should be evaluated.

interpolateBasisFun

Character string indicating the interpolation scheme for basis functions: one of "nothing", "NN", or "WNN" (default).

nIntegral

Number of integration points along the response axis.

nDiscret

Discretization resolution for interpolation (optional).

Value

A data frame with newNodes and predicted CDF values, columns named cdf_1, cdf_2, ...
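
Conceptually, each cdf_k column is the running integral of the corresponding density along the response axis. A base-R sketch of that step, using a Gaussian-shaped stand-in density and the trapezoidal rule (both assumptions of the example, not the package's quadrature), looks like this:

```r
# Illustrative sketch: CDF values from a density discretized on a response grid.
t_grid <- seq(0, 50, length.out = 101)         # response grid, as with nIntegral = 101
dens   <- dnorm(t_grid, mean = 22, sd = 6)     # stand-in for one predicted density

h     <- diff(t_grid)
areas <- h * (head(dens, -1) + tail(dens, -1)) / 2   # trapezoid area of each cell

# Renormalize so the density integrates to 1 on the bounded response range
dens  <- dens / sum(areas)
areas <- areas / sum(areas)

cdf <- c(0, cumsum(areas))                     # CDF evaluated at every grid point
```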

Examples


# Load Boston housing dataset
library(MASS)
data("Boston")
# Set input and output ranges manually (you can also use range(Boston$age), etc.)
range_x <- c(0, 100)
range_response <- c(0, 50)

# Create an SLGP model but don't fit it
modelPrior <- slgp(medv ~ age,        # Use a formula to specify response and covariates
                 data = Boston,     # Use the original Boston housing data
                 method = "none",    # No training
                 basisFunctionsUsed = "RFF",         # Random Fourier Features
                 sigmaEstimationMethod = "heuristic",  # Auto-tune sigma2 (more stable)
                 predictorsLower = range_x[1],         # Lower bound for 'age'
                 predictorsUpper = range_x[2],         # Upper bound for 'age'
                 responseRange = range_response,       # Range for 'medv'
                 opts_BasisFun = list(nFreq = 200,     # Use 200 Fourier features
                                      MatParam = 5/2), # Matern 5/2 kernel
                 seed = 1)                             # Reproducibility

#Let us make 3 draws from the prior
nrep <- 3
set.seed(8)
p <- ncol(modelPrior@coefficients)
modelPrior@coefficients <- matrix(rnorm(n=nrep*p), nrow=nrep)

# Where to predict the field of pdfs?
dfGrid <- data.frame(expand.grid(seq(range_x[1], range_x[2], 5),
                                 seq(range_response[1], range_response[2],
                                     length.out = 101)))
colnames(dfGrid) <- c("age", "medv")
predPriorcdf <- predictSLGP_cdf(SLGPmodel=modelPrior,
                                newNodes = dfGrid)

Predict centered or uncentered moments at new locations from an SLGP model

Description

Computes statistical moments (e.g., mean, variance, ...) of the posterior predictive distributions at new covariate locations, using a given SLGP model.

Usage

predictSLGP_moments(
  SLGPmodel,
  newNodes,
  power,
  centered = FALSE,
  interpolateBasisFun = "WNN",
  nIntegral = 101,
  nDiscret = 101
)

Arguments

SLGPmodel

An object of class SLGP-class.

newNodes

A data frame of new covariate values.

power

Scalar or vector of positive integers indicating the moment orders to compute.

centered

Logical; if TRUE, computes centered moments. If FALSE, computes raw moments.

interpolateBasisFun

Interpolation mode for basis functions: "nothing", "NN", or "WNN" (default).

nIntegral

Number of integration points for computing densities.

nDiscret

Discretization resolution of the response space.

Value

A data frame with:

Repeated rows of the input covariates,
A column power indicating the moment order,
One or more columns mSLGP_1, mSLGP_2, ... for the estimated moments across posterior samples.
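
The distinction between raw and centered moments can be illustrated on a discretized density in base R. The Beta(2, 5) stand-in density, the trapezoidal weights, and the helper moment_sketch are assumptions made for the example.

```r
# Illustrative sketch: raw and centered moments of a discretized density.
t_grid <- seq(0, 1, length.out = 1001)
dens   <- dbeta(t_grid, 2, 5)                  # stand-in for one predicted density

# Trapezoidal quadrature weights
w <- rep(t_grid[2] - t_grid[1], length(t_grid))
w[c(1, length(w))] <- w[1] / 2

moment_sketch <- function(power, centered = FALSE) {
  m1    <- sum(w * t_grid * dens)              # first raw moment (the mean)
  shift <- if (centered) m1 else 0             # subtract the mean only if centered
  sum(w * (t_grid - shift)^power * dens)
}

pred_mean <- moment_sketch(1)                      # raw moment of order 1
pred_var  <- moment_sketch(2, centered = TRUE)     # centered moment of order 2
```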

Examples


# Load Boston housing dataset
library(MASS)
data("Boston")
# Set input and output ranges manually (you can also use range(Boston$age), etc.)
range_x <- c(0, 100)
range_response <- c(0, 50)

# Train an SLGP model using Laplace estimation and RFF basis
modelLaplace <- slgp(medv ~ age,        # Use a formula to specify response and covariates
                 data = Boston,     # Use the original Boston housing data
                 method = "Laplace",    # Train using a Laplace approximation
                 basisFunctionsUsed = "RFF",         # Random Fourier Features
                 sigmaEstimationMethod = "heuristic",  # Auto-tune sigma2 (more stable)
                 predictorsLower = range_x[1],         # Lower bound for 'age'
                 predictorsUpper = range_x[2],         # Upper bound for 'age'
                 responseRange = range_response,       # Range for 'medv'
                 opts_BasisFun = list(nFreq = 200,     # Use 200 Fourier features
                                      MatParam = 5/2), # Matern 5/2 kernel
                 seed = 1)                             # Reproducibility
dfX <- data.frame(age=seq(range_x[1], range_x[2], 1))
predMean <- predictSLGP_moments(SLGPmodel=modelLaplace,
                                newNodes = dfX,
                                power=c(1, 2),
                                centered=FALSE) # Uncentered moments of order 1 and 2
predVar <- predictSLGP_moments(SLGPmodel=modelLaplace,
                               newNodes = dfX,
                               power=c(2),
                               centered=TRUE) # Centered moments of order 2 (Variance)

Predict densities at new covariate locations using a given SLGP model

Description

Computes the posterior predictive probability densities at new covariate points using a fitted Spatial Logistic Gaussian Process (SLGP) model.

Usage

predictSLGP_newNode(
  SLGPmodel,
  newNodes,
  interpolateBasisFun = "WNN",
  nIntegral = 101,
  nDiscret = 101
)

Arguments

SLGPmodel

An object of class SLGP-class.

newNodes

A data frame containing new covariate values at which to evaluate the SLGP.

interpolateBasisFun

Character string indicating how basis functions are evaluated: one of "nothing", "NN", or "WNN" (default).

nIntegral

Integer specifying the number of quadrature points over the response space.

nDiscret

Integer specifying the discretization step for interpolation (only used if applicable).

Value

A data frame combining newNodes with columns named pdf_1, pdf_2, ..., representing the posterior predictive density for each sample of the SLGP.

Examples


# Load Boston housing dataset
library(MASS)
data("Boston")
# Set input and output ranges manually (you can also use range(Boston$age), etc.)
range_x <- c(0, 100)
range_response <- c(0, 50)

# Create an SLGP model but don't fit it
modelPrior <- slgp(medv ~ age,        # Use a formula to specify response and covariates
                 data = Boston,     # Use the original Boston housing data
                 method = "none",    # No training
                 basisFunctionsUsed = "RFF",         # Random Fourier Features
                 sigmaEstimationMethod = "heuristic",  # Auto-tune sigma2 (more stable)
                 predictorsLower = range_x[1],         # Lower bound for 'age'
                 predictorsUpper = range_x[2],         # Upper bound for 'age'
                 responseRange = range_response,       # Range for 'medv'
                 opts_BasisFun = list(nFreq = 200,     # Use 200 Fourier features
                                      MatParam = 5/2), # Matern 5/2 kernel
                 seed = 1)                             # Reproducibility

#Let us make 3 draws from the prior
nrep <- 3
set.seed(8)
p <- ncol(modelPrior@coefficients)
modelPrior@coefficients <- matrix(rnorm(n=nrep*p), nrow=nrep)

# Where to predict the field of pdfs?
dfGrid <- data.frame(expand.grid(seq(range_x[1], range_x[2], 5),
                                 seq(range_response[1], range_response[2],
                                     length.out = 101)))
colnames(dfGrid) <- c("age", "medv")
predPrior <- predictSLGP_newNode(SLGPmodel=modelPrior,
                                 newNodes = dfGrid)

Predict quantiles from an SLGP model at new locations

Description

Computes quantile values at specified levels (probs) for new covariate points, based on the posterior CDFs from a trained SLGP model.

Usage

predictSLGP_quantiles(
  SLGPmodel,
  newNodes,
  probs,
  interpolateBasisFun = "WNN",
  nIntegral = 101,
  nDiscret = 101
)

Arguments

SLGPmodel

An object of class SLGP-class.

newNodes

A data frame of covariate values.

probs

Numeric vector of quantile levels to compute (e.g., 0.1, 0.5, 0.9).

interpolateBasisFun

Character string specifying interpolation scheme: "nothing", "NN", or "WNN" (default).

nIntegral

Number of integration points for computing the SLGP outputs.

nDiscret

Discretization level of the response axis (for CDF inversion).

Value

A data frame with columns:

The covariates in newNodes (repeated per quantile level),
A column probs indicating the quantile level,
Columns qSLGP_1, qSLGP_2, ... for each posterior sample's quantile estimate.
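
Quantiles are obtained by inverting a CDF. A minimal base-R sketch of that inversion, using linear interpolation with approx() on a stand-in CDF (a truncated exponential, chosen only because its exact quantiles are known), is given below; the package's own inversion scheme may differ.

```r
# Illustrative sketch: quantiles by interpolated inversion of a gridded CDF.
t_grid <- seq(0, 50, length.out = 501)
cdf    <- pexp(t_grid, rate = 0.2) / pexp(50, rate = 0.2)   # CDF truncated to [0, 50]

probs <- c(0.25, 0.50, 0.75)

# Swap the roles of x and y: interpolate t as a function of the CDF level
q <- approx(x = cdf, y = t_grid, xout = probs, ties = "ordered")$y

# Closed-form quantiles of the truncated exponential, for comparison
q_exact <- -log(1 - probs * pexp(50, rate = 0.2)) / 0.2
```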

Examples


# Load Boston housing dataset
library(MASS)
data("Boston")
# Set input and output ranges manually (you can also use range(Boston$age), etc.)
range_x <- c(0, 100)
range_response <- c(0, 50)

# Train an SLGP model using Laplace estimation and RFF basis
modelLaplace <- slgp(medv ~ age,        # Use a formula to specify response and covariates
                 data = Boston,     # Use the original Boston housing data
                 method = "Laplace",    # Train using a Laplace approximation
                 basisFunctionsUsed = "RFF",         # Random Fourier Features
                 sigmaEstimationMethod = "heuristic",  # Auto-tune sigma2 (more stable)
                 predictorsLower = range_x[1],         # Lower bound for 'age'
                 predictorsUpper = range_x[2],         # Upper bound for 'age'
                 responseRange = range_response,       # Range for 'medv'
                 opts_BasisFun = list(nFreq = 200,     # Use 200 Fourier features
                                      MatParam = 5/2), # Matern 5/2 kernel
                 seed = 1)                             # Reproducibility
dfX <- data.frame(age=seq(range_x[1], range_x[2], 1))
# Predict some quantiles, for instance here the first quartile, median, third quartile
predQuartiles <- predictSLGP_quantiles(SLGPmodel= modelLaplace,
                                       newNodes = dfX,
                                       probs=c(0.25, 0.50, 0.75))

Retrain a fitted SLGP model with new data and/or estimation method

Description

This function retrains an existing SLGP model using either a Bayesian MCMC estimation, a Maximum A Posteriori (MAP) estimation, or a Laplace approximation. The model can be retrained using new data, new inference settings, or updated hyperparameters. It reuses the structure and basis functions from the original model.

Usage

retrainSLGP(
  SLGPmodel,
  newdata = NULL,
  epsilonStart = NULL,
  method,
  interpolateBasisFun = "WNN",
  nIntegral = 51,
  nDiscret = 51,
  hyperparams = NULL,
  sigmaEstimationMethod = "none",
  seed = NULL,
  opts = list(),
  verbose = FALSE
)

Arguments

SLGPmodel

An object of class SLGP-class to be retrained.

newdata

Optional data frame containing new observations. If NULL, the original data is reused.

epsilonStart

Optional numeric vector with initial values for the coefficients \epsilon.

method

Character string specifying the estimation method: one of {"MCMC", "MAP", "Laplace"}.

interpolateBasisFun

Character string specifying how basis functions are evaluated:

"nothing" — evaluate directly at sample locations;
"NN" — interpolate using nearest neighbor;
"WNN" — interpolate using weighted nearest neighbors (default).

nIntegral

Integer specifying the number of quadrature points used to approximate integrals over the response domain.

nDiscret

Integer specifying the discretization grid size (used only if interpolation is enabled).

hyperparams

Optional list with updated hyperparameters. Must include:

sigma2: signal variance;
lengthscale: vector of lengthscales for the inputs.

sigmaEstimationMethod

Character string indicating how to estimate sigma2: either "none" (default) or "heuristic".

seed

Optional integer to set the random seed for reproducibility.

opts

Optional list of additional options passed to inference routines: stan_chains, stan_iter, ndraws, etc.

verbose

Logical; if TRUE, print progress and diagnostic messages during computation. Defaults to FALSE.

Value

An updated object of class SLGP-class with retrained coefficients and updated posterior information.

References

Gautier, Athénaïs (2023). "Modelling and Predicting Distribution-Valued Fields with Applications to Inversion Under Uncertainty." Thesis, Universität Bern, Bern. https://boristheses.unibe.ch/4377/

Examples


# Load Boston housing dataset
library(MASS)
data("Boston")
range_x <- c(0, 100)
range_response <- c(0, 50)

# Create an SLGP model but don't fit it
modelPrior <- slgp(medv ~ age,        # Use a formula to specify response and covariates
                 data = Boston,     # Use the original Boston housing data
                 method = "none",    # No training
                 basisFunctionsUsed = "RFF",         # Random Fourier Features
                 sigmaEstimationMethod = "heuristic",  # Auto-tune sigma2 (more stable)
                 predictorsLower = range_x[1],         # Lower bound for 'age'
                 predictorsUpper = range_x[2],         # Upper bound for 'age'
                 responseRange = range_response,       # Range for 'medv'
                 opts_BasisFun = list(nFreq = 200,     # Use 200 Fourier features
                                      MatParam = 5/2), # Matern 5/2 kernel
                 seed = 1)                             # Reproducibility
#Retrain using the Boston Housing dataset and a Laplace approximation scheme
modelLaplace <- retrainSLGP(SLGPmodel=modelPrior,
                            newdata = Boston,
                            method="Laplace")

Rosenblatt transform to multivariate Student distribution

Description

Auxiliary function that maps uniform samples in [0, 1]^d to samples from the spectral density of a Matérn kernel (i.e., a multivariate Student distribution).

Usage

rosenblatt_transform_multivarStudent(x, dimension, MatParam = 5/2)

Arguments

x

A matrix (or vector) of samples in [0, 1]^d to transform.

dimension

Integer. The dimension of the input space.

MatParam

Numeric. The Matérn kernel smoothness parameter (default = 5/2).

Value

A matrix with transformed coordinates following a multivariate Student distribution.

Draw posterior predictive samples from an SLGP model

Description

Samples from the predictive distributions modeled by an SLGP at new covariate inputs. This method uses inverse transform sampling on the estimated posterior CDFs.

Usage

sampleSLGP(
  SLGPmodel,
  newX,
  n,
  interpolateBasisFun = "WNN",
  nIntegral = 101,
  nDiscret = 101,
  seed = NULL
)

Arguments

SLGPmodel

A trained SLGP model object (SLGP-class).

newX

A data frame of new covariate values at which to draw samples.

n

Integer or integer vector specifying how many samples to draw at each input point.

interpolateBasisFun

Character string specifying interpolation scheme for basis evaluation. One of "nothing", "NN", or "WNN" (default).

nIntegral

Integer; number of quadrature points for density approximation.

nDiscret

Integer; discretization step for the response axis.

seed

Optional integer to set a random seed for reproducibility.

Value

A data frame containing sampled responses from the SLGP model, with covariate columns from newX and one response column named after SLGPmodel@responseName.
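
Inverse transform sampling pushes uniform draws through the inverse CDF. The base-R sketch below does this for a single covariate location with a Gaussian-shaped stand-in density; the density, grid size, and interpolation via approx() are assumptions of the example rather than a description of the package's code.

```r
# Illustrative sketch: inverse transform sampling from a gridded CDF.
set.seed(2024)
t_grid <- seq(0, 50, length.out = 501)
dens   <- dnorm(t_grid, mean = 22, sd = 6)     # stand-in for one predicted density

# Cumulative trapezoidal integral, rescaled so the last value is exactly 1
h   <- diff(t_grid)
cdf <- c(0, cumsum(h * (head(dens, -1) + tail(dens, -1)) / 2))
cdf <- cdf / tail(cdf, 1)

# Uniform draws mapped through the interpolated inverse CDF
u     <- runif(5000)
draws <- approx(x = cdf, y = t_grid, xout = u, ties = "ordered")$y
```

Because the CDF is only known on a grid, the resolution of the grid bounds how finely the draws can resolve the response axis.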

Examples


# Load Boston housing dataset
library(MASS)
data("Boston")
# Set input and output ranges manually (you can also use range(Boston$age), etc.)
range_x <- c(0, 100)
range_response <- c(0, 50)

# Train an SLGP model using MAP estimation and RFF basis
modelMAP <- slgp(medv ~ age,        # Use a formula to specify response and covariates
                 data = Boston,     # Use the original Boston housing data
                 method = "MAP",    # Train using Maximum A Posteriori estimation
                 basisFunctionsUsed = "RFF",         # Random Fourier Features
                 sigmaEstimationMethod = "heuristic",  # Auto-tune sigma2 (more stable)
                 predictorsLower = range_x[1],         # Lower bound for 'age'
                 predictorsUpper = range_x[2],         # Upper bound for 'age'
                 responseRange = range_response,       # Range for 'medv'
                 opts_BasisFun = list(nFreq = 200,     # Use 200 Fourier features
                                      MatParam = 5/2), # Matern 5/2 kernel
                 seed = 1)                             # Reproducibility

# Let's draw new sample points from the SLGP

newDataPoints <- sampleSLGP(modelMAP,
                            newX = data.frame(age=c(0, 25, 95)),
                            n = c(10, 1000, 1), # how many samples to draw at each new x
                            interpolateBasisFun = "WNN")

Define and optionally train a Spatial Logistic Gaussian Process (SLGP) model

Description

This function builds and, optionally, trains an SLGP model based on a specified formula and data. The SLGP is a finite-rank Gaussian process model for conditional density estimation. It can be trained using MAP estimation, MCMC, or a Laplace approximation, or left untrained ("none").

Usage

slgp(
  formula,
  data,
  epsilonStart = NULL,
  method,
  basisFunctionsUsed,
  interpolateBasisFun = "NN",
  nIntegral = 51,
  nDiscret = 51,
  hyperparams = NULL,
  predictorsUpper = NULL,
  predictorsLower = NULL,
  responseRange = NULL,
  sigmaEstimationMethod = "none",
  seed = NULL,
  opts_BasisFun = list(),
  BasisFunParam = NULL,
  opts = list(),
  verbose = FALSE
)

Arguments

formula

A formula specifying the model structure, with the response on the left-hand side and covariates on the right.

data

A data frame containing the variables used in the formula.

epsilonStart

Optional numeric vector of initial weights for the finite-rank GP: Z(x,t) = \sum_{i=1}^p \epsilon_i f_i(x, t).

method

Character string specifying the training method: one of {"none", "MCMC", "MAP", "Laplace"}.

basisFunctionsUsed

Character string describing the basis function type: one of "inducing points", "RFF", "Discrete FF", "filling FF", or "custom cosines".

interpolateBasisFun

Character string indicating how to evaluate basis functions: "nothing" (exact evaluation), "NN" (nearest-neighbor interpolation), or "WNN" (inverse-distance weighted nearest neighbors). Default is "NN".

nIntegral

Number of quadrature points used for numerical integration over the response domain.

nDiscret

Integer controlling the resolution of the interpolation grid (used only for "NN" or "WNN").

hyperparams

Optional list of hyperparameters. Should contain:

sigma2: signal variance
lengthscale: vector of lengthscales (one per covariate)

predictorsUpper

Optional numeric vector for the upper bounds of the covariates (used for scaling).

predictorsLower

Optional numeric vector for the lower bounds of the covariates.

responseRange

Optional numeric vector of length 2 with the lower and upper bounds of the response.

sigmaEstimationMethod

Method used to estimate the signal variance sigma2: either "none" (default, no estimation) or "heuristic".

seed

Optional integer for reproducibility.

opts_BasisFun

List of optional configuration parameters passed to the basis function initializer.

BasisFunParam

Optional list of precomputed basis function parameters.

opts

Optional list of extra settings passed to inference routines (e.g., stan_iter, stan_chains, ndraws).

verbose

Logical; if TRUE, print progress and diagnostic messages during computation. Defaults to FALSE.
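
To make the roles of epsilonStart and nIntegral more concrete, here is a minimal base-R sketch of a finite-rank expansion followed by a numerically normalised logistic transformation. It does not use the package internals: the cosine basis, the frequencies w_x and w_t, the phases b, the rescaling of the response to [0, 1], and the trapezoidal rule are all assumptions made for illustration.

```r
# Illustrative sketch only: base R, not the SLGP package internals.
set.seed(1)

p <- 10                                      # number of basis functions (hypothetical)
nIntegral <- 51                              # quadrature points over the response domain
t_grid <- seq(0, 1, length.out = nIntegral)  # response grid, assumed rescaled to [0, 1]
x <- 0.3                                     # one covariate value, assumed rescaled

# Hypothetical cosine basis: f_i(x, t) = cos(w_x[i] * x + w_t[i] * t + b[i])
w_x <- rnorm(p)
w_t <- rnorm(p, sd = 3)
b <- runif(p, 0, 2 * pi)
basis <- function(x, t) {
  # Returns a length(t) x p matrix, one column per basis function
  sapply(seq_len(p), function(i) cos(w_x[i] * x + w_t[i] * t + b[i]))
}

epsilon <- rnorm(p)                          # weights, in the role of epsilonStart
Z <- as.vector(basis(x, t_grid) %*% epsilon) # Z(x, t) = sum_i epsilon_i f_i(x, t)

# Trapezoidal rule on the grid (an assumed choice of quadrature)
trapz <- function(t, y) sum(diff(t) * (y[-1] + y[-length(y)]) / 2)

# Logistic density transformation: exponentiate, then normalise over t
dens <- exp(Z) / trapz(t_grid, exp(Z))
```

In this sketch, increasing nIntegral refines the grid used for the normalising integral, at the cost of evaluating the basis functions at more points.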

Value

An object of S4 class SLGP-class, containing:

coefficients: Matrix of posterior (or prior) draws of the SLGP coefficients \epsilon_i.
hyperparams: List of fitted or provided hyperparameters.
logPost: Log-posterior (if MAP or Laplace used).
method: Estimation method used.
...: Other internal information such as ranges, basis settings, and data.

References

Gautier, Athénaïs (2023). "Modelling and Predicting Distribution-Valued Fields with Applications to Inversion Under Uncertainty." Thesis, Universität Bern, Bern. https://boristheses.unibe.ch/4377/

Examples


# Load Boston housing dataset
library(MASS)
data("Boston")
# Set input and output ranges manually (you can also use range(Boston$age), etc.)
range_x <- c(0, 100)
range_response <- c(0, 50)

# Create an SLGP model but don't fit it
modelPrior <- slgp(medv ~ age,        # Use a formula to specify response and covariates
                 data = Boston,     # Use the original Boston housing data
                 method = "none",    # No training
                 basisFunctionsUsed = "RFF",         # Random Fourier Features
                 sigmaEstimationMethod = "heuristic",  # Auto-tune sigma2 (more stable)
                 predictorsLower = range_x[1],         # Lower bound for 'age'
                 predictorsUpper = range_x[2],         # Upper bound for 'age'
                 responseRange = range_response,       # Range for 'medv'
                 opts_BasisFun = list(nFreq = 200,     # Use 200 Fourier features
                                      MatParam = 5/2), # Matern 5/2 kernel
                 seed = 1)                             # Reproducibility

# Train an SLGP model using MAP estimation and RFF basis
modelMAP <- slgp(medv ~ age,        # Use a formula to specify response and covariates
                 data = Boston,     # Use the original Boston housing data
                 method = "MAP",    # Train using Maximum A Posteriori estimation
                 basisFunctionsUsed = "RFF",         # Random Fourier Features
                 sigmaEstimationMethod = "heuristic",  # Auto-tune sigma2 (more stable)
                 predictorsLower = range_x[1],         # Lower bound for 'age'
                 predictorsUpper = range_x[2],         # Upper bound for 'age'
                 responseRange = range_response,       # Range for 'medv'
                 opts_BasisFun = list(nFreq = 200,     # Use 200 Fourier features
                                      MatParam = 5/2),  # Matern 5/2 kernel
                 seed = 1)                             # Reproducibility