resemble: Similarity Retrieval and Local Learning for Spectral Chemometrics


Last update: 2026-04-20

Version: 3.0.0 – vortex

Think Globally, Fit Locally (Saul and Roweis, 2003)

About

The resemble package provides computationally efficient methods for dissimilarity analysis and predictive modelling with complex spectral data. Its core functionality includes memory-based learning (MBL), evolutionary subset search and selection, and retrieval-based modelling using pre-computed model libraries. The package is designed to support local modelling, spectral library optimization, and model-based prediction in large and heterogeneous spectral data sets.

Documentation

The package includes comprehensive vignettes covering all major functionality:

  1. Essential concepts and setup: Introduction, data preparation, and notation
  2. Dimensionality reduction: PCA and PLS projections with ortho_projection()
  3. Estimating dissimilarity between spectra: Dissimilarity methods and evaluation
  4. Nearest neighbor search: Finding similar spectra with search_neighbors()
  5. Simple global models: Global calibration with model()
  6. Classical memory-based learning: Per-query local modelling with mbl()
  7. Evolutionary subset search: Domain-adaptive calibration with gesearch()
  8. Building a library of models: Pre-computed experts with liblex()

What’s new in version 3.0

Version 3.0 is a major release with a redesigned API, new modelling functions, and improved computational efficiency.

New modelling functions:

Redesigned dissimilarity interface:

The dissimilarity system now uses constructor functions:

The number of components is selected via ncomp_by_var(), ncomp_by_cumvar(), ncomp_by_opc(), or ncomp_fixed().
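To make the idea of variance-based component selection concrete, here is a minimal base R sketch on simulated data. It does not call resemble. It assumes, going only by the function names, that ncomp_by_var() refers to a per-component explained-variance threshold and ncomp_by_cumvar() to a cumulative one; the thresholds (1% and 99%) and all object names are illustrative choices, not package defaults.

```r
# Illustrative base R sketch; not resemble's internal code.
set.seed(1)

# Simulated "spectra": 60 observations x 50 variables driven by 3 latent factors
scores   <- matrix(rnorm(60 * 3), nrow = 60)
loadings <- matrix(rnorm(3 * 50), nrow = 3)
X <- scores %*% loadings + matrix(rnorm(60 * 50, sd = 0.05), nrow = 60)

# Principal component analysis on the centered data
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Share of total variance explained by each component, and its running sum
explained     <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(explained)

# Cumulative rule: smallest number of components reaching 99% of the variance
n_cumvar <- which(cum_explained >= 0.99)[1]

# Per-component rule: keep components that each explain at least 1%
n_var <- sum(explained >= 0.01)

c(by_cumvar = n_cumvar, by_var = n_var)
```

The two rules can disagree on real spectra: a cumulative threshold may pull in many small components, while a per-component threshold stops as soon as individual contributions become negligible.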

Redesigned neighbor and fitting interfaces:

Breaking changes in mbl():

See NEWS.md for full details on deprecated and removed functions.

Core functionality

Dimensionality reduction:

Computing dissimilarity matrices:

Neighbor search:

Modelling spectral data:

Installation

Install from CRAN:

install.packages("resemble")

Or install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("l-ramirez-lopez/resemble")

The package requires a C++ compiler. On Windows, install Rtools. On macOS, you may need to install gfortran and clang from CRAN tools.

Example: Memory-based learning with mbl()

library(resemble)
library(prospectr)
data(NIRsoil)

# Preprocess spectra
NIRsoil$spc_pr <- savitzkyGolay(
 detrend(NIRsoil$spc, wav = as.numeric(colnames(NIRsoil$spc))),
 m = 1, p = 1, w = 7
)

# Split into training and test sets
train_x <- NIRsoil$spc_pr[NIRsoil$train == 1 & !is.na(NIRsoil$CEC), ]
train_y <- NIRsoil$CEC[NIRsoil$train == 1 & !is.na(NIRsoil$CEC)]
test_x <- NIRsoil$spc_pr[NIRsoil$train == 0 & !is.na(NIRsoil$CEC), ]
test_y <- NIRsoil$CEC[NIRsoil$train == 0 & !is.na(NIRsoil$CEC)]

# Memory-based learning with Gaussian process regression
sbl <- mbl(
 Xr = train_x,
 Yr = train_y,
 Xu = test_x,
 neighbors = neighbors_k(seq(50, 130, by = 20)),
 diss_method = diss_pca(ncomp = ncomp_by_opc(40)),
 fit_method = fit_gpr(),
 control = mbl_control(validation_type = "NNv")
)
sbl
plot(sbl)
get_predictions(sbl)

Example: Pre-computed model library with liblex()

liblex() builds a library of local experts that can be reused for prediction without refitting:

# Build model library
model_lib <- liblex(
 Xr = train_x,
 Yr = train_y,
 neighbors = neighbors_k(c(40, 60, 80)),
 diss_method = diss_correlation(ws = 27, scale = TRUE),
 fit_method = fit_wapls(min_ncomp = 3, max_ncomp = 15, method = "mpls"),
 control = liblex_control(tune = TRUE)
)

# Predict new observations
predictions <- predict(model_lib, test_x)
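Once predicted values are available as a plain numeric vector, they can be compared with the held-out reference values (test_y above). The following is a small base R helper for that purpose. eval_preds() is a hypothetical name and not part of resemble, and the structure of the object returned by predict() is not described here, so extracting the numeric predictions from it is deliberately left out. The toy vectors are made up for illustration.

```r
# Hypothetical helper (not part of resemble): summary statistics for
# comparing observed and predicted values.
eval_preds <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  # Drop pairs where either value is missing
  ok   <- complete.cases(observed, predicted)
  obs  <- observed[ok]
  pred <- predicted[ok]
  c(
    rmse = sqrt(mean((obs - pred)^2)),  # root mean squared error
    me   = mean(pred - obs),            # mean error (positive = over-prediction)
    r2   = cor(obs, pred)^2             # squared Pearson correlation
  )
}

# Toy values, for illustration only
observed  <- c(10, 12, 15, 20, 25)
predicted <- c(11, 11, 16, 19, 27)
stats <- eval_preds(observed, predicted)
stats
```

With real data, observed would be test_y and predicted the numeric predictions for test_x.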

Example: Evolutionary subset selection with gesearch()

gesearch() selects optimal subsets from large spectral libraries:

# Search for optimal calibration subset
gs <- gesearch(
 Xr = train_x, 
 Yr = train_y,
 Xu = test_x,
 k = 50, 
 b = 100, 
 retain = 0.97,
 target_size = 200,
 fit_method = fit_pls(ncomp = 15, method = "mpls"),
 optimization = c("reconstruction", "similarity"),
 seed = 42
)

# Predict using selected subset
preds <- predict(gs, test_x)
plot(gs)

Memory-based learning overview

Memory-based learning (MBL, a.k.a. instance-based learning or local modelling) is a non-linear lazy learning approach. For each prediction, the algorithm:

  1. Finds the k nearest neighbors of the target observation in the reference set
  2. Fits a local model using those neighbors
  3. Predicts the response for the target observation
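The three steps above can be sketched in base R as follows. This is a simplified illustration on simulated data, not the algorithm implemented in mbl(): it assumes Euclidean distance on PCA scores as the dissimilarity and an ordinary linear model on those scores as the local model. The names local_predict, k, and n_pc are illustrative.

```r
# Simplified memory-based learning loop in base R (illustration only).
set.seed(42)
n_ref <- 200; n_new <- 5; p <- 30

# Simulated reference set with a non-linear response, plus new observations
Xr <- matrix(rnorm(n_ref * p), nrow = n_ref)
Yr <- sin(Xr[, 1]) + 0.5 * Xr[, 2]^2 + rnorm(n_ref, sd = 0.1)
Xu <- matrix(rnorm(n_new * p), nrow = n_new)

k    <- 40  # neighbors used for each local model
n_pc <- 5   # PCA scores used for the dissimilarity and the local fit

# Project both sets onto the principal components of the reference set
pca <- prcomp(Xr, center = TRUE, scale. = FALSE)
Sr  <- pca$x[, seq_len(n_pc), drop = FALSE]
Su  <- predict(pca, Xu)[, seq_len(n_pc), drop = FALSE]

local_predict <- function(su) {
  # 1. Find the k nearest neighbors (Euclidean distance in score space)
  d  <- sqrt(rowSums(sweep(Sr, 2, su)^2))
  nn <- order(d)[seq_len(k)]
  # 2. Fit a local model using only those neighbors
  local_df <- data.frame(y = Yr[nn], Sr[nn, , drop = FALSE])
  fit <- lm(y ~ ., data = local_df)
  # 3. Predict the response for the target observation
  unname(predict(fit, newdata = as.data.frame(t(su))))
}

# One local model per new observation
yhat <- apply(Su, 1, local_predict)
yhat
```

Because a separate model is fitted for every query, the cost grows with the number of observations to predict, which is the "lazy" aspect of the approach.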

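The mbl() function offers three regression methods for local models: Gaussian process regression (fit_gpr()), partial least squares (fit_pls()), and weighted average partial least squares (fit_wapls()).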

Citing the package

citation(package = "resemble")

News: Memory-based learning and resemble

Contributing

Contributions are welcome! Please read our Contributing Guidelines (available in the GitHub repo) before submitting pull requests.

This project follows a Code of Conduct available in the GitHub repo.

Bug reports

Report issues on GitHub or contact the maintainer (ramirez.lopez.leo@gmail.com).

References

Lobsey, C. R., Viscarra Rossel, R. A., Roudier, P., & Hedley, C. B. 2017. rs-local data-mines information from spectral libraries to improve local calibrations. European Journal of Soil Science, 68(6), 840-852.

Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J. A. M., & Scholten, T. 2013. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma, 195-196, 268-279.

Ramirez-Lopez, L., Viscarra Rossel, R., Behrens, T., Orellano, C., Perez-Fernandez, E., Kooijman, L., Wadoux, A. M. J.-C., Breure, T., Summerauer, L., Safanelli, J. L., & Plans, M. 2026a. When spectral libraries are too complex to search: Evolutionary subset selection for domain-adaptive calibration. Analytica Chimica Acta, under review.

Ramirez-Lopez, L., Metz, M., Lesnoff, M., Orellano, C., Perez-Fernandez, E., Plans, M., Breure, T., Behrens, T., Viscarra Rossel, R., & Peng, Y. 2026b. Rethinking local spectral modelling: From per-query refitting to model libraries. Analytica Chimica Acta, under review.

Saul, L. K., & Roweis, S. T. 2003. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4(Jun), 119-155.

Shenk, J., Westerhaus, M., & Berzaghi, P. 1997. Investigation of a LOCAL calibration procedure for near infrared instruments. Journal of Near Infrared Spectroscopy, 5, 223-232.