resemble
Similarity Retrieval and Local Learning for Spectral Chemometrics
Last update: 2026-04-20
Version: 3.0.0 – vortex
Think Globally, Fit Locally (Saul and Roweis, 2003)
The resemble package provides computationally efficient
methods for dissimilarity analysis and predictive modelling with complex
spectral data. Its core functionality includes memory-based learning
(MBL), evolutionary subset search and selection, and retrieval-based
modelling using pre-computed model libraries. The package is designed to
support local modelling, spectral library optimization, and model-based
prediction in large and heterogeneous spectral data sets.
The package includes comprehensive vignettes covering all major functionality:
ortho_projection(), search_neighbors(), model(), mbl(), gesearch(), liblex()

Version 3.0 is a major release with a redesigned API, new modelling functions, and improved computational efficiency.
New modelling functions:
liblex(): Builds a library of reusable localized
models (experts) that can be stored and reused for prediction without
refitting. Based on Ramirez-Lopez et al. (2026b).
gesearch(): Evolutionary algorithm for selecting
optimal subsets from large spectral libraries to build context-specific
calibrations. Based on Ramirez-Lopez et al. (2026a).
model(): Fits global PLS or GPR calibration models
with cross-validation.
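The idea behind a reusable model library, as described for liblex() above, can be illustrated with a small base R sketch. This is not the package's implementation: the helper names build_library() and predict_library() are hypothetical, and the sketch assumes that each expert is an ordinary least-squares fit on the k nearest neighbors of one reference observation and that the predictions of the nearest experts are simply averaged.

```r
# Hypothetical helpers sketching a "library of local experts" (not resemble code).
build_library <- function(Xr, Yr, k) {
  D <- as.matrix(dist(Xr))
  # One expert per reference observation: least squares on its k nearest neighbors
  coefs <- t(vapply(seq_len(nrow(Xr)), function(i) {
    nn <- order(D[i, ])[seq_len(k)]
    qr.solve(cbind(1, Xr[nn, , drop = FALSE]), Yr[nn])
  }, numeric(ncol(Xr) + 1)))
  list(anchors = Xr, coefs = coefs)
}

predict_library <- function(lib, Xu, n_experts = 5) {
  vapply(seq_len(nrow(Xu)), function(i) {
    # Retrieve the experts whose anchor observation is closest to the query
    d <- sqrt(rowSums(sweep(lib$anchors, 2, Xu[i, ])^2))
    nearest <- order(d)[seq_len(n_experts)]
    # Apply the stored coefficients; nothing is refitted at prediction time
    mean(lib$coefs[nearest, , drop = FALSE] %*% c(1, Xu[i, ]))
  }, numeric(1))
}

# Toy data with a non-linear response (assumption: two predictors, not spectra)
set.seed(7)
lib_Xr <- matrix(runif(300 * 2, -2, 2), ncol = 2)
lib_Yr <- sin(2 * lib_Xr[, 1]) + lib_Xr[, 2]^2 + rnorm(300, sd = 0.05)
lib_Xu <- matrix(runif(40 * 2, -1.5, 1.5), ncol = 2)

toy_lib <- build_library(lib_Xr, lib_Yr, k = 25)   # built once
lib_pred <- predict_library(toy_lib, lib_Xu)       # reused for any new query
head(lib_pred)
```

The expensive step (fitting) happens once in build_library(); prediction only involves a neighbor lookup and a few matrix products, which is the trade-off the library approach is designed around.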
Redesigned dissimilarity interface:
The dissimilarity system now uses constructor functions:
diss_pca(), diss_pls(): Mahalanobis distance in projection space
diss_correlation(): Correlation-based dissimilarity (including moving window)
diss_euclidean(), diss_mahalanobis(), diss_cosine(): Distance metrics
Component selection via ncomp_by_var(), ncomp_by_cumvar(), ncomp_by_opc(), or ncomp_fixed().
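To make "Mahalanobis distance in projection space" and variance-based component selection concrete, here is a minimal base R sketch. It does not call resemble and may differ from what diss_pca() does internally (for example in scaling conventions); the helper name pca_mahalanobis_diss() and the cumvar threshold argument are assumptions made for illustration.

```r
# Hypothetical helper (not part of resemble): Mahalanobis-type dissimilarity
# computed on PCA scores, keeping components by cumulative explained variance.
pca_mahalanobis_diss <- function(X, cumvar = 0.99) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  # Fewest components whose cumulative explained variance reaches `cumvar`
  explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  ncomp <- which(explained >= cumvar)[1]
  keep <- seq_len(ncomp)
  # PCA scores are uncorrelated, so dividing each score by its standard
  # deviation turns Euclidean distance into Mahalanobis distance
  scores <- sweep(pca$x[, keep, drop = FALSE], 2, pca$sdev[keep], "/")
  list(ncomp = ncomp, dissimilarity = as.matrix(dist(scores)))
}

# Toy data standing in for a spectral matrix (rows = observations)
set.seed(1)
X <- matrix(rnorm(30 * 8), nrow = 30)
res <- pca_mahalanobis_diss(X, cumvar = 0.95)
res$ncomp
dim(res$dissimilarity)
```

Working in the score space keeps the distance well defined even when the original variables are highly collinear, which is the usual situation with spectra.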
Redesigned neighbor and fitting interfaces:
neighbors_k(), neighbors_diss(): Neighbor selection constructors
fit_pls(), fit_wapls(), fit_gpr(): Local fitting constructors (replace local_fit_*() functions)

Breaking changes in mbl():
k, k_diss, k_range replaced by neighbors argument
method renamed to fit_method
center and scale removed; now controlled within constructors

See NEWS.md for full details on deprecated and removed functions.
Dimensionality reduction:
ortho_projection(): PCA or PLS projection with multiple algorithms (SVD, NIPALS, SIMPLS)
Computing dissimilarity matrices:
dissimilarity(): Main interface for dissimilarity computation
diss_pca(), diss_pls(), diss_correlation(), diss_euclidean(), diss_mahalanobis(), diss_cosine(): Method constructors
diss_evaluate(): Evaluate dissimilarity matrices using side information
Neighbor search:
search_neighbors(): Efficient k-nearest neighbor retrieval
Modelling spectral data:
model(): Global PLS or GPR calibration
mbl(): Memory-based learning for per-query local modelling
gesearch(): Evolutionary subset selection for domain-adaptive calibration
liblex(): Pre-computed library of local experts for fast prediction

Install from CRAN:
install.packages("resemble")
Or install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("l-ramirez-lopez/resemble")
The package requires a C++ compiler. On Windows, install Rtools. On macOS, you may need to install gfortran and clang from CRAN tools.
mbl()
library(resemble)
library(prospectr)
data(NIRsoil)
# Preprocess spectra
NIRsoil$spc_pr <- savitzkyGolay(
  detrend(NIRsoil$spc, wav = as.numeric(colnames(NIRsoil$spc))),
  m = 1, p = 1, w = 7
)
# Split into training and test sets
train_x <- NIRsoil$spc_pr[NIRsoil$train == 1 & !is.na(NIRsoil$CEC), ]
train_y <- NIRsoil$CEC[NIRsoil$train == 1 & !is.na(NIRsoil$CEC)]
test_x <- NIRsoil$spc_pr[NIRsoil$train == 0 & !is.na(NIRsoil$CEC), ]
test_y <- NIRsoil$CEC[NIRsoil$train == 0 & !is.na(NIRsoil$CEC)]
# Memory-based learning with Gaussian process regression
sbl <- mbl(
  Xr = train_x,
  Yr = train_y,
  Xu = test_x,
  neighbors = neighbors_k(seq(50, 130, by = 20)),
  diss_method = diss_pca(ncomp = ncomp_by_opc(40)),
  fit_method = fit_gpr(),
  control = mbl_control(validation_type = "NNv")
)
sbl
plot(sbl)
get_predictions(sbl)
liblex()
liblex() builds a library of local experts that can be reused for prediction without refitting:
# Build model library
model_lib <- liblex(
  Xr = train_x,
  Yr = train_y,
  neighbors = neighbors_k(c(40, 60, 80)),
  diss_method = diss_correlation(ws = 27, scale = TRUE),
  fit_method = fit_wapls(min_ncomp = 3, max_ncomp = 15, method = "mpls"),
  control = liblex_control(tune = TRUE)
)
# Predict new observations
predictions <- predict(model_lib, test_x)

gesearch()
gesearch() selects optimal subsets from large spectral libraries:
# Search for optimal calibration subset
gs <- gesearch(
  Xr = train_x,
  Yr = train_y,
  Xu = test_x,
  k = 50,
  b = 100,
  retain = 0.97,
  target_size = 200,
  fit_method = fit_pls(ncomp = 15, method = "mpls"),
  optimization = c("reconstruction", "similarity"),
  seed = 42
)
# Predict using selected subset
preds <- predict(gs, test_x)
plot(gs)

Memory-based learning (MBL, a.k.a. instance-based learning or local modelling) is a non-linear lazy learning approach. For each prediction, the algorithm retrieves the reference observations most similar to the query, fits a local model on those neighbors only, and uses that model to predict the query.
The mbl() function offers three regression methods for
local models:
fit_gpr(): Gaussian process regression with linear kernel
fit_pls(): Partial least squares
fit_wapls(): Weighted average PLS (Shenk et al., 1997)
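The per-query logic of memory-based learning can be sketched in a few lines of base R. This is a toy illustration, not how mbl() is implemented: the helper mbl_sketch() is hypothetical, it uses plain Euclidean distance instead of the package's dissimilarity constructors, and it replaces the GPR/PLS/WA-PLS local fits with ordinary least squares on two synthetic predictors.

```r
# Toy memory-based learner (hypothetical helper, not the resemble implementation).
# Xr/Yr: reference predictors and response; Xu: query predictors; k: neighborhood size.
mbl_sketch <- function(Xr, Yr, Xu, k) {
  vapply(seq_len(nrow(Xu)), function(i) {
    # 1. Dissimilarity between the query and every reference observation
    d <- sqrt(rowSums(sweep(Xr, 2, Xu[i, ])^2))
    # 2. Retrieve the k nearest neighbors
    nn <- order(d)[seq_len(k)]
    # 3. Fit a local least-squares model (with intercept) on the neighbors only
    beta <- qr.solve(cbind(1, Xr[nn, , drop = FALSE]), Yr[nn])
    # 4. Predict the query with its own local model
    sum(c(1, Xu[i, ]) * beta)
  }, numeric(1))
}

# Synthetic non-linear problem (assumption: stands in for spectra vs. a soil property)
set.seed(42)
Xr <- matrix(runif(400 * 2, -2, 2), ncol = 2)
Yr <- sin(2 * Xr[, 1]) + Xr[, 2]^2 + rnorm(400, sd = 0.05)
Xu <- matrix(runif(50 * 2, -1.5, 1.5), ncol = 2)
Yu <- sin(2 * Xu[, 1]) + Xu[, 2]^2

local_pred <- mbl_sketch(Xr, Yr, Xu, k = 25)
global_pred <- drop(cbind(1, Xu) %*% qr.solve(cbind(1, Xr), Yr))

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
c(local = rmse(Yu, local_pred), global = rmse(Yu, global_pred))
```

Because a separate linear model is fitted around each query, the overall predictor is non-linear even though every local fit is linear. The cost is that nothing can be fitted before the query is known, which is what "lazy" refers to.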
To obtain the package citation, run:
citation(package = "resemble")

2026.05: van Leeuwen et al., 2026 used resemble for principal component Mahalanobis nearest-neighbour search to extract spectrally similar samples from the KSSL library for MIR model calibration in Dutch soils.
2026.04: Irving et al., 2026 used resemble in modelling workflows for infrared spectroscopy prediction of soil microbial properties across Australian soils.
2026.03: Shrestha et al., 2026 used resemble in a hybrid localisation workflow to predict farm-scale soil cadmium from a regional spectral library; LOCAL models with MIR data performed best.
2025.10: Summerauer et al., 2025 used resemble for MBL modelling of soil properties from infrared spectra across tropical hillslopes in Eastern Africa.
2025.05: Sun and Shi, 2025 combined spectral and geographical similarity for SOC prediction; local PLSR outperformed global models.
2025.03: Breure et al., 2025 used resemble for local VNIR modelling of soil carbon fractions (POC and MAOC) across European agricultural soils, published in Nature Communications.
2025.03: Purushothaman et al., 2025 applied MBL to AVIRIS-NG hyperspectral data for soil property prediction in India.
2025.01: Dai et al., 2025 used MBL for POC and MAOC prediction from VNIR in Guangdong.
2024.12: Asrat et al., 2024 MBL for local calibration sample selection in the Moroccan Soil Spectral Library.
2024.09: Barbetti et al., 2024 MBL to detect SOC changes in long-term experiments using vis–NIR.
2023.11: Wang et al., 2023 N-MBL (MBL + RF within local fitting) for regional vis–NIR models.
2022: Sanderman et al., 2022 evaluated transferability of large MIR spectral databases across instruments.
2022.01: Ng et al., 2022 showed that MBL yields better local SOC predictions than spiking approaches.
2021.10: Ramirez-Lopez et al., 2021 MBL to predict soil properties in Africa.
2020.08: Charlotte Rivard’s MIR MBL tutorial: https://whrc.github.io/Soil-Predictions-MIR/
2020.01: Sanderman et al., 2020 MIR spectroscopy for prediction of soil health indicators; MBL and Cubist excelled.
2019.03: Ramirez-Lopez et al., 2019 MBL in digital soil mapping at farm scale.
2019.03: Jaconi et al., 2019 MBL for national-scale NIR texture predictions in Germany.
2018.01: Dotto et al., 2018 MBL for SOC prediction in Brazil.
2016.04: Viscarra Rossel et al., 2016 memory-based learning for soil property prediction.
2014.03: First CRAN release of resemble.

prospectr: Signal processing and chemometrics for spectroscopy

Contributions are welcome! Please read our Contributing Guidelines (available in the GitHub repo) before submitting pull requests.
This project follows a Code of Conduct available in the GitHub repo.
Report issues at GitHub or contact the maintainer (ramirez.lopez.leo@gmail.com).
Lobsey, C. R., Viscarra Rossel, R. A., Roudier, P., & Hedley, C. B. 2017. rs-local data-mines information from spectral libraries to improve local calibrations. European Journal of Soil Science, 68(6), 840-852.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J. A. M., & Scholten, T. 2013. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma, 195-196, 268-279.
Ramirez-Lopez, L., Viscarra Rossel, R., Behrens, T., Orellano, C., Perez-Fernandez, E., Kooijman, L., Wadoux, A. M. J.-C., Breure, T., Summerauer, L., Safanelli, J. L., & Plans, M. 2026a. When spectral libraries are too complex to search: Evolutionary subset selection for domain-adaptive calibration. Analytica Chimica Acta, under review.
Ramirez-Lopez, L., Metz, M., Lesnoff, M., Orellano, C., Perez-Fernandez, E., Plans, M., Breure, T., Behrens, T., Viscarra Rossel, R., & Peng, Y. 2026b. Rethinking local spectral modelling: From per-query refitting to model libraries. Analytica Chimica Acta, under review.
Saul, L. K., & Roweis, S. T. 2003. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4(Jun), 119-155.
Shenk, J., Westerhaus, M., & Berzaghi, P. 1997. Investigation of a LOCAL calibration procedure for near infrared instruments. Journal of Near Infrared Spectroscopy, 5, 223-232.