Classical memory-based learning (local modelling)

Leonardo Ramirez-Lopez;

Alexandre M.J.-C. Wadoux

2026-04-20

Think Globally, Fit Locally (Saul and Roweis, 2003)

1 Memory-based learning

Memory-based learning (MBL) describes a family of local learning methods that are well suited to complex and heterogeneous spectral datasets (Ramirez-Lopez et al., 2013). In MBL, instead of fitting a single global regression function, a local regression model is fitted for each target observation using its nearest neighbors identified in a calibration or reference set. Although the global relationship between \(X\) and \(Y\) may be complex, MBL approximates it through a collection of simpler local models, each assumed to be valid within a restricted region of the predictor space (Mitchell, 1997).

For a set of \(m\) observations requiring prediction, this can be expressed as

\[ \hat{y}_i = \hat{f}_i(x_i; \theta_i), \qquad i = 1, \ldots, m \]

where \(\hat{f}_i\) denotes the local prediction function fitted for observation \(i\), and \(\theta_i\) represents the corresponding model parameters, such as the number of latent variables in a local PLS model.

In this sense, MBL can be viewed as a collection of local prediction functions,

\[ \hat{f} = \{\hat{f}_1, \ldots, \hat{f}_m\}, \]

constructed on demand for the observations to be predicted. Figure 1 illustrates the basic steps in MBL for a set of five observations (\(m = 5\)).

Figure 1: Example of the main steps in memory-based learning for predicting a response variable for five observations in a \(p\)-dimensional space.
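The per-observation workflow can be sketched in a few lines of base R: for each target observation, retrieve its \(k\) nearest references and fit a simple local model. This toy version (mbl_toy(), a hypothetical helper, not part of resemble) uses Euclidean distance and ordinary least squares in place of the dissimilarity measures and local PLS models used in practice:

```r
# Toy memory-based learner: one local model per target observation.
# Euclidean distance and local least squares stand in for the
# dissimilarity measures and local PLS used by resemble.
mbl_toy <- function(Xr, Yr, Xu, k = 10) {
  colnames(Xr) <- colnames(Xu) <- paste0("x", seq_len(ncol(Xr)))
  sapply(seq_len(nrow(Xu)), function(i) {
    d  <- sqrt(colSums((t(Xr) - Xu[i, ])^2))  # distances to all references
    nn <- order(d)[seq_len(k)]                # indices of the k nearest
    fit <- lm(y ~ ., data = data.frame(y = Yr[nn], Xr[nn, , drop = FALSE]))
    predict(fit, newdata = as.data.frame(Xu)[i, , drop = FALSE])
  })
}
```

Each target thus gets its own \(\hat{f}_i\), fitted only on its neighborhood, exactly as in the formulation above.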

There are four basic elements that must be specified in any MBL algorithm:

  1. A dissimilarity measure: This is required for neighbor search. Ideally, the dissimilarity measure should also reflect differences that are relevant to the response variable for which local models are to be built. For example, in soil NIR spectroscopy, spectral dissimilarities between soil samples should reflect, at least to some extent, their compositional differences. If the dissimilarity measure fails to capture this relevant notion of similarity, the resulting MBL models are more likely to show poor predictive performance.

  2. How many neighbors to use: The neighborhood size used to fit the local models must be chosen carefully. Very small neighborhoods may be overly sensitive to noise and outliers, which can reduce model robustness (Ramirez-Lopez et al., 2013). They may also contain insufficient variability to adequately capture the relationship between predictors and response. In contrast, overly large neighborhoods may include observations from more heterogeneous regions of the predictor space, which can introduce more complex relationships and reduce local model accuracy.

  3. How to use the dissimilarity information: Beyond neighbor retrieval, the dissimilarity information can be used in different ways: it can be ignored once the neighbors have been identified, used to weight the neighbors when fitting the local model, or incorporated as an additional set of predictors. In the latter case, the local predictor matrix of a neighborhood of \(k\) observations is augmented with the corresponding dissimilarity matrix:

\[ \begin{bmatrix} 0 & d_{1,2} & \cdots & d_{1,k} & x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ d_{2,1} & 0 & \cdots & d_{2,k} & x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ d_{k,1} & d_{k,2} & \cdots & 0 & x_{k,1} & x_{k,2} & \cdots & x_{k,p} \end{bmatrix} \]

where \(d_{i,j}\) represents the dissimilarity between the \(i\)th and \(j\)th neighbors, and \(x_{i,j}\) the value of the \(j\)th predictor variable for the \(i\)th neighbor.

  4. How to fit the local models: This is determined by the regression method used within each neighborhood. In many spectroscopic applications, these local models are linear, because the relationship between predictors and response is often assumed to be approximately linear within a restricted local region.
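The augmented matrix shown above (dissimilarities used as additional predictors) can be sketched in base R; augment_with_diss() is an illustrative helper, not part of resemble, and uses Euclidean distances for simplicity:

```r
# Sketch of the "dissimilarity as predictors" idea: for a neighborhood
# of k observations, bind the k columns of local dissimilarities to the
# p original predictors, giving a k x (k + p) local predictor matrix
augment_with_diss <- function(Xk) {
  D <- as.matrix(dist(Xk))   # k x k dissimilarity matrix (Euclidean here)
  cbind(D, Xk)               # columns: d_{i,1..k}, then x_{i,1..p}
}
```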

In the literature, MBL is often referred to as local modeling. However, local modeling also includes other strategies, such as cluster-based modeling or geographically segmented modeling. In this sense, MBL can be regarded as one particular type of local modeling (Ramirez-Lopez et al., 2013).

The mbl() function in the resemble package provides a flexible framework for building customized memory-based learners. It allows the user to choose among different dissimilarity measures, neighborhood-selection methods, ways of using dissimilarity information, and regression methods for fitting the local models.

Readers are encouraged to review the sections on dissimilarity measures and k-nearest neighbor search first, as they provide the basis for the examples presented in this section.

The mbl() function can be described in terms of the four basic elements introduced above:

  1. The dissimilarity measure: This is controlled by the diss_method argument of mbl(). The available methods are the same as those described in the dissimilarity section and are specified through constructor functions such as diss_pca(), diss_pls(), and diss_correlation().

  2. How many neighbors to use: This is controlled by the neighbors argument, which accepts either neighbors_k() for fixed neighborhood sizes or neighbors_diss() for threshold-based selection. Multiple neighborhood sizes can be passed to neighbors_k(), and multiple dissimilarity thresholds can be passed to neighbors_diss(), allowing several settings to be evaluated in a single run. For neighbors_diss(), the k_min and k_max arguments define the minimum and maximum allowed neighborhood sizes.

  3. How to use the dissimilarity information: This is controlled by the diss_usage argument. If "none" is used, the dissimilarity information is used only for neighbor retrieval. If "weights" is used, the neighbors are weighted according to their dissimilarity to the target observation using a tricubic weighting function. If "predictors" is used, the dissimilarity information is incorporated as an additional set of predictors.

  4. How to fit the local models: This is controlled by the fit_method argument, which takes a fitting-method object specifying the regression method and its parameters. Three methods are currently available: partial least squares (PLS) regression, weighted average partial least squares regression (WAPLS) (Shenk et al., 1997), and Gaussian process regression (GPR) with a dot-product covariance function.
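The tricubic weighting mentioned for diss_usage = "weights" can be illustrated with a small function. This is a sketch of the general tricube form; the exact scaling used internally by resemble may differ:

```r
# Tricube weights: neighbors close to the target get weight near 1,
# the farthest neighbor gets weight 0 (assumes at least one d > 0)
tricube <- function(d) {
  u <- d / max(d)   # scale distances to [0, 1]
  (1 - u^3)^3
}
```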

2 Fitting methods and constructors

The regression methods available for fitting the local models in mbl() are specified through fitting-method objects. These objects are created using constructor functions such as fit_pls(), fit_wapls(), and fit_gpr(). Each of these functions accepts specific arguments that control the parameters of the corresponding regression method. For example, fit_pls() accepts the ncomp argument to specify the number of latent variables in the PLS model, while fit_wapls() accepts min_ncomp and max_ncomp to define a range of latent variable numbers for optimization. The following examples illustrate how to construct these fitting-method objects:

# Creates an object with instructions to build PLS models
my_pls <- fit_pls(ncomp = 15)
my_pls
Fitting method: pls
   ncomp    : 15 
   method   : pls 
   scale    : FALSE 
   max_iter : 100 
   tol      : 1e-06 
# Creates an object with instructions to build WAPLS models
my_wapls <- fit_wapls(min_ncomp = 3, max_ncomp = 20)
my_wapls
Fitting method: wapls
   min_ncomp : 3 
   max_ncomp : 20 
   method    : mpls 
   scale     : FALSE 
   max_iter  : 100 
   tol       : 1e-06 
# Creates an object with instructions to build GPR models
my_gpr <- fit_gpr()
my_gpr
Fitting method: gpr
   noise_variance : 0.001 
   center         : TRUE 
   scale          : TRUE 
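The dot-product (linear) covariance behind GPR amounts to solving a regularized linear system. A minimal sketch of the idea follows; gpr_dot() is illustrative only (resemble's fit_gpr() also centers and scales the data, which is omitted here):

```r
# Minimal Gaussian process regression with a dot-product covariance.
# Returns a prediction function for new observations.
gpr_dot <- function(X, y, noise_variance = 0.001) {
  K <- X %*% t(X)                                     # dot-product kernel
  alpha <- solve(K + diag(noise_variance, nrow(X)), y)  # regularized solve
  function(Xnew) drop(Xnew %*% t(X) %*% alpha)          # k(Xnew, X) %*% alpha
}
```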

3 The mbl_control() function and validation types

The mbl_control() function is used to create control objects that define validation settings and selected optimization options for mbl(). In particular, the validation_type argument specifies how local model performance is evaluated.

Two validation methods are available:

  1. "NNv" (nearest-neighbor validation): the nearest neighbor of each target observation is held out and predicted with the local model fitted for that target, so that prediction errors can be computed against known reference values.

  2. "local_cv" (local cross-validation): a leave-group-out cross-validation is conducted within each neighborhood, controlled by the number of iterations and the proportion of neighbors retained in each local calibration subset.

The following examples illustrate how to create control objects for validation in mbl():

# Create an object with instructions to conduct both validation types
# "NNv" and "local_cv"
two_val_control <- mbl_control(
  validation_type = c("NNv", "local_cv"),
  number = 10,
  p = 0.75
)

The object two_val_control stores the settings for both validation methods, "NNv" and "local_cv". For "local_cv", the number of iterations is set to 10, and the proportion of neighbors retained in each local calibration subset is 75%.
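The idea behind "local_cv" can be sketched with a hypothetical helper (local_cv_rmse(), not resemble's internal code): in each of the iterations, a random fraction p of the neighbors is used for calibration and the remaining neighbors are predicted:

```r
# Leave-group-out CV within one neighborhood: repeat `number` times,
# fitting on a random p-fraction of the neighbors and validating on the rest
local_cv_rmse <- function(Xk, yk, number = 10, p = 0.75, fit_fun, pred_fun) {
  k <- nrow(Xk)
  rmses <- replicate(number, {
    cal <- sample(k, size = round(p * k))            # calibration subset
    fit <- fit_fun(Xk[cal, , drop = FALSE], yk[cal])
    e <- yk[-cal] - pred_fun(fit, Xk[-cal, , drop = FALSE])
    sqrt(mean(e^2))                                  # RMSE on held-out neighbors
  })
  mean(rmses)
}
```

Here fit_fun and pred_fun stand in for whatever local regression method is configured in mbl().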

4 Model fitting and prediction

With the main components of mbl() now defined, the function can be used to predict response values for the observations in test_x using train_x as the reference set. The following configuration reproduces the LOCAL algorithm (Shenk et al., 1997):

library(resemble)
library(prospectr)

# obtain a numeric vector of the wavelengths at which spectra are recorded
wavs <- as.numeric(colnames(NIRsoil$spc))

# pre-process the spectra:
# - use detrend
# - use first order derivative
diff_order <- 1
poly_order <- 1
window <- 7

# Preprocess spectra
NIRsoil$spc_pr <- savitzkyGolay(
  detrend(NIRsoil$spc, wav = wavs),
  m = diff_order, p = poly_order, w = window
)
train_x <- NIRsoil$spc_pr[NIRsoil$train == 1, ]
train_y <- NIRsoil$Ciso[NIRsoil$train == 1]

test_x  <- NIRsoil$spc_pr[NIRsoil$train == 0, ]
test_y  <- NIRsoil$Ciso[NIRsoil$train == 0]
# Define the neighborhood sizes to test
my_ks <- seq(80, 160, by = 40)

# Define how to use the dissimilarity information (ignore it)
ignore_diss <- "none"

# Define the regression method to be used at each neighborhood
my_wapls <- fit_wapls(min_ncomp = 3, max_ncomp = 25, scale = TRUE)

# For the moment use only "NNv" validation (faster)
nnv_val_control <- mbl_control()

# Predict Total Carbon
# (remove missing values)
local_ciso <- mbl(
  Xr = train_x[!is.na(train_y), ],
  Yr = train_y[!is.na(train_y)],
  Xu = test_x,
  neighbors = neighbors_k(my_ks),
  diss_method = diss_correlation(center = FALSE),
  diss_usage = ignore_diss,
  fit_method = my_wapls,
  gh = TRUE,
  control = nnv_val_control
)

In the code above, the arguments Xr and Yr specify the predictor variables and the response of the reference (training) set, while Xu specifies the target (test) set. The argument Yu (not yet introduced here) is also available for specifying the response values of the target set, which can be used for validation purposes.

The subscript \(u\) denotes the target domain (partially “unknown”), whereas \(r\) denotes the reference library.

The local_ciso object can now be examined:

plot(local_ciso, main = "")
local_ciso
--- mbl predictions --- 
Predictions: 207 
_______________________________________________________ 
Dissimilarity 
Dissimilarity method: correlation
   window size (ws) : full (no moving window) 
   center           : FALSE 
   scale            : FALSE 
_______________________________________________________ 
Local fit method 
Fitting method: wapls
   min_ncomp : 3 
   max_ncomp : 25 
   method    : mpls 
   scale     : TRUE 
   max_iter  : 100 
   tol       : 1e-06 
_______________________________________________________ 

Nearest-neighbor validation 

   k  rmse st_rmse    r2
  80 0.740  0.0432 0.840
 120 0.756  0.0441 0.838
 160 0.782  0.0457 0.828
_______________________________________________________ 
Figure 2: MBL results for Total Carbon predictions using the LOCAL algorithm. NNv: nearest-neighbor cross-validation.

According to the results of the previous example, the neighborhood size that minimizes the root mean squared error (RMSE) under nearest-neighbor validation is 80. The corresponding LOCAL model can now be used to generate predictions for the test_x dataset:

bki <- which.min(local_ciso$validation_results$nearest_neighbor_validation$rmse)
bk <- local_ciso$validation_results$nearest_neighbor_validation$k[bki]

# All the prediction results are stored in:
local_ciso$results

# The get_predictions function makes it easier to retrieve the
# predictions from the previous object
ciso_hat <- as.matrix(get_predictions(local_ciso))[, bki]

Figure 3 shows the predicted values for Total Carbon obtained with the LOCAL algorithm using the best neighborhood size according to nearest-neighbor validation, plotted against the reference values in test_y.

# Plot predicted vs reference
rng <- range(ciso_hat, test_y, na.rm = TRUE)
plot(ciso_hat, test_y,
     xlim = rng,
     ylim = rng,
     xlab = "Predicted Total Carbon, %",
     ylab = "Total Carbon, %",
     main = "LOCAL using a fixed k", 
     cex = 1.5,
     pch = 16, col = rgb(0.5, 0.5, 0.5, 0.6))
grid(lty = 1)
abline(0, 1, col = "red")
Figure 3: Predicted vs reference values for Total Carbon using the LOCAL algorithm with the best neighborhood size according to nearest-neighbor validation.

The prediction root mean squared error is then:

# Prediction RMSE:
sqrt(mean((ciso_hat - test_y)^2, na.rm = TRUE))
[1] 0.4514931
# Squared R
cor(ciso_hat, test_y, use = "complete.obs")^2
[1] 0.9143286

Similar results are obtained when the neighborhood optimization is based on dissimilarity thresholds:

# Create a vector of dissimilarity thresholds to evaluate
# since the correlation dissimilarity will be used
# these thresholds need to be > 0 and <= 1
dths <- seq(0.025, 0.15, by = 0.025)

# Indicate the minimum and maximum sizes allowed for the neighborhood
k_min <- 30
k_max <- 150

local_ciso_diss <- mbl(
  Xr = train_x[!is.na(train_y), ],
  Yr = train_y[!is.na(train_y)],
  Xu = test_x,
  neighbors = neighbors_diss(threshold = dths, k_min = k_min, k_max = k_max),
  diss_method = diss_correlation(center = FALSE),
  diss_usage = ignore_diss,
  fit_method = my_wapls,
  control = nnv_val_control
)
plot(local_ciso_diss)
local_ciso_diss
--- mbl predictions --- 
Predictions: 207 
_______________________________________________________ 
Dissimilarity 
Dissimilarity method: correlation
   window size (ws) : full (no moving window) 
   center           : FALSE 
   scale            : FALSE 
_______________________________________________________ 
Local fit method 
Fitting method: wapls
   min_ncomp : 3 
   max_ncomp : 25 
   method    : mpls 
   scale     : TRUE 
   max_iter  : 100 
   tol       : 1e-06 
_______________________________________________________ 

Nearest-neighbor validation 

 k_diss p_bounded  rmse st_rmse    r2
  0.025   84.541% 0.631  0.0369 0.881
  0.050   88.406% 0.581  0.0339 0.907
  0.075    93.72% 0.613  0.0358 0.896
  0.100   97.101% 0.686  0.0400 0.867
  0.125   97.101% 0.732  0.0427 0.853
  0.150   99.034% 0.789  0.0460 0.824
_______________________________________________________ 

The best correlation dissimilarity threshold is 0.05. The column “p_bounded” in the table of validation results indicates the percentage of neighborhoods whose size was bounded by the k_min or k_max values in the neighbors object passed to the neighbors argument. The final validation plot is shown in Figure 4.
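The bounding behavior summarized by p_bounded can be sketched as follows; bounded_k() is a hypothetical helper, not part of resemble:

```r
# For one target: neighborhood size implied by a dissimilarity threshold,
# bounded below by k_min and above by k_max
bounded_k <- function(d, threshold, k_min, k_max) {
  k <- sum(d <= threshold)     # neighbors within the threshold
  min(max(k, k_min), k_max)    # clip to the allowed range
}
```

A neighborhood counts toward p_bounded whenever the clipping in the second line takes effect.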

# Best distance threshold
bdi <- which.min(local_ciso_diss$validation_results$nearest_neighbor_validation$rmse)
bd <- local_ciso_diss$validation_results$nearest_neighbor_validation$k_diss[bdi]

# Predictions for the best distance
ciso_diss_hat <- as.matrix(get_predictions(local_ciso_diss))[, bdi]
# Plot predicted vs reference
plot(ciso_diss_hat, test_y,
     xlim = rng,
     ylim = rng,
     xlab = "Predicted Total Carbon, %",
     ylab = "Total Carbon, %",
     main = "LOCAL using a distance threshold \nfor neighbor retrieval", 
     cex = 1.5,
     pch = 16, col = rgb(0.5, 0.5, 0.5, 0.6))
grid()
abline(0, 1, col = "red")
Figure 4: Predicted vs reference values for Total Carbon using the LOCAL algorithm with the best dissimilarity threshold according to nearest-neighbor validation.
# RMSE
sqrt(mean((ciso_diss_hat - test_y)^2, na.rm = TRUE))
[1] 0.4619132
# Squared R
cor(ciso_diss_hat, test_y, use = "complete.obs")^2
[1] 0.9143266

4.1 Additional examples

Here we provide a few additional examples of MBL configurations, making use of another response variable available in the dataset: soil cation exchange capacity (CEC). This variable is generally more challenging to predict than Total Carbon. Table 1 summarizes the configurations tested in the following code examples.

Table 1: Basic description of the different MBL configurations in the examples to predict Cation Exchange Capacity (CEC).

  Abbreviation      Dissimilarity method   Dissimilarity usage    Local regression
  mbl_cor (LOCAL)   Correlation            None                   Weighted average PLS
  mbl_pc            optimized PC           Source of predictors   Weighted average PLS
  mbl_pls           optimized PLS          None                   Weighted average PLS
  mbl_gpr           optimized PC           Source of predictors   Gaussian process

train_x <- NIRsoil$spc_pr[NIRsoil$train == 1, ]
train_cec <- NIRsoil$CEC[NIRsoil$train == 1]

test_x  <- NIRsoil$spc_pr[NIRsoil$train == 0, ]
test_cec  <- NIRsoil$CEC[NIRsoil$train == 0]
# Define the WAPLS fitting method
my_wapls <- fit_wapls(min_ncomp = 2, max_ncomp = 25, scale = FALSE)

# mbl_cor: LOCAL algorithm with correlation dissimilarity
dth_cor <- seq(0.01, 0.3, by = 0.08)
mbl_cor <- mbl(
  Xr = train_x[!is.na(train_cec), ],
  Yr = train_cec[!is.na(train_cec)],
  Xu = test_x,
  neighbors = neighbors_diss(threshold = dth_cor, k_min = 80, k_max = 150),
  diss_method = diss_correlation(),
  diss_usage = "none",
  fit_method = my_wapls,
  control = nnv_val_control
)

# mbl_pc: PCA dissimilarity with dissimilarity matrix as predictors
dth_pc <- seq(0.05, 1, by = 0.4)
mbl_pc <- mbl(
  Xr = train_x[!is.na(train_cec), ],
  Yr = train_cec[!is.na(train_cec)],
  Xu = test_x,
  neighbors = neighbors_diss(threshold = dth_pc, k_min = 80, k_max = 150),
  diss_method = diss_pca(ncomp = ncomp_by_opc(), scale = TRUE),
  diss_usage = "predictors",
  fit_method = my_wapls,
  control = nnv_val_control
)

# mbl_pls: PLS dissimilarity without dissimilarity predictors
mbl_pls <- mbl(
  Xr = train_x[!is.na(train_cec), ],
  Yr = train_cec[!is.na(train_cec)],
  Xu = test_x,
  Yu = test_cec,
  neighbors = neighbors_diss(threshold = dth_pc, k_min = 80, k_max = 150),
  diss_method = diss_pls(ncomp = ncomp_by_opc(), scale = TRUE),
  diss_usage = "none",
  fit_method = my_wapls,
  control = nnv_val_control
)

# mbl_gpr: Gaussian process regression with PCA dissimilarity as predictors
mbl_gpr <- mbl(
  Xr = train_x[!is.na(train_cec), ],
  Yr = train_cec[!is.na(train_cec)],
  Xu = test_x,
  neighbors = neighbors_diss(threshold = dth_pc, k_min = 80, k_max = 150),
  diss_method = diss_pca(ncomp = ncomp_by_opc(), scale = TRUE),
  diss_usage = "predictors",
  fit_method = fit_gpr(),
  control = nnv_val_control
)

Collect the predictions for each configuration:

# Get the indices of the best results according to
# nearest neighbor validation statistics
c_val_name <- "validation_results"
c_nn_val_name <- "nearest_neighbor_validation"

bi_cor <- which.min(mbl_cor[[c_val_name]][[c_nn_val_name]]$rmse)
bi_pc  <- which.min(mbl_pc[[c_val_name]][[c_nn_val_name]]$rmse)
bi_pls <- which.min(mbl_pls[[c_val_name]][[c_nn_val_name]]$rmse)
bi_gpr <- which.min(mbl_gpr[[c_val_name]][[c_nn_val_name]]$rmse)

preds <- cbind(
  get_predictions(mbl_cor)[, bi_cor],
  get_predictions(mbl_pc)[, bi_pc],
  get_predictions(mbl_pls)[, bi_pls],
  get_predictions(mbl_gpr)[, bi_gpr]
)

colnames(preds) <- c("mbl_cor", "mbl_pc", "mbl_pls", "mbl_gpr")
preds <- as.matrix(preds)

# R2s
cor(test_cec, preds, use = "complete.obs")^2
       mbl_cor    mbl_pc   mbl_pls   mbl_gpr
[1,] 0.7385189 0.7519877 0.7401701 0.5894437
# RMSEs
colMeans((preds - test_cec)^2, na.rm = TRUE)^0.5
 mbl_cor   mbl_pc  mbl_pls  mbl_gpr 
3.454423 3.425964 3.422748 4.178727 

Figure 5 illustrates the prediction results obtained for CEC with each of the MBL configurations tested.

old_par <- par("mfrow", "mar")
par(mfrow = c(2, 2))

plot(test_cec, preds[, "mbl_cor"],
     xlab = "CEC, meq/100g",
     ylab = "Predicted CEC, meq/100g", 
     main = "mbl_cor (LOCAL)")
abline(0, 1, col = "red")

plot(test_cec, preds[, "mbl_pc"],
     xlab = "CEC, meq/100g",
     ylab = "Predicted CEC, meq/100g", 
     main = "mbl_pc")
abline(0, 1, col = "red")

plot(test_cec, preds[, "mbl_pls"],
     xlab = "CEC, meq/100g",
     ylab = "Predicted CEC, meq/100g", 
     main = "mbl_pls")
abline(0, 1, col = "red")

plot(test_cec, preds[, "mbl_gpr"],
     xlab = "CEC, meq/100g",
     ylab = "Predicted CEC, meq/100g", 
     main = "mbl_gpr")
abline(0, 1, col = "red")

par(old_par)
Figure 5: CEC prediction results for the different MBL configurations tested

4.2 Using Yu argument

If response values for the prediction set are available, the Yu argument can be used to directly validate the predictions produced by mbl(). These values are used only for validation and are not involved in any optimization or modeling step. The argument can be used as follows:

# Use Yu argument to validate the predictions
pc_pred_cec_yu <- mbl(
  Xr = train_x[!is.na(train_cec), ],
  Yr = train_cec[!is.na(train_cec)],
  Xu = test_x,
  Yu = test_cec,
  neighbors = neighbors_k(80),
  diss_method = diss_pca(scale = TRUE),
  diss_usage = "none",
  verbose = FALSE,
  control = mbl_control()
)

pc_pred_cec_yu
--- mbl predictions --- 
Predictions: 207 
_______________________________________________________ 
Dissimilarity 
Dissimilarity: PCA
  method            : pca 
  ncomp             : var >= 0.01 (max: 40) 
  center            : TRUE 
  scale             : TRUE 
  return_projection : FALSE 
_______________________________________________________ 
Local fit method 
Fitting method: wapls
   min_ncomp : 3 
   max_ncomp : 15 
   method    : mpls 
   scale     : FALSE 
   max_iter  : 100 
   tol       : 1e-06 
_______________________________________________________ 

Nearest-neighbor validation 

  k rmse st_rmse    r2
 80 3.25  0.0743 0.673
_______________________________________________________ 
Yu prediction statistics 

  k rmse st_rmse   r2
 80 3.31  0.0937 0.77
_______________________________________________________ 

5 Supported parallel processing

The mbl() function uses foreach() from the foreach package to iterate over the rows in Xu. In the example below, the doParallel package is used to register the cores for parallel execution. The doSNOW package can also be used. This example uses parallel processing to predict CEC.

Note

These functions support parallel execution via the foreach and doParallel packages. However, parallel execution is only beneficial when the workload per iteration is large enough to outweigh the overhead of spawning worker processes and serialising data between them. In practice this means large prediction or reference sets (typically hundreds of observations or more), large neighbourhoods, and many PLS components. For small datasets, sequential execution is usually faster. When in doubt, benchmark both before committing to a parallel workflow.

The following example may not be faster than sequential execution due to the relatively small size of the dataset, but it illustrates how to set up parallel processing for mbl():

# Running the mbl function using multiple cores

# Execute with two cores, if available, ...
n_cores <- 2

# ... if not then go with 1 core
if (parallel::detectCores() < 2) {
  n_cores <- 1
}

# Set the number of cores
library(doParallel)
clust <- makeCluster(n_cores)
registerDoParallel(clust)

# Alternatively:
# library(doSNOW)
# clust <- makeCluster(n_cores, type = "SOCK")
# registerDoSNOW(clust)
# getDoParWorkers()

pc_pred_cec <- mbl(
  Xr = train_x[!is.na(train_cec), ],
  Yr = train_cec[!is.na(train_cec)],
  Xu = test_x,
  neighbors = neighbors_k(seq(40, 100, by = 10)),
  diss_method = diss_pca(ncomp = ncomp_by_opc(), scale = TRUE),
  diss_usage = "none",
  control = mbl_control()
)

# Go back to sequential processing
registerDoSEQ()
try(stopCluster(clust))

pc_pred_cec

References

Mitchell, T.M., 1997. Machine learning, volume 1 of 1.
Naes, T., Isaksson, T., Kowalski, B., 1990. Locally weighted regression and scatter correction for near-infrared reflectance data. Analytical Chemistry 62, 664–673.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Demattê, J., Scholten, T., 2013. The spectrum-based learner: A new local approach for modeling soil vis–NIR spectra of complex datasets. Geoderma 195, 268–279.
Saul, L., Roweis, S., 2003. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of machine learning research 4, 119–155.
Shenk, J.S., Westerhaus, M.O., Berzaghi, P., 1997. Investigation of a LOCAL calibration procedure for near infrared instruments. Journal of Near Infrared Spectroscopy 5, 223–232.