Think Globally, Fit Locally (Saul and Roweis, 2003)
1 Memory-based learning
Memory-based learning (MBL) describes a family of local learning methods that are well suited to complex and heterogeneous spectral datasets (Ramirez-Lopez et al., 2013). In MBL, instead of fitting a single global regression function, a local regression model is fitted for each target observation using its nearest neighbors identified in a calibration or reference set. Although the global relationship between \(X\) and \(Y\) may be complex, MBL approximates it through a collection of simpler local models, each assumed to be valid within a restricted region of the predictor space (Mitchell, 1997).
For a set of \(m\) observations requiring prediction, this can be expressed as
\[
\hat{y}_i = \hat{f}_i(x_i; \theta_i), \qquad i = 1, \ldots, m
\]
where \(\hat{f}_i\) denotes the local prediction function fitted for observation \(i\), and \(\theta_i\) represents the corresponding model parameters, such as the number of latent variables in a local PLS model.
In this sense, MBL can be viewed as a collection of local prediction functions,
\[
\hat{f} = \{\hat{f}_1, \ldots, \hat{f}_m\},
\]
constructed on demand for the observations to be predicted. Figure 1 illustrates the basic steps in MBL for a set of five observations (\(m = 5\)).
Figure 1: Example of the main steps in memory-based learning for predicting a response variable for five observations in a \(p\)-dimensional space.
There are four basic elements that must be specified in any MBL algorithm:
A dissimilarity measure: This is required for neighbor search. Ideally, the dissimilarity measure should also reflect differences that are relevant to the response variable for which local models are to be built. For example, in soil NIR spectroscopy, spectral dissimilarities between soil samples should reflect, at least to some extent, their compositional differences. If the dissimilarity measure fails to capture this relevant notion of similarity, the resulting MBL models are more likely to show poor predictive performance.
How many neighbors to use: The neighborhood size used to fit the local models must be chosen carefully. Very small neighborhoods may be overly sensitive to noise and outliers, which can reduce model robustness (Ramirez-Lopez et al., 2013). They may also contain insufficient variability to adequately capture the relationship between predictors and response. In contrast, overly large neighborhoods may include observations from more heterogeneous regions of the predictor space, which can introduce more complex relationships and reduce local model accuracy.
How to use the dissimilarity information: Dissimilarity information can be used in different ways:
Ignored beyond neighbor search, in which case the dissimilarities serve only to retrieve the neighbors (e.g., the LOCAL algorithm, Shenk et al., 1997).
Used to weight the training observations according to their dissimilarity to the target observation (e.g., locally weighted PLS regression, Naes et al., 1990).
Used as a source of additional predictors (Ramirez-Lopez et al., 2013). In this case, the pairwise dissimilarity matrix among the \(k\) neighbors is also included in the local predictor matrix. This \(k \times k\) matrix is combined with the original \(p\) predictor variables, resulting in a final predictor matrix of dimensions \(k \times (k + p)\). For prediction of the target observation, the \(p\) spectral variables are combined with the vector of dissimilarities between the target observation and its neighbors. In some cases, this strategy can improve predictive performance. The combined predictor matrix for the neighborhood can be represented as:
\[
\begin{bmatrix}
x_{1,1} & \cdots & x_{1,p} & d_{1,1} & \cdots & d_{1,k} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
x_{k,1} & \cdots & x_{k,p} & d_{k,1} & \cdots & d_{k,k}
\end{bmatrix}
\]
where \(x_{i,j}\) denotes the value of the \(j\)th predictor variable for the \(i\)th neighbor, and \(d_{i,j}\) represents the dissimilarity between the \(i\)th and \(j\)th neighbors.
How to fit the local models: This is determined by the regression method used within each neighborhood. In many spectroscopic applications, these local models are linear, because the relationship between predictors and response is often assumed to be approximately linear within a restricted local region.
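The four elements above can be combined into a minimal, language-agnostic MBL loop. The sketch below is purely illustrative and is not the implementation used by any particular package: Euclidean distance stands in for the dissimilarity measure, and the neighborhood mean stands in for the local regression model.

```python
import math

def mbl_predict(train_x, train_y, targets, k=3):
    """Minimal memory-based learning sketch: for each target observation,
    retrieve its k nearest training points and fit a trivial local model."""
    preds = []
    for t in targets:
        # 1. dissimilarity measure: Euclidean distance to every training point
        d = [math.dist(t, x) for x in train_x]
        # 2. neighborhood size: keep the k smallest dissimilarities
        nn = sorted(range(len(d)), key=lambda i: d[i])[:k]
        # 3. dissimilarity usage: here ignored beyond neighbor retrieval
        # 4. local model: here simply the mean response of the neighbors
        preds.append(sum(train_y[i] for i in nn) / k)
    return preds
```

In practice, the trivial local mean would be replaced by a local PLS or similar regression fitted on the retrieved neighborhood.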
In the literature, MBL is often referred to as local modeling. However, local modeling also includes other strategies, such as cluster-based modeling or geographically segmented modeling. In this sense, MBL can be regarded as one particular type of local modeling (Ramirez-Lopez et al., 2013).
The mbl() function in the resemble package provides a flexible framework for building customized memory-based learners. It allows the user to choose among different dissimilarity measures, neighborhood-selection methods, ways of using dissimilarity information, and regression methods for fitting the local models.
Readers are encouraged to review the sections on dissimilarity measures and k-nearest neighbor search first, as they provide the basis for the examples presented in this section.
The mbl() function can be described in terms of the four basic elements introduced above:
The dissimilarity measure: This is controlled by the diss_method argument of mbl(). The available methods are the same as those described in the dissimilarity section and are specified through constructor functions such as diss_pca(), diss_pls(), and diss_correlation().
How many neighbors to use: This is controlled by the neighbors argument, which accepts either neighbors_k() for fixed neighborhood sizes or neighbors_diss() for threshold-based selection. Multiple neighborhood sizes can be passed to neighbors_k(), and multiple dissimilarity thresholds can be passed to neighbors_diss(), allowing several settings to be evaluated in a single run. For neighbors_diss(), the k_min and k_max arguments define the minimum and maximum allowed neighborhood sizes.
How to use the dissimilarity information: This is controlled by the diss_usage argument. If "none" is used, the dissimilarity information is used only for neighbor retrieval. If "weights" is used, the neighbors are weighted according to their dissimilarity to the target observation using a tricubic weighting function. If "predictors" is used, the dissimilarity information is incorporated as an additional set of predictors.
How to fit the local models: This is controlled by the fit_method argument, which takes a fitting-method object specifying the regression method and its parameters. Three methods are currently available: partial least squares (PLS) regression, weighted average partial least squares regression (WAPLS) (Shenk et al., 1997), and Gaussian process regression (GPR) with a dot-product covariance function.
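The tricubic weighting mentioned for diss_usage = "weights" can be sketched as follows. This is an illustration of the general tricubic form, assuming the dissimilarities are scaled by the largest dissimilarity in the neighborhood; the exact scaling used internally by resemble may differ.

```python
def tricubic_weights(d):
    """Tricubic weights w = (1 - (d / d_max)^3)^3: the closest neighbor
    (d = 0) gets weight 1; the farthest (d = d_max) gets weight 0."""
    d_max = max(d)
    return [(1.0 - (di / d_max) ** 3) ** 3 for di in d]
```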
2 Fitting methods and constructors
The regression methods available for fitting the local models in mbl() are specified through fitting-method objects. These objects are created using constructor functions such as fit_pls(), fit_wapls(), and fit_gpr(). Each of these functions accepts specific arguments that control the parameters of the corresponding regression method. For example, fit_pls() accepts the ncomp argument to specify the number of latent variables in the PLS model, while fit_wapls() accepts min_ncomp and max_ncomp to define a range of latent variable numbers for optimization. The following examples illustrate how to construct these fitting-method objects:
# Creates an object with instructions to build PLS models
my_pls <- fit_pls(ncomp = 15)
my_pls
3 Validation settings
The mbl_control() function is used to create control objects that define validation settings and selected optimization options for mbl(). In particular, the validation_type argument specifies how local model performance is evaluated.
Two validation methods are available:
Leave-nearest-neighbor-out validation ("NNv"): For each target observation, its nearest neighbor within the selected neighborhood is excluded, a local model is fitted using the remaining neighbors, and the excluded neighbor is then predicted. The predicted values obtained for these excluded nearest neighbors are compared with their observed reference values.
Local leave-group-out cross-validation ("local_cv"): For each local neighborhood, observations are repeatedly split into calibration and validation subsets. At each iteration, a local model is fitted to the calibration subset and used to predict the validation subset, from which a local root mean squared error is computed. This procedure is repeated multiple times, and the final local error is calculated as the average of the root mean squared errors across iterations. In mbl_control(), the number of iterations is controlled by the number argument, while p defines the proportion of observations retained in each calibration subset.
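The local leave-group-out procedure can be sketched as below. This is a minimal illustration, not the resemble implementation: fit and predict are hypothetical callables standing in for the local regression method.

```python
import math
import random

def local_group_cv(x, y, fit, predict, p=0.75, number=10, seed=1):
    """Local leave-group-out CV: repeatedly split the neighborhood into a
    calibration subset (proportion p) and a validation subset, compute the
    RMSE on the validation subset, and average the RMSEs over iterations."""
    rng = random.Random(seed)
    n = len(x)
    rmses = []
    for _ in range(number):
        idx = list(range(n))
        rng.shuffle(idx)
        n_cal = max(1, round(p * n))
        cal, val = idx[:n_cal], idx[n_cal:]
        model = fit([x[i] for i in cal], [y[i] for i in cal])
        errs = [(predict(model, x[i]) - y[i]) ** 2 for i in val]
        rmses.append(math.sqrt(sum(errs) / len(errs)))
    # final local error: average RMSE across iterations
    return sum(rmses) / len(rmses)
```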
The following examples illustrate how to create control objects for validation in mbl():
# Create an object with instructions to conduct both validation types
# "NNv" and "local_cv"
two_val_control <- mbl_control(
  validation_type = c("NNv", "local_cv"),
  number = 10,
  p = 0.75
)
The object two_val_control stores the settings for both validation methods, "NNv" and "local_cv". For "local_cv", the number of iterations is set to 10, and the proportion of neighbors retained in each local calibration subset is 75%.
4 Model fitting and prediction
With the main components of mbl() now defined, the function can be used to predict response values for the observations in test_x using train_x as the reference set. The following configuration reproduces the LOCAL algorithm (Shenk et al., 1997):
library(resemble)
library(prospectr)

# obtain a numeric vector of the wavelengths at which spectra are recorded
wavs <- as.numeric(colnames(NIRsoil$spc))

# pre-process the spectra:
# - use detrend
# - use first order derivative
diff_order <- 1
poly_order <- 1
window <- 7

# Preprocess spectra
NIRsoil$spc_pr <- savitzkyGolay(
  detrend(NIRsoil$spc, wav = wavs),
  m = diff_order, p = poly_order, w = window
)

train_x <- NIRsoil$spc_pr[NIRsoil$train == 1, ]
train_y <- NIRsoil$Ciso[NIRsoil$train == 1]

test_x <- NIRsoil$spc_pr[NIRsoil$train == 0, ]
test_y <- NIRsoil$Ciso[NIRsoil$train == 0]
# Define the neighborhood sizes to test
my_ks <- seq(80, 160, by = 40)

# Define how to use the dissimilarity information (ignore it)
ignore_diss <- "none"

# Define the regression method to be used at each neighborhood
my_wapls <- fit_wapls(min_ncomp = 3, max_ncomp = 25, scale = TRUE)

# For the moment use only "NNv" validation (faster)
nnv_val_control <- mbl_control()

# Predict Total Carbon
# (remove missing values)
local_ciso <- mbl(
  Xr = train_x[!is.na(train_y), ],
  Yr = train_y[!is.na(train_y)],
  Xu = test_x,
  neighbors = neighbors_k(my_ks),
  diss_method = diss_correlation(center = FALSE),
  diss_usage = ignore_diss,
  fit_method = my_wapls,
  gh = TRUE,
  control = nnv_val_control
)
In the code above, the arguments Xr and Yr specify the reference (training) set of predictor variables and the response, while Xu specifies the target (test) set. The argument Yu (not yet introduced here) is also available to specify the response values for the target set, which can be used for validation purposes.
The subscript \(u\) denotes the target domain (partially “unknown”), whereas \(r\) denotes the reference library.
The local_ciso object can now be examined:
plot(local_ciso, main = "")
local_ciso
--- mbl predictions ---
Predictions: 207
_______________________________________________________
Dissimilarity
Dissimilarity method: correlation
window size (ws) : full (no moving window)
center : FALSE
scale : FALSE
_______________________________________________________
Local fit method
Fitting method: wapls
min_ncomp : 3
max_ncomp : 25
method : mpls
scale : TRUE
max_iter : 100
tol : 1e-06
_______________________________________________________
Nearest-neighbor validation
k rmse st_rmse r2
80 0.740 0.0432 0.840
120 0.756 0.0441 0.838
160 0.782 0.0457 0.828
_______________________________________________________
Figure 2: MBL results for Total Carbon predictions using the LOCAL algorithm. NNv: nearest-neighbor cross-validation.
According to the results of the previous example, the neighborhood size that minimizes the root mean squared error (RMSE) under nearest-neighbor validation is 80. The corresponding LOCAL model can now be used to generate predictions for the test_x dataset:
bki <- which.min(local_ciso$validation_results$nearest_neighbor_validation$rmse)
bk <- local_ciso$validation_results$nearest_neighbor_validation$k[bki]

# All the prediction results are stored in:
local_ciso$results

# The get_predictions function makes it easier to retrieve the
# predictions from the previous object
ciso_hat <- as.matrix(get_predictions(local_ciso))[, bki]
Figure 3 shows the predicted values for Total Carbon obtained with the LOCAL algorithm using the best neighborhood size according to nearest-neighbor validation, plotted against the reference values in test_y.
# Plot predicted vs reference
rng <- range(ciso_hat, test_y, na.rm = TRUE)

plot(ciso_hat, test_y,
  xlim = rng,
  ylim = rng,
  xlab = "Predicted Total Carbon, %",
  ylab = "Total Carbon, %",
  main = "LOCAL using a fixed k", cex = 1.5,
  pch = 16, col = rgb(0.5, 0.5, 0.5, 0.6)
)
grid(lty = 1)
abline(0, 1, col = "red")
Figure 3: Predicted vs reference values for Total Carbon using the LOCAL algorithm with the best neighborhood size according to nearest-neighbor validation.
# Squared R
cor(ciso_hat, test_y, use = "complete.obs")^2
[1] 0.9143286
Similar results are obtained when the optimization of the neighborhoods is based on distance thresholds:
# Create a vector of dissimilarity thresholds to evaluate
# since the correlation dissimilarity will be used
# these thresholds need to be > 0 and <= 1
dths <- seq(0.025, 0.15, by = 0.025)

# Indicate the minimum and maximum sizes allowed for the neighborhood
k_min <- 30
k_max <- 150

local_ciso_diss <- mbl(
  Xr = train_x[!is.na(train_y), ],
  Yr = train_y[!is.na(train_y)],
  Xu = test_x,
  neighbors = neighbors_diss(threshold = dths, k_min = k_min, k_max = k_max),
  diss_method = diss_correlation(center = FALSE),
  diss_usage = ignore_diss,
  fit_method = my_wapls,
  control = nnv_val_control
)
The best correlation dissimilarity threshold is 0.05. The column "p_bounded" in the table of validation results indicates the percentage of neighborhoods whose size was reset to either the k_min or k_max values in the neighbors object passed to the neighbors argument. The final validation plot is shown in Figure 4.
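The bounding behavior summarized by p_bounded can be sketched as follows (hypothetical function names, for illustration only):

```python
def bounded_k(n_within_threshold, k_min, k_max):
    """Clamp a threshold-based neighborhood size to [k_min, k_max]."""
    return min(max(n_within_threshold, k_min), k_max)

def p_bounded(sizes, k_min, k_max):
    """Percentage of neighborhoods whose size was reset to k_min or k_max."""
    n_bounded = sum(1 for s in sizes if bounded_k(s, k_min, k_max) != s)
    return 100.0 * n_bounded / len(sizes)
```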
# Best distance threshold
bdi <- which.min(local_ciso_diss$validation_results$nearest_neighbor_validation$rmse)
bd <- local_ciso_diss$validation_results$nearest_neighbor_validation$k_diss[bdi]

# Predictions for the best distance
ciso_diss_hat <- as.matrix(get_predictions(local_ciso_diss))[, bdi]
# Plot predicted vs reference
plot(ciso_diss_hat, test_y,
  xlim = rng,
  ylim = rng,
  xlab = "Predicted Total Carbon, %",
  ylab = "Total Carbon, %",
  main = "LOCAL using a distance threshold \nfor neighbor retrieval", cex = 1.5,
  pch = 16, col = rgb(0.5, 0.5, 0.5, 0.6)
)
grid()
abline(0, 1, col = "red")
Figure 4: Predicted vs reference values for Total Carbon using the LOCAL algorithm with the best neighborhood size according to nearest-neighbor validation and distance thresholds.
# Squared R
cor(ciso_diss_hat, test_y, use = "complete.obs")^2
[1] 0.9143266
4.1 Additional examples
Here we provide a few additional examples of MBL configurations, using another response variable available in the dataset: soil cation exchange capacity (CEC). This variable is perhaps more challenging to predict than Total Carbon. Table 1 summarizes the configurations tested in the following code examples.
Table 1: Basic description of the different MBL configurations in the examples to predict Cation Exchange Capacity (CEC).
# Get the indices of the best results according to
# nearest neighbor validation statistics
c_val_name <- "validation_results"
c_nn_val_name <- "nearest_neighbor_validation"

bi_cor <- which.min(mbl_cor[[c_val_name]][[c_nn_val_name]]$rmse)
bi_pc <- which.min(mbl_pc[[c_val_name]][[c_nn_val_name]]$rmse)
bi_pls <- which.min(mbl_pls[[c_val_name]][[c_nn_val_name]]$rmse)
bi_gpr <- which.min(mbl_gpr[[c_val_name]][[c_nn_val_name]]$rmse)

preds <- cbind(
  get_predictions(mbl_cor)[, bi_cor],
  get_predictions(mbl_pc)[, bi_pc],
  get_predictions(mbl_pls)[, bi_pls],
  get_predictions(mbl_gpr)[, bi_gpr]
)
colnames(preds) <- c("mbl_cor", "mbl_pc", "mbl_pls", "mbl_gpr")
preds <- as.matrix(preds)

# R2s
cor(test_cec, preds, use = "complete.obs")^2
Figure 5 illustrates the prediction results obtained for CEC with each of the MBL configurations tested.
old_par <- par("mfrow", "mar")
par(mfrow = c(2, 2))

plot(test_cec, preds[, "mbl_cor"],
  xlab = "CEC, meq/100g",
  ylab = "Predicted CEC, meq/100g", main = "mbl_cor (LOCAL)"
)
abline(0, 1, col = "red")

plot(test_cec, preds[, "mbl_pc"],
  xlab = "CEC, meq/100g",
  ylab = "Predicted CEC, meq/100g", main = "mbl_pc"
)
abline(0, 1, col = "red")

plot(test_cec, preds[, "mbl_pls"],
  xlab = "CEC, meq/100g",
  ylab = "Predicted CEC, meq/100g", main = "mbl_pls"
)
abline(0, 1, col = "red")

plot(test_cec, preds[, "mbl_gpr"],
  xlab = "CEC, meq/100g",
  ylab = "Predicted CEC, meq/100g", main = "mbl_gpr"
)
abline(0, 1, col = "red")

par(old_par)
Figure 5: CEC prediction results for the different MBL configurations tested
4.2 Using the Yu argument
If response values for the prediction set are available, the Yu argument can be used to directly validate the predictions produced by mbl(). These values are used only for validation and are not involved in any optimization or modeling step. The argument can be used as follows:
# Use Yu argument to validate the predictions
pc_pred_cec_yu <- mbl(
  Xr = train_x[!is.na(train_cec), ],
  Yr = train_cec[!is.na(train_cec)],
  Xu = test_x,
  Yu = test_cec,
  neighbors = neighbors_k(80),
  diss_method = diss_pca(scale = TRUE),
  diss_usage = "none",
  verbose = FALSE,
  control = mbl_control()
)
pc_pred_cec_yu
The mbl() function uses foreach() from the foreach package to iterate over the rows in Xu. The example below uses parallel processing to predict CEC: the doParallel package is used to register the cores for parallel execution (the doSNOW package can also be used).
Note
These functions support parallel execution via the foreach and doParallel packages. However, parallel execution is only beneficial when the workload per iteration is large enough to outweigh the overhead of spawning worker processes and serializing data between them. In practice, this means large prediction or reference sets (typically hundreds of observations or more), large neighborhoods, and many PLS components. For small datasets, sequential execution is usually faster. When in doubt, benchmark both approaches before committing to a parallel workflow.
The following example may not be faster than sequential execution due to the relatively small size of the dataset, but it illustrates how to set up parallel processing for mbl():
# Running the mbl function using multiple cores
# Execute with two cores, if available, ...
n_cores <- 2

# ... if not then go with 1 core
if (parallel::detectCores() < 2) {
  n_cores <- 1
}

# Set the number of cores
library(doParallel)
clust <- makeCluster(n_cores)
registerDoParallel(clust)

# Alternatively:
# library(doSNOW)
# clust <- makeCluster(n_cores, type = "SOCK")
# registerDoSNOW(clust)
# getDoParWorkers()

pc_pred_cec <- mbl(
  Xr = train_x[!is.na(train_cec), ],
  Yr = train_cec[!is.na(train_cec)],
  Xu = test_x,
  neighbors = neighbors_k(seq(40, 100, by = 10)),
  diss_method = diss_pca(ncomp = ncomp_by_opc(), scale = TRUE),
  diss_usage = "none",
  control = mbl_control()
)

# Go back to sequential processing
registerDoSEQ()
try(stopCluster(clust))

pc_pred_cec
References
Mitchell, T.M., 1997. Machine Learning. McGraw-Hill, New York.
Naes, T., Isaksson, T., Kowalski, B., 1990. Locally weighted regression and scatter correction for near-infrared reflectance data. Analytical Chemistry 62, 664–673.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Demattê, J., Scholten, T., 2013. The spectrum-based learner: A new local approach for modeling soil vis–NIR spectra of complex datasets. Geoderma 195, 268–279.
Saul, L., Roweis, S., 2003. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research 4, 119–155.
Shenk, J.S., Westerhaus, M.O., Berzaghi, P., 1997. Investigation of a LOCAL calibration procedure for near infrared instruments. Journal of Near Infrared Spectroscopy 5, 223–232.