confcons

‘confcons’ (confidence & consistency) is a lightweight, stand-alone R package designed to calculate two novel measures of predictive distribution models (including species distribution models): confidence and consistency.

While confidence serves as a replacement for the widely criticized goodness-of-fit measures, such as AUC, consistency is a proxy for the model’s transferability (in space and time).

Installation

You can install the latest stable version of ‘confcons’ from CRAN with:

install.packages("confcons")

You can install the development version of ‘confcons’ from GitHub with:

# install.packages("devtools")
devtools::install_github(repo = "bfakos/confcons", upgrade = "never")

If you want to read the vignette of the development version in R, install the package with:

devtools::install_github(repo = "bfakos/confcons", upgrade = "never", build_vignettes = TRUE)
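
Whichever installation route you take, it can be useful to check what ended up in your library. The following is a minimal sketch that uses only base R tools; `check_confcons()` is a hypothetical helper written for this README, not a function of ‘confcons’.

```r
# Hypothetical helper (not part of 'confcons'): report whether the package is
# installed and, if so, print its version and the vignettes shipped with it.
check_confcons <- function(pkg = "confcons") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("'", pkg, "' is not installed; see the installation commands above.")
    return(invisible(FALSE))
  }
  message("'", pkg, "' version ", as.character(utils::packageVersion(pkg)), " is installed.")
  # Lists the installed vignettes; the list is empty if the package was
  # installed from GitHub without build_vignettes = TRUE.
  print(utils::vignette(package = pkg))
  invisible(TRUE)
}

check_confcons()
```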

Examples

Three small functions, thresholds(), confidence() and consistency(), form the core of the package. A wrapper function called measures() uses these workhorse functions to calculate all of the measures for you, optionally along with some traditional measures, such as AUC and maxTSS.
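
As a rough illustration of how the three core functions might be chained by hand, here is a hedged sketch that uses the toy data of this section. The argument names (`thresholds`, `type = "positive"`, `conf_train`, `conf_eval`) and the choice to derive the thresholds from the whole dataset are assumptions based on our reading of the package documentation; please check `?thresholds`, `?confidence` and `?consistency` before relying on them. The code does nothing but print a message if ‘confcons’ is not installed.

```r
# Toy data: observed absences/presences, predicted probabilities, and a mask
# marking the evaluation subset.
dataset <- data.frame(
    observations = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    predictions = c(0.1, 0.2, 0.4, 0.5, 0.5, 0.2, 0.3, 0.3, 0.4, 0.3, 0.65, 0.9, 0.9, 1, 0.1, 0.5, 0.8, 0.8),
    evaluation_mask = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)
train <- !dataset$evaluation_mask

if (requireNamespace("confcons", quietly = TRUE)) {
  # Assumption: thresholds are derived from the whole dataset.
  th <- confcons::thresholds(observations = dataset$observations,
                             predictions = dataset$predictions)
  # Confidence in positive predictions, separately for the two subsets.
  cpp_train <- confcons::confidence(observations = dataset$observations[train],
                                    predictions = dataset$predictions[train],
                                    thresholds = th, type = "positive")
  cpp_eval <- confcons::confidence(observations = dataset$observations[!train],
                                   predictions = dataset$predictions[!train],
                                   thresholds = th, type = "positive")
  # Consistency compares the two confidence values.
  dcpp <- confcons::consistency(conf_train = cpp_train, conf_eval = cpp_eval)
  print(c(CPP_train = cpp_train, CPP_eval = cpp_eval, DCPP = dcpp))
} else {
  message("'confcons' is not installed; skipping the step-by-step calculation.")
}
```

In everyday use, measures() is the more convenient entry point; the step-by-step form is mainly useful when you need the intermediate thresholds.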

Let’s say we trained a predictive distribution model and made some predictions with it, and now we want to know whether our model is both confident in its predictions and consistent, i.e. transferable in space and time.

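Our example dataset is a data.frame containing both the training and the evaluation subset. It is organized in three columns: observations (integer; observed absences, 0, and presences, 1), predictions (numeric; predicted probabilities of occurrence between 0 and 1), and evaluation_mask (logical; TRUE marks the rows that belong to the evaluation subset, FALSE those of the training subset).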

dataset <- data.frame(
    observations = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    predictions = c(0.1, 0.2, 0.4, 0.5, 0.5, 0.2, 0.3, 0.3, 0.4, 0.3, 0.65, 0.9, 0.9, 1, 0.1, 0.5, 0.8, 0.8),
    evaluation_mask = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

Well, it is a really small dataset…
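
To see how the rows are distributed between the two subsets, a quick base-R inspection helps. This is only an illustrative sketch; the object names `train` and `evaluation` are our own and are not required by ‘confcons’.

```r
dataset <- data.frame(
    observations = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    predictions = c(0.1, 0.2, 0.4, 0.5, 0.5, 0.2, 0.3, 0.3, 0.4, 0.3, 0.65, 0.9, 0.9, 1, 0.1, 0.5, 0.8, 0.8),
    evaluation_mask = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Split the rows by the logical mask.
train <- dataset[!dataset$evaluation_mask, ]
evaluation <- dataset[dataset$evaluation_mask, ]

# 10 training rows (5 absences, 5 presences) and
# 8 evaluation rows (4 absences, 4 presences).
table(subset = ifelse(dataset$evaluation_mask, "evaluation", "training"),
      observation = dataset$observations)
```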

Let’s attach the package to our R session:

library(confcons)

Now we can calculate the measures:

measures(observations = dataset$observations,
         predictions = dataset$predictions,
         evaluation_mask = dataset$evaluation_mask)
#>    CP_train     CP_eval         DCP   CPP_train    CPP_eval        DCPP 
#>  0.80000000  0.75000000 -0.05000000  0.75000000  0.66666667 -0.08333333

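The function returns a named numeric vector containing the confidence in predictions (CP) and the confidence in positive predictions (CPP), each for the training and the evaluation subset, together with the corresponding consistency values (DCP and DCPP).

The difference between the two values forming each pair (CP vs. CPP, and DCP vs. DCPP) is described in the scientific paper listed in the Citation section.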

Our model is not perfect, but it is more or less confident in its positive predictions (i.e., predicted presences), since CPP_eval is closer to 1 than to 0. Even though it is not absolutely confident, it is quite consistent (i.e., DCPP is close to 0), so we need not be overly afraid of transferability issues if the model is used for spatial or temporal extrapolation.
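
The arithmetic behind this reading can be checked directly from the printed output. The sketch below hard-codes the values shown above (it does not call ‘confcons’) and confirms that, for these numbers, each consistency value equals the evaluation confidence minus the training confidence.

```r
# Values copied from the measures() output printed above.
res <- c(CP_train = 0.80, CP_eval = 0.75, DCP = -0.05,
         CPP_train = 0.75, CPP_eval = 0.66666667, DCPP = -0.08333333)

# Evaluation minus training confidence.
dcp  <- res[["CP_eval"]]  - res[["CP_train"]]
dcpp <- res[["CPP_eval"]] - res[["CPP_train"]]

# Both comparisons hold within rounding error of the printed digits.
isTRUE(all.equal(dcp,  res[["DCP"]],  tolerance = 1e-6))
isTRUE(all.equal(dcpp, res[["DCPP"]], tolerance = 1e-6))

# CPP_eval is closer to 1 than to 0.
res[["CPP_eval"]] > 0.5
```

A negative difference means the model is somewhat less confident on the evaluation subset than on the training subset; the closer the value is to 0, the more consistent the model.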

A detailed description of the measures and the functions of ‘confcons’, along with more examples, can be found in the package vignette.

Citation

When you use this package, please cite the following scientific paper:

Somodi I, Bede-Fazekas Á, Botta-Dukát Z, Molnár Z (2024): Confidence and consistency in discrimination: A new family of evaluation metrics for potential distribution models. Ecological Modelling 491: 110667. DOI: 10.1016/j.ecolmodel.2024.110667.

Package lifecycle and contribution

This GitHub version of the package is now in a stable state. If you find a bug, have a feature request, or have an idea you would like to discuss with the authors of the package, please create a new issue.