R package estimability: Support for determining estimability of linear functions estimability website

cran version downloads total downloads

Website:

https://rvlenth.github.io/estimability/

Overview

When a model is rank-deficient, predictions from that model may not be unique. The estimability R package provides a set of tools to assess whether predictions from a rank-deficient model are unique (i.e., are estimable). These tools apply to a rich class of models that includes linear models, generalized linear models, mixed models, and models with more general correlated-error structures (e.g., spatial models, time series models). For further details, see doi:10.32614/RJ-2015-016.

Features

Installation

install.packages("estimability")

Release notes for the latest CRAN version are found at https://cran.r-project.org/package=estimability/NEWS – or do news(package = "estimability") for notes on the version you have installed.

remotes::install_github("rvlenth/estimability", dependencies = TRUE)

For latest release notes on this GitHub version, see the NEWS file

Example

Suppose we have four predictors x1, x2, x3, and x4, and a response variable y:

x1 <- -4:4
x2 <- c(-2, 1, -1, 2, 0, 2, -1, 1,-2)
x3 <- 3 * x1 - 2 * x2
x4 <- x2 - x1 + 4
y <- 1 + x1 + x2 + x3 + x4 + c(-.5, .5, .5, -.5, 0, .5, -.5, -.5, .5)
dat <- data.frame(x1, x2, x3, x4, y)
head(dat)
#>   x1 x2  x3 x4    y
#> 1 -4 -2  -8  6 -7.5
#> 2 -3  1 -11  8 -3.5
#> 3 -2 -1  -4  5 -0.5
#> 4 -1  2  -7  7  1.5
#> 5  0  0   0  4  5.0
#> 6  1  2  -1  5  8.5

If we fit the following two models, different slope coefficients are estimated as NA depending on the order they are provided to the model formula:

mod1234 <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
mod4321 <- lm(y ~ x4 + x3 + x2 + x1, data = dat)
zapsmall(rbind(
    b1234 = coef(mod1234),
    b4321 = coef(mod4321)[c(1, 5:2)]
))
#>       (Intercept) x1 x2 x3 x4
#> b1234           5  3  0 NA NA
#> b4321         -19 NA NA  3  6

Now suppose our goal is to make predictions of y for the new observations in testset:

testset <- data.frame( 
    x1 = c(3,  6,  6,  0,  0,  1), 
    x2 = c(1,  2,  2,  0,  0,  2), 
    x3 = c(7, 14, 14,  0,  0,  3), 
    x4 = c(2,  4,  0,  4,  0,  4)
)
testset
#>   x1 x2 x3 x4
#> 1  3  1  7  2
#> 2  6  2 14  4
#> 3  6  2 14  0
#> 4  0  0  0  4
#> 5  0  0  0  0
#> 6  1  2  3  4

cbind(
    testset, 
    pred1234 = predict(mod1234, newdata = testset),
    pred4321 = predict(mod4321, newdata = testset)
)
#>   x1 x2 x3 x4 pred1234 pred4321
#> 1  3  1  7  2       14       14
#> 2  6  2 14  4       23       47
#> 3  6  2 14  0       23       23
#> 4  0  0  0  4        5        5
#> 5  0  0  0  0        5      -19
#> 6  1  2  3  4        8       14

Predictions for the first, third, and fourth row are the same for both models, but predictions for the second, fifth, and sixth row are different. This is a problem, as the underlying data provided to the models are the same! Note that R does provide a warning about predictions from the rank-deficient fit (the warning itself is omitted here).

The estimability package remedies these inconsistencies via epredict(), which indicates the rows of testdata that are estimable (with their predictions) or not estimable (with NA):

library(estimability)
cbind(
    testset, 
    pred1234 = epredict(mod1234, newdata = testset),
    pred4321 = epredict(mod4321, newdata = testset)
)
#>   x1 x2 x3 x4 pred1234 pred4321
#> 1  3  1  7  2       14       14
#> 2  6  2 14  4       NA       NA
#> 3  6  2 14  0       23       23
#> 4  0  0  0  4        5        5
#> 5  0  0  0  0       NA       NA
#> 6  1  2  3  4       NA       NA

Other predict() arguments can be passed to epredict(), e.g.,:

epredict(mod1234, newdata = testset, interval = "prediction", level = 0.9)
#>   fit       lwr       upr
#> 1  14 12.715387 15.284613
#> 2  NA        NA        NA
#> 3  23 21.449058 24.550942
#> 4   5  3.817418  6.182582
#> 5  NA        NA        NA
#> 6  NA        NA        NA