
https://rvlenth.github.io/estimability/
When a model is rank-deficient, predictions from that model may not
be unique. The estimability R package provides a set of
tools to assess whether predictions from a rank-deficient model are
unique (i.e., are estimable). These tools apply to a rich class of
models that includes linear models, generalized linear models, mixed
models, and models with more general correlated-error structures (e.g.,
spatial models, time series models). For further details, see doi:10.32614/RJ-2015-016.
nonest.basis() function is provided that determines a
basis for the null space of a matrix. This may be used in conjunction
with is.estble() to determine the estimability (within a
tolerance) of a given linear function of the regression coefficients in
a linear model.epredict() methods are provided for
lm, glm, and mlm objects. These
work just like predict(), except an NA is
returned for any cases that are not estimable. This is a useful
alternative to the generic warning that “predictions from rank-deficient
models are unreliable.”estble.subspace() that projects a set of
linear functions onto anpredict methods.install.packages("estimability")Release notes for the latest CRAN version are found at https://cran.r-project.org/package=estimability/NEWS
– or do news(package = "estimability") for notes on the
version you have installed.
remotes::install_github("rvlenth/estimability", dependencies = TRUE)For latest release notes on this GitHub version, see the NEWS file
Suppose we have four predictors x1, x2,
x3, and x4, and a response variable
y:
x1 <- -4:4
x2 <- c(-2, 1, -1, 2, 0, 2, -1, 1,-2)
x3 <- 3 * x1 - 2 * x2
x4 <- x2 - x1 + 4
y <- 1 + x1 + x2 + x3 + x4 + c(-.5, .5, .5, -.5, 0, .5, -.5, -.5, .5)
dat <- data.frame(x1, x2, x3, x4, y)
head(dat)
#> x1 x2 x3 x4 y
#> 1 -4 -2 -8 6 -7.5
#> 2 -3 1 -11 8 -3.5
#> 3 -2 -1 -4 5 -0.5
#> 4 -1 2 -7 7 1.5
#> 5 0 0 0 4 5.0
#> 6 1 2 -1 5 8.5If we fit the following two models, different slope coefficients are
estimated as NA depending on the order they are provided to
the model formula:
mod1234 <- lm(y ~ x1 + x2 + x3 + x4, data = dat)
mod4321 <- lm(y ~ x4 + x3 + x2 + x1, data = dat)
zapsmall(rbind(
b1234 = coef(mod1234),
b4321 = coef(mod4321)[c(1, 5:2)]
))
#> (Intercept) x1 x2 x3 x4
#> b1234 5 3 0 NA NA
#> b4321 -19 NA NA 3 6Now suppose our goal is to make predictions of y for the
new observations in testset:
testset <- data.frame(
x1 = c(3, 6, 6, 0, 0, 1),
x2 = c(1, 2, 2, 0, 0, 2),
x3 = c(7, 14, 14, 0, 0, 3),
x4 = c(2, 4, 0, 4, 0, 4)
)
testset
#> x1 x2 x3 x4
#> 1 3 1 7 2
#> 2 6 2 14 4
#> 3 6 2 14 0
#> 4 0 0 0 4
#> 5 0 0 0 0
#> 6 1 2 3 4
cbind(
testset,
pred1234 = predict(mod1234, newdata = testset),
pred4321 = predict(mod4321, newdata = testset)
)
#> x1 x2 x3 x4 pred1234 pred4321
#> 1 3 1 7 2 14 14
#> 2 6 2 14 4 23 47
#> 3 6 2 14 0 23 23
#> 4 0 0 0 4 5 5
#> 5 0 0 0 0 5 -19
#> 6 1 2 3 4 8 14Predictions for the first, third, and fourth row are the same for both models, but predictions for the second, fifth, and sixth row are different. This is a problem, as the underlying data provided to the models are the same! Note that R does provide a warning about predictions from the rank-deficient fit (the warning itself is omitted here).
The estimability package remedies these inconsistencies
via epredict(), which indicates the rows of
testdata that are estimable (with their predictions) or not
estimable (with NA):
library(estimability)
cbind(
testset,
pred1234 = epredict(mod1234, newdata = testset),
pred4321 = epredict(mod4321, newdata = testset)
)
#> x1 x2 x3 x4 pred1234 pred4321
#> 1 3 1 7 2 14 14
#> 2 6 2 14 4 NA NA
#> 3 6 2 14 0 23 23
#> 4 0 0 0 4 5 5
#> 5 0 0 0 0 NA NA
#> 6 1 2 3 4 NA NAOther predict() arguments can be passed to
epredict(), e.g.,:
epredict(mod1234, newdata = testset, interval = "prediction", level = 0.9)
#> fit lwr upr
#> 1 14 12.715387 15.284613
#> 2 NA NA NA
#> 3 23 21.449058 24.550942
#> 4 5 3.817418 6.182582
#> 5 NA NA NA
#> 6 NA NA NA