The package semlbci
(Cheung &
Pesigan, 2023) includes functions for finding the likelihood-based
confidence intervals (LBCIs) of parameters in the output of a structural
equation modeling (SEM) function. Currently, it supports the output from
lavaan::lavaan()
and its wrappers, such as
lavaan::sem()
and lavaan::cfa()
.
The latest stable version can be installed from GitHub:
::install_github("sfcheung/semlbci") remotes
Further information about semlbci
can be found in Cheung and Pesigan
(2023).
The package has a dataset, simple_med
, with three
variables, x
, m
, and y
. Let us
fit a simple mediation model to this dataset.
library(semlbci)
data(simple_med)
<- simple_med
dat head(dat)
#> x m y
#> 1 -0.3447375 7.284273 -5.636897
#> 2 -0.3658919 -5.449121 -4.525402
#> 3 -0.8294968 -7.016254 -7.823257
#> 4 -0.3389654 4.367018 1.563098
#> 5 -0.9628162 -4.015469 -7.288511
#> 6 -1.0749302 -11.538140 -4.153572
library(lavaan)
#> This is lavaan 0.6-16
#> lavaan is FREE software! Please report any bugs.
<-
mod "
m ~ a*x
y ~ b*m
ab := a * b
"
# We set fixed.x = FALSE because we will also find the LBCIs for
# standardized solution
<- sem(mod, simple_med, fixed.x = FALSE) fit
To illustrate how to find the LBCIs for user-defined parameters, we
labelled the m ~ x
path by a
, the
y ~ m
path by b
, and defined the indirect
effect, ab
, by a * b
.
This is the summary:
summary(fit, standardized = TRUE)
#> lavaan 0.6.16 ended normally after 1 iteration
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 5
#>
#> Number of observations 200
#>
#> Model Test User Model:
#>
#> Test statistic 10.549
#> Degrees of freedom 1
#> P-value (Chi-square) 0.001
#>
#> Parameter Estimates:
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Regressions:
#> Estimate Std.Err z-value P(>|z|) Std.lv Std.all
#> m ~
#> x (a) 1.676 0.431 3.891 0.000 1.676 0.265
#> y ~
#> m (b) 0.535 0.073 7.300 0.000 0.535 0.459
#>
#> Variances:
#> Estimate Std.Err z-value P(>|z|) Std.lv Std.all
#> .m 34.710 3.471 10.000 0.000 34.710 0.930
#> .y 40.119 4.012 10.000 0.000 40.119 0.790
#> x 0.935 0.094 10.000 0.000 0.935 1.000
#>
#> Defined Parameters:
#> Estimate Std.Err z-value P(>|z|) Std.lv Std.all
#> ab 0.897 0.261 3.434 0.001 0.897 0.122
The main function to find the LBCIs for free parameters is
semlbci()
. This should be the only function used by normal
users. We will first illustrate its usage by some examples, and then
present other technical details in the following section.
All free parameters can be specified in lavaan
style.
For example, the path from m
to y
is denoted
by "y ~ m"
, and the covariance or correlation between
x
and m
(not in the example) is denoted by
"x ~~ m"
(order does not matter).
<- semlbci(sem_out = fit,
out pars = c("y ~ m",
"m ~ x"))
The output is the parameter table of the fitted lavaan
object, with two columns added, lbci_lb
and
lbci_ub
, the likelihood-based lower bounds and upper
bounds, respectively.
out#>
#> Results:
#> id lhs op rhs label est lbci_lb lbci_ub lb ub cl_lb cl_ub
#> 1 1 m ~ x a 1.676 0.828 2.525 0.832 2.520 0.950 0.950
#> 2 2 y ~ m b 0.535 0.391 0.679 0.391 0.679 0.950 0.950
#>
#> Annotation:
#> * lbci_lb, lbci_ub: The lower and upper likelihood-based bounds.
#> * est: The point estimates from the original lavaan output.
#> * lb, ub: The original lower and upper bounds, extracted from the
#> original lavaan output. Usually Wald CIs for free parameters and
#> delta method CIs for user-defined parameters
#> * cl_lb, cl_ub: One minus the p-values of chi-square difference tests
#> at the bounds. Should be close to the requested level of
#> confidence, e.g., .95 for 95% confidence intervals.
#>
#> Call:
#> semlbci(sem_out = fit, pars = c("y ~ m", "m ~ x"))
In this example, the point estimate of the unstandardized coefficient
from x
to m
is 1.676, and the LBCI is 0.828 to
2.525.
For users familiar with the column names, the annotation can be
disabled by calling print()
and add
annotation = FALSE
:
print(out, annotation = FALSE)
#>
#> Results:
#> id lhs op rhs label est lbci_lb lbci_ub lb ub cl_lb cl_ub
#> 1 1 m ~ x a 1.676 0.828 2.525 0.832 2.520 0.950 0.950
#> 2 2 y ~ m b 0.535 0.391 0.679 0.391 0.679 0.950 0.950
The results can also be printed in a lavaan
-like format
by calling print()
, setting sem_out
to the
original fit object (fit
in this example), and add
output = "lavaan"
:
print(out,
sem_out = fit,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Parameter Estimates:
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Regressions:
#> Estimate Std.Err z-value P(>|z|) lb.lower lb.upper
#> m ~
#> x (a) 1.676 0.431 3.891 0.000 0.828 2.525
#> y ~
#> m (b) 0.535 0.073 7.300 0.000 0.391 0.679
By default, the original confidence intervals will not be printed.
See the help page of print.semlbci()
for other options
available.
To find the LBCI for a user-defined parameter, use
label :=
, where label
is the label used in the
model specification. The definition of this parameter can be omitted.
The content after :=
will be ignored by
semlbci()
.
<- semlbci(sem_out = fit,
out pars = c("ab := "))
print(out,
sem_out = fit,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Parameter Estimates:
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Defined Parameters:
#> Estimate Std.Err z-value P(>|z|) lb.lower lb.upper
#> ab 0.897 0.261 3.434 0.001 0.427 1.464
In this example, the point estimate of the indirect effect is 0.897, and the LBCI is 0.427 to 1.464.
(Note: In some examples, we added annotation = FALSE
to
suppress the annotation in the printout to minimize the length of this
vignette.)
By the default, the unstandardized solution is used by
semlbci()
. If the LBCIs for the standardized solution
solution are needed, set standardized = TRUE
.
<- semlbci(sem_out = fit,
out pars = c("y ~ m",
"m ~ x"),
standardized = TRUE)
print(out,
sem_out = fit,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Standardized Estimates Only
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Regressions:
#> Standardized Std.Err z-value P(>|z|) lb.lower lb.upper
#> m ~
#> x (a) 0.265 0.066 4.035 0.000 0.133 0.389
#> y ~
#> m (b) 0.459 0.056 8.215 0.000 0.342 0.561
If LBCIs are for the standardized solution and output
set to "lavaan"
when printing the results, the parameter
estimates, standard errors, z-values, and p-values are
those from the standardized solution.
The LBCIs for standardized user-defined parameters can be requested similarly.
<- semlbci(sem_out = fit,
out pars = c("ab :="),
standardized = TRUE)
print(out,
sem_out = fit,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Standardized Estimates Only
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Defined Parameters:
#> Standardized Std.Err z-value P(>|z|) lb.lower lb.upper
#> ab 0.122 0.034 3.538 0.000 0.059 0.194
semlbci()
sem_out
and pars
: The fit object and the
parametersThe only required argument for semlbci()
is
sem_out
, the fit object from lavaan::lavaan()
or its wrappers (e.g., lavann::cfa()
and
lavaan::sem()
). By default, semlbci()
will
find the LBCIs for all free parameters (except for variances and error
variances) and user-defined parameters, which can take a long time for a
model with many parameters. Moreover, LBCI is usually used when
Wald-type confidence interval may not be suitable, for example, forming
the confidence interval for an indirect effect or a parameter in the
standardized solution. These parameters may have sampling distributions
that are asymmetric or otherwise substantially nonnormal due to bounded
parameter spaces or other reasons.
Therefore, it is recommended to call semlbci()
without
specifying any parameters. If the time to run is long, then call
semlbci()
only for selected parameters. The argument
pars
should be a model syntax or a vector of strings which
specifies the parameters for which LBCIs will be formed (detailed
below).
If time is not a concern, for example, when users want to explore the
LBCIs for all free and user-defined parameters in a final model, then
pars
can be omitted to request the LBCIs for all free
parameters (except for variances and covariances) and user-defined
parameters (if any) in a model.
ciperc
: The level of confidenceBy default, the 95% LBCIs for the unstandardized solution will be
formed. To change the level of confidence, set the argument
ciperc
to the desired coverage probability, e.g., .95 for
95%, .90 for 90%.
standardized
: Whether standardized solution is
usedBy default, the LBCIs for the unstandardized solution will be formed.
If the LBCIs for the standardized solution are desired, set
standardized = TRUE
. Note that for some models it can be
much slower to find the LBCIs for the standardized solution than for the
unstandardized solution.
parallel
and ncpus
The search for the bounds needs to be done separately for each bound
and this can take a long time for a model with many parameters and/or
with equality constraints. Therefore, parallel processing should always
be enabled by setting parallel
to TRUE
and
ncpus
to a number smaller than the number of available
cores. For example, without parallel processing, the following search
took about 28 seconds on Intel i7-8700:
data(HolzingerSwineford1939)
<-
mod_test '
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
<- cfa(model = mod_test,
fit_cfa data = HolzingerSwineford1939)
semlbci(fit_cfa)
With parallel processing enabled and using 6 cores, it took about 20 seconds.
semlbci(fit_cfa,
parallel = TRUE,
ncpus = 6)
The speed difference can be much greater for a model with many parameters and some equality constraints.
Enabling parallel processing also has the added benefit of showing the progress in real time.
For detailed documentation of other arguments, please refer to the
help page of semlbci()
. Advanced users who want to tweak
the optimization options can check the help pages of
ci_bound_wn_i()
and ci_i_one()
,
semlbci()
supports multiple-group models. For example,
this is a two-group confirmatory factor analysis model with equality
constraints:
data(HolzingerSwineford1939)
<-
mod_cfa '
visual =~ x1 + v(lambda2, lambda2)*x2 + v(lambda3, lambda3)*x3
textual =~ x4 + v(lambda5, lambda5)*x5 + v(lambda6, lambda6)*x6
speed =~ x7 + v(lambda8, lambda8)*x8 + v(lambda9, lambda9)*x9
'
<- cfa(model = mod_cfa,
fit_cfa data = HolzingerSwineford1939,
group = "school")
The factor correlations between group are not constrained to be equal.
parameterEstimates(fit_cfa)[c(22, 23, 58, 59), ]
#> lhs op rhs block group label est se z pvalue ci.lower
#> 22 visual ~~ textual 1 1 0.416 0.097 4.271 0.000 0.225
#> 23 visual ~~ speed 1 1 0.169 0.064 2.643 0.008 0.044
#> 58 visual ~~ textual 2 2 0.437 0.099 4.423 0.000 0.243
#> 59 visual ~~ speed 2 2 0.314 0.079 3.958 0.000 0.158
#> ci.upper
#> 22 0.606
#> 23 0.294
#> 58 0.631
#> 59 0.469
This is the LBCI for covariance between visual ability and textual ability:
<- semlbci(fit_cfa,
fcov pars = c("visual ~~ textual"))
print(fcov,
sem_out = fit_cfa,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Parameter Estimates:
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#>
#> Group 1 [Pasteur]:
#>
#> Covariances:
#> Estimate Std.Err z-value P(>|z|) lb.lower lb.upper
#> visual ~~
#> textual 0.416 0.097 4.271 0.000 0.221 0.654
#>
#>
#> Group 2 [Grant-White]:
#>
#> Covariances:
#> Estimate Std.Err z-value P(>|z|) lb.lower lb.upper
#> visual ~~
#> textual 0.437 0.099 4.423 0.000 0.263 0.663
This is the LBCI for correlation between visual ability and textual ability:
<- semlbci(fit_cfa,
fcor pars = c("visual ~~ textual"),
standardized = TRUE)
print(fcor,
sem_out = fit_cfa,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Standardized Estimates Only
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#>
#> Group 1 [Pasteur]:
#>
#> Covariances:
#> Standardized Std.Err z-value P(>|z|) lb.lower lb.upper
#> visual ~~
#> textual 0.485 0.087 5.601 0.000 0.291 0.640
#>
#>
#> Group 2 [Grant-White]:
#>
#> Covariances:
#> Standardized Std.Err z-value P(>|z|) lb.lower lb.upper
#> visual ~~
#> textual 0.540 0.086 6.317 0.000 0.357 0.692
Note that the example above can take more than one minute to one if parallel processing is not enabled.
semlbci()
also supports the robust LBCI proposed by Falk
(2018). To form robust LBCI, the model must be fitted with robust test
statistics requested (e.g., estimator = "MLR"
). To request
robust LBCIs, add robust = "satorra.2000"
when calling
semlbci()
.
We use the simple mediation model as an example:
<- sem(mod, simple_med,
fit_robust fixed.x = FALSE,
estimator = "MLR")
<- semlbci(fit_robust,
fit_lbci_ab_robust pars = "ab := ",
robust = "satorra.2000")
print(fit_lbci_ab_robust,
sem_out = fit_robust,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Parameter Estimates:
#>
#> Standard errors Sandwich
#> Information bread Observed
#> Observed information based on Hessian
#>
#> Defined Parameters:
#> Estimate Std.Err z-value P(>|z|) lb.lower lb.upper
#> ab 0.897 0.305 2.941 0.003 0.353 1.571
semlbci()
support forming the LBCIs for most free
parameters. Not illustrated above but LBCIs can be formed for path
coefficients between latent variables and also user-defined parameters
based on latent-level parameters, such as an indirect effect from one
latent variable to another.
More examples can be found in the “examples” folders in the OSF page for this package.
The following is a summary of the limitations of
semlbci()
. Please refer to check_sem_out()
for
the full list of limitations. This function is called by
semlbci()
to check the sem_out
object, and
will raise warnings or errors as appropriate.
The function semlbci()
currently supports
lavaan::lavaan()
results estimated by maximum likelihood
(ML
), full information maximum likelihood for missing data
(fiml
), and their robust variants (e.g.,
MLM
).
This package currently supports single and multiple group models with continuous variables. It may work for a model with ordered variables but this is not officially tested.
The current and preferred method is the one proposed by Wu and Neale
(2012), illustrated by Pek and Wu (2015). The current implementation in
semlbci()
does not check whether a parameter is near its
boundary. The more advanced methods by Pritikin, Rappaport, and Neale
(2017) will be considered in future development.
A detailed presentation of the internal workflow of
semlbci()
can be found in the
vignette("technical_workflow", package = "semlbci")
. Users
interested in calling the lowest level function,
ci_bound_wn_i()
, can see some illustrative examples in
vignette("technical_searching_one_bound", package = "semlbci")
.
Cheung, S. F., & Pesigan, I. J. A. (2023). semlbci: An R package for forming likelihood-based confidence intervals for parameter estimates, correlations, indirect effects, and other derived parameters. Structural Equation Modeling: A Multidisciplinary Journal. Advance online publication. https://doi.org/10.1080/10705511.2023.2183860
Falk, C. F. (2018). Are robust standard errors the best approach for interval estimation with nonnormal data in structural equation modeling? Structural Equation Modeling: A Multidisciplinary Journal, 25(2), 244-266. https://doi.org/10.1080/10705511.2017.1367254
Pek, J., & Wu, H. (2015). Profile likelihood-based confidence intervals and regions for structural equation models. Psychometrika, 80(4), 1123–1145. https://doi.org/10.1007/s11336-015-9461-1
Pritikin, J. N., Rappaport, L. M., & Neale, M. C. (2017). Likelihood-based confidence intervals for a parameter with an upper or lower bound. Structural Equation Modeling: A Multidisciplinary Journal, 24(3), 395-401. https://doi.org/10.1080/10705511.2016.1275969
Wu, H., & Neale, M. C. (2012). Adjusted confidence intervals for a bounded parameter. Behavior Genetics, 42(6), 886–898. https://doi.org/10.1007/s10519-012-9560-z