Technical Details: Difference between ggpredict() and ggemmeans()

Daniel Lüdecke

2019-11-08

ggpredict() and ggemmeans() compute predicted values for all possible levels or values of a model's predictors. In essence, ggpredict() wraps the predict() method for the respective model, while ggemmeans() wraps the emmeans() function from the emmeans package. Both functions first prepare the data so that it fits the newdata argument of predict() or the at argument of emmeans(), respectively. If you have not read the general introduction yet, it is recommended to do so first.
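
To make the data-preparation step more concrete, here is a minimal sketch of what such a call boils down to for a linear model. It is not the internal code of either package: the values for c12hour are picked by hand, the object names (focal, grid, by_predict, by_emmeans) are made up for this illustration, and holding neg_c_7 at the mean of the model frame is an assumption of the sketch.

```r
library(ggeffects)  # only used here for the efc example data
library(emmeans)

data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7, data = efc)

# Values of the focal predictor (chosen by hand for this sketch)
focal <- c(0, 20, 45, 65, 85)

# predict() route: build a data grid for `newdata`, holding the
# non-focal predictor at the mean of the rows used to fit the model
mf <- model.frame(fit)
grid <- data.frame(c12hour = focal, neg_c_7 = mean(mf$neg_c_7))
pred <- predict(fit, newdata = grid, se.fit = TRUE)
by_predict <- data.frame(
  x         = focal,
  predicted = unname(pred$fit),
  std.error = unname(pred$se.fit)
)

# emmeans() route: the focal values are passed through `at`; numeric
# predictors not listed there are set to their mean by default
em <- emmeans(fit, specs = "c12hour", at = list(c12hour = focal))
by_emmeans <- as.data.frame(em)

by_predict
by_emmeans
```

Both routes evaluate the same fitted model on the same grid, so the point predictions should agree. The difference lies in who builds the grid and how the remaining predictors are filled in.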

For models without categorical predictors, the results from ggpredict() and ggemmeans() are identical, apart from slight and negligible differences in the associated confidence intervals.

library(ggeffects)
data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7, data = efc)

ggpredict(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>    x predicted std.error conf.low conf.high
#>    0    75.072     1.077   72.962    77.183
#>   20    70.155     0.895   68.400    71.909
#>   45    64.008     0.818   62.405    65.610
#>   65    59.090     0.902   57.323    60.857
#>   85    54.172     1.087   52.042    56.302
#>  105    49.255     1.331   46.645    51.864
#>  125    44.337     1.609   41.184    47.490
#>  170    33.272     2.289   28.787    37.758
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83

ggemmeans(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>    x predicted std.error conf.low conf.high
#>    0    75.072     1.077   72.959    77.186
#>   20    70.155     0.895   68.398    71.912
#>   45    64.008     0.818   62.403    65.612
#>   65    59.090     0.902   57.320    60.860
#>   85    54.172     1.087   52.039    56.305
#>  105    49.255     1.331   46.641    51.868
#>  125    44.337     1.609   41.180    47.494
#>  170    33.272     2.289   28.780    37.764
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83

As can be seen, the continuous predictor neg_c_7 is held constant at its mean value, 11.83. For categorical predictors, however, ggpredict() and ggemmeans() behave differently: ggpredict() holds each categorical predictor constant at its reference level, whereas ggemmeans(), like ggeffect(), averages over the proportions of the categories of factors.

library(sjmisc)
data(efc)
efc$e42dep <- to_label(efc$e42dep)
fit <- lm(barthtot ~ c12hour + neg_c_7 + e42dep, data = efc)

ggpredict(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>    x predicted std.error conf.low conf.high
#>    0    92.745     2.173   88.485    97.004
#>   20    91.317     2.169   87.067    95.567
#>   45    89.532     2.208   85.206    93.859
#>   65    88.105     2.274   83.649    92.561
#>   85    86.677     2.368   82.037    91.318
#>  105    85.250     2.486   80.376    90.123
#>  125    83.822     2.627   78.674    88.970
#>  170    80.610     3.005   74.721    86.499
#> 
#> Adjusted for:
#> * neg_c_7 =       11.83
#> *  e42dep = independent

ggemmeans(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>    x predicted std.error conf.low conf.high
#>    0    73.515     0.846   71.853    75.176
#>   20    72.087     0.734   70.646    73.528
#>   45    70.302     0.718   68.894    71.711
#>   65    68.875     0.809   67.287    70.462
#>   85    67.447     0.966   65.550    69.344
#>  105    66.019     1.164   63.735    68.304
#>  125    64.592     1.384   61.875    67.309
#>  170    61.380     1.922   57.608    65.152
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83
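
The two strategies behind the tables above can be illustrated with predict() alone. The following is a rough sketch, not the code either function runs: predict_at_level() is a hypothetical helper written for this example, the values for c12hour are hand-picked, and weighting the levels by their observed proportions follows the description in the text rather than anything verified against the emmeans internals.

```r
library(ggeffects)  # only used here for the efc example data
library(sjmisc)     # to_label(), as in the example above

data(efc)
efc$e42dep <- to_label(efc$e42dep)
fit <- lm(barthtot ~ c12hour + neg_c_7 + e42dep, data = efc)

mf <- model.frame(fit)     # rows actually used in the fit
lv <- levels(mf$e42dep)
focal <- c(0, 20, 45)      # hand-picked values of c12hour

# Hypothetical helper for this sketch: predictions for the focal values
# with neg_c_7 at its mean and e42dep fixed at a single level
predict_at_level <- function(level) {
  grid <- data.frame(
    c12hour = focal,
    neg_c_7 = mean(mf$neg_c_7),
    e42dep  = factor(level, levels = lv)
  )
  unname(predict(fit, newdata = grid))
}

# Strategy 1: hold the factor at its reference (first) level
at_reference <- predict_at_level(lv[1])

# Strategy 2: predict for every level, then average the predictions,
# weighting each level by its observed proportion in the model frame
per_level <- sapply(lv, predict_at_level)   # rows = focal values, cols = levels
props <- as.numeric(prop.table(table(mf$e42dep)))
averaged <- as.numeric(per_level %*% props)

data.frame(x = focal, at_reference, averaged)
```

Because the model contains no interactions, the two columns differ by a constant shift, which is the same pattern visible in the two tables above. Whether the averaged column matches ggemmeans() to the last digit depends on the weighting that function applies, which this sketch does not try to replicate.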

In this case, ggpredict() and ggemmeans() return the same results again if the condition argument is used to define the specific levels at which variables, here the factor e42dep, are held constant.

ggpredict(fit, terms = "c12hour")
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>    x predicted std.error conf.low conf.high
#>    0    92.745     2.173   88.485    97.004
#>   20    91.317     2.169   87.067    95.567
#>   45    89.532     2.208   85.206    93.859
#>   65    88.105     2.274   83.649    92.561
#>   85    86.677     2.368   82.037    91.318
#>  105    85.250     2.486   80.376    90.123
#>  125    83.822     2.627   78.674    88.970
#>  170    80.610     3.005   74.721    86.499
#> 
#> Adjusted for:
#> * neg_c_7 =       11.83
#> *  e42dep = independent

ggemmeans(fit, terms = "c12hour", condition = c(e42dep = "independent"))
#> 
#> # Predicted values of Total score BARTHEL INDEX
#> # x = average number of hours of care per week
#> 
#>    x predicted std.error conf.low conf.high
#>    0    92.745     2.173   88.479    97.010
#>   20    91.317     2.169   87.061    95.573
#>   45    89.532     2.208   85.199    93.865
#>   65    88.105     2.274   83.642    92.567
#>   85    86.677     2.368   82.030    91.324
#>  105    85.250     2.486   80.370    90.130
#>  125    83.822     2.627   78.667    88.977
#>  170    80.610     3.005   74.712    86.507
#> 
#> Adjusted for:
#> * neg_c_7 = 11.83
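
In the comparisons above, the predicted values and standard errors agree and only the interval bounds differ in the second or third decimal. A pattern like this would arise if one interval were built with a normal quantile and the other with a t quantile. That is an assumption about the cause, not something stated here about either function. The sketch below shows the size of such a difference for the first model, using base R only and hand-picked values of c12hour.

```r
library(ggeffects)  # only used here for the efc example data

data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7, data = efc)

mf <- model.frame(fit)
grid <- data.frame(c12hour = c(0, 20, 45), neg_c_7 = mean(mf$neg_c_7))
pred <- predict(fit, newdata = grid, se.fit = TRUE)

# Same point estimates and standard errors, two different critical values
z_crit <- qnorm(0.975)                       # normal approximation
t_crit <- qt(0.975, df = df.residual(fit))   # t distribution, residual df

data.frame(
  x         = grid$c12hour,
  predicted = unname(pred$fit),
  std.error = unname(pred$se.fit),
  z.low     = unname(pred$fit - z_crit * pred$se.fit),
  z.high    = unname(pred$fit + z_crit * pred$se.fit),
  t.low     = unname(pred$fit - t_crit * pred$se.fit),
  t.high    = unname(pred$fit + t_crit * pred$se.fit)
)
```

With several hundred residual degrees of freedom the two critical values are very close, so the t-based bounds are only marginally wider than the normal-based ones.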

Creating plots is as simple as described in the vignette Plotting Marginal Effects.

ggemmeans(fit, terms = c("c12hour", "e42dep")) %>% plot()