Customize Forest Plots and Tables

This article focuses on utility and enhanced customization of forest plots and accompanied tables.

Group rows and control strip placement

grouping creates section panels, and grouping_strip_position controls which side gets the strip labels.

coefs <- data.frame(
  term = c("Age", "BMI", "Smoking", "Stage II", "Stage III"),
  estimate = c(0.12, -0.10, 0.18, 0.30, 0.46),
  conf.low = c(0.03, -0.18, 0.04, 0.10, 0.18),
  conf.high = c(0.21, 0.02, 0.32, 0.50, 0.74),
  sample_size = c(120, 115, 98, 87, 83),
  p_value = c(0.04, 0.15, 0.29, 0.001, 0.075),
  section = c("Clinical", "Clinical", "Clinical", "Tumor", "Tumor")
)


ggforestplot(
  coefs,
  grouping = "section",
  grouping_strip_position = "right",
  striped_rows = TRUE
)

Distinct variable separation

Use separate_groups and separate_lines when you want a more distinct visual separation between variables. This is especially useful for categorical variables with many levels. separate_groups automatically appends the variable name to the level.

block_coefs <- data.frame(
  term = c("race_black", "race_white", "race_other", "age", "bmi"),
  label = c("Black", "White", "Other", "Age", "BMI"),
  estimate = c(0.24, 0.08, -0.04, 0.12, -0.09),
  conf.low = c(0.10, -0.04, -0.18, 0.03, -0.17),
  conf.high = c(0.38, 0.20, 0.10, 0.21, -0.01),
  variable_block = c("Race", "Race", "Race", "Age", "BMI")
)

ggforestplot(
  block_coefs,
  label = "label",
  separate_groups = "variable_block",
  separate_lines = TRUE,
  striped_rows = TRUE
) +
  scale_y_discrete(limits = rev(c("BMI", "Age", "Race: White", 
                                  "Race: Black", "Race: Other")))
#> Scale for y is already present.
#> Adding another scale for y, which will replace the existing scale.

Add a side table

add_forest_table() allows you to attach model information to the coefficient plot. The table can be added to either the left or right side and allows for some customization. You should always add the table LAST, after styling your plot because the function calls on patchwork internally. patchwork requires specific syntax to customize plots and is generally more difficult to get working correctly.

You can choose which columns from your dataframe to include in the table using the columns argument, and can change the labels using column_labels. If some of the term labels need to be changed, use term_labels to assign them new values. Some of the column labels are automatically assigned if no value is provided.

Notice how we are explicitly naming the n and p.value columns? This is necessary in most cases because aliases are not yet incorporated (but they will be…I promise I’m getting to it).

ggforestplot(
  coefs,
  grouping = "section",
  grouping_strip_position = "right",
  n = "sample_size",
  p.value = "p_value",
  striped_rows = TRUE,
  term_labels = c("Smoking" = "Smoking status")
) +
  add_forest_table(
    columns = c("term", "sample_size", "estimate", "p_value"),
    column_labels = c("term" = "Variable", "sample_size" = "N",
                      "estimate" = "Beta (95% CI)", "p_value" = "P-value")
  )

Customize the table

add_forest_table also lets you change some minor styling elements of the forest table.

ggforestplot(
  coefs,
  n = "sample_size",
  p.value = "p_value",
  striped_rows = TRUE
) +
  add_forest_table(
    position = "left",
    grid_lines = T,
    grid_line_linetype = 2,
    grid_line_colour = "red"
  )

Split tables

add_split_table() can be used to create more traditional looking forest plots. You can choose which summary information goes to which side. Like add_forest_table(), it should be added after any plot-level styling.

Use the estimate_fmt argument to change how your estimates are displayed. You can also control digits via estimate_digits and interval_digits.

ggforestplot(
  coefs,
  n = "sample_size",
  p.value = "p_value",
  striped_rows = TRUE
) +
  scale_x_continuous(limits = c(-.8,.8)) +
  add_split_table(
    left_columns = c("term","n"),
    right_columns = c("estimate","p"),
    column_labels = c("estimate" = "Beta [95% CI]"),
    estimate_fmt = "{estimate} [{conf.low}, {conf.high}]",
    estimate_digits = 2,
    interval_digits = 3
  )

Plotting other types of model coefficients

You can use exponentiate = TRUE for models on the log-odds scale (or similar).

data(CO2)

l1 <- glm(Treatment ~ conc + uptake + Type, family = binomial(link = "logit"), 
    data = CO2)


ggforestplot(l1, exponentiate = TRUE, striped_rows = T, term_labels = c("TypeMississippi" = "Mississippi")) +
  add_forest_table(position = "left", 
                   show_p = F)

We can do this for survival models as well.

lung <- survival::lung

lung <- lung |>  
  dplyr::mutate(
    status = dplyr::recode(status, `1` = 0, `2` = 1)
  )

s1 <- survival::coxph(Surv(time, status) ~ sex + age + ph.karno + pat.karno, data = lung)

ggforestplot(s1, exponentiate = T, striped_rows = T) +
  add_forest_table()

Compare multiple estimates

The group argument is handy when comparing estimates from several models.

comparison_coefs <- data.frame(
  term = rep(c("Age", "BMI", "Smoking", "Stage II", "Stage III"), 2),
  estimate = c(0.12, -0.10, 0.18, 0.30, 0.46, 0.08, -0.05, 0.24, 0.40, 0.58),
  conf.low = c(0.03, -0.18, 0.04, 0.10, 0.18, 0.00, -0.13, 0.10, 0.20, 0.30),
  conf.high = c(0.21, -0.02, 0.32, 0.50, 0.74, 0.16, 0.03, 0.38, 0.60, 0.86),
  model = rep(c("Model A", "Model B"), each = 5),
  section = rep(c("Clinical", "Clinical", "Clinical", "Tumor", "Tumor"), 2)
)

ggforestplot(
  comparison_coefs,
  group = "model",
  grouping = "section",
  striped_rows = TRUE,
  dodge_width = 0.5,
  grouping_strip_position = "right"
) +
  theme(legend.position = "top") +
  scale_color_manual(values = c("#1F968BFF", "#453781FF")) +
  add_forest_table()