summaryTable

The function summaryTable() produces a table of descriptive statistics for continuous, categorical and dichotomous variables. It is based on the function gtsummary::tbl_summary(), with several enhancements and simplifications that the sections below walk through.

Setup and data

To demonstrate the features of the function, we use the dataset survival::colon.

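library(survival)
library(dplyr)  # provides %>%, group_by(), slice(), select(), mutate(), ...
data(cancer, package = "survival")  # loads several datasets, including colon

# Keep one row per patient: the first row within each id group
colon1 <- colon %>%
  group_by(id) %>%
  slice(1) %>%
  ungroup()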

The dataset colon contains 1858 records from 929 patients (two records per patient, one for recurrence and one for death) from one of the first successful trials of adjuvant chemotherapy for colon cancer.

For simplicity, we focus here on recurrence only, two treatment groups, and four variables: rx, sex, age and extent.

We also add a few missing values for the variable extent.

set.seed(123)
colon2 <- colon1 %>%
  select(rx, sex, age, extent) %>%
  filter(rx != "Lev") %>%
  mutate(rx = if_else(rx == "Obs", "Control", rx),
         extent = if_else(row_number() %in% sample(row_number(), size = round(0.1 * n())), NA, extent)) %>% 
  rename(Male = sex) %>% 
  mutate(extent = as.factor(extent))
head(colon2)
#> # A tibble: 6 × 4
#>   rx       Male   age extent
#>   <chr>   <dbl> <dbl> <fct> 
#> 1 Lev+5FU     1    43 3     
#> 2 Lev+5FU     1    63 3     
#> 3 Control     0    71 2     
#> 4 Lev+5FU     0    66 3     
#> 5 Control     1    69 3     
#> 6 Lev+5FU     0    57 3

Simple table

By default, the function produces a table with all variables present in the dataset.

summaryTable(data = colon2)

Characteristic  N     N = 619¹
rx              619
    Control           315 (51%)
    Lev+5FU           304 (49%)
Male            619
    0                 312 (50%)
    1                 307 (50%)
age             619   61.0 (18.0, 85.0)
extent          557
    1                 17 (3%)
    2                 68 (11%)
    3                 446 (72%)
    4                 26 (4%)
    Missing           62 (10%)

¹ n (%); Median (Min, Max)

To include only specific variables, pass their names to the argument vars. The argument group stratifies the summary statistics by the given variable.

summaryTable(data = colon2, 
             vars = c("Male", "age", "extent"), 
             group = "rx")

Characteristic  N¹    Control, N = 315²   N¹    Lev+5FU, N = 304²
Male            315                       304
    0                 149 (47%)                 163 (54%)
    1                 166 (53%)                 141 (46%)
age             315   60.0 (18.0, 85.0)   304   62.0 (26.0, 81.0)
extent          285                       272
    1                 8 (3%)                    9 (3%)
    2                 38 (12%)                  30 (10%)
    3                 222 (70%)                 224 (74%)
    4                 17 (5%)                   9 (3%)
    Missing           30 (10%)                  32 (11%)

¹ N without missing values
² n (%); Median (Min, Max)

Displayed name of variables

In the tables above, each variable is displayed under its column name (Male, age, extent).

To customize the displayed names, use the argument labels. Note that the labels must be supplied as a named list, as shown below:

summaryTable(data = colon2, 
             group = "rx",
             labels = list(age = "Age", extent = "Extent"))

Characteristic  N¹    Control, N = 315²   N¹    Lev+5FU, N = 304²
Male            315                       304
    0                 149 (47%)                 163 (54%)
    1                 166 (53%)                 141 (46%)
Age             315   60.0 (18.0, 85.0)   304   62.0 (26.0, 81.0)
Extent          285                       272
    1                 8 (3%)                    9 (3%)
    2                 38 (12%)                  30 (10%)
    3                 222 (70%)                 224 (74%)
    4                 17 (5%)                   9 (3%)
    Missing           30 (10%)                  32 (11%)

¹ N without missing values
² n (%); Median (Min, Max)

Adding number of observations

By default, the number of non-missing observations is added in a separate column. This can be disabled by setting the argument add_n to FALSE.

summaryTable(data = colon2,
             group = "rx",
             labels = list(rx = "Arm", age = "Age", extent = "Extent"),
             add_n = FALSE)

Characteristic  Control, N = 315¹   Lev+5FU, N = 304¹
Male
    0           149 (47%)           163 (54%)
    1           166 (53%)           141 (46%)
Age             60.0 (18.0, 85.0)   62.0 (26.0, 81.0)
Extent
    1           8 (3%)              9 (3%)
    2           38 (12%)            30 (10%)
    3           222 (70%)           224 (74%)
    4           17 (5%)             9 (3%)
    Missing     30 (10%)            32 (11%)

¹ n (%); Median (Min, Max)
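
The N column described in this section counts non-missing values only. As a small base R check of that idea, the sketch below rebuilds the Control-arm extent values from the counts reported in the tables of this article (8, 38, 222 and 17 patients in levels 1 to 4, plus 30 missing values). It does not call summaryTable(), and the variable names are chosen only for this illustration.

```r
# Control-arm extent values rebuilt from the counts shown in the tables
# (levels 1-4 plus 30 missing values); names are illustrative only.
extent_control <- rep(c(1, 2, 3, 4, NA), times = c(8, 38, 222, 17, 30))

n_total       <- length(extent_control)       # all patients in the arm
n_non_missing <- sum(!is.na(extent_control))  # value shown in the N column

n_total        # 315
n_non_missing  # 285
```

The two numbers correspond to the column header (N = 315) and to the N reported for extent in the Control arm (285).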

Overall column

An “overall” column can be added by setting the argument overall to TRUE.

summaryTable(data = colon2, 
             group = "rx",
             overall = TRUE, 
             labels = list(age = "Age", extent = "Extent"))

Characteristic  N¹    Control, N = 315²   N¹    Lev+5FU, N = 304²   Overall, N = 619²
Male            315                       304
    0                 149 (47%)                 163 (54%)           312 (50%)
    1                 166 (53%)                 141 (46%)           307 (50%)
Age             315   60.0 (18.0, 85.0)   304   62.0 (26.0, 81.0)   61.0 (18.0, 85.0)
Extent          285                       272
    1                 8 (3%)                    9 (3%)              17 (3%)
    2                 38 (12%)                  30 (10%)            68 (11%)
    3                 222 (70%)                 224 (74%)           446 (72%)
    4                 17 (5%)                   9 (3%)              26 (4%)
    Missing           30 (10%)                  32 (11%)            62 (10%)

¹ N without missing values
² n (%); Median (Min, Max)

Variable types

The function gtsummary::tbl_summary considers numeric variables with fewer than 10 unique values as categorical by default. This is not the case in the function summaryTable.

By default, all numeric variables are treated as continuous, unless they have exactly two unique values, 0 and 1. In that case, they are treated as dichotomous. This can be changed by setting the argument continuous_as to "categorical".
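
The default rule described above can be sketched in a few lines of base R. This is only an illustration of the stated rule, not the package's implementation: the helper name guess_type() is hypothetical, and dropping missing values before looking at the unique values is an assumption.

```r
# Hypothetical helper: classify a column following the rule stated above.
guess_type <- function(x) {
  values <- unique(x[!is.na(x)])  # assumption: missing values are ignored
  if (is.numeric(x)) {
    # Numeric with exactly the two values 0 and 1 -> dichotomous,
    # any other numeric column -> continuous
    if (setequal(values, c(0, 1))) "dichotomous" else "continuous"
  } else {
    # Factors, characters, etc. -> categorical
    "categorical"
  }
}

guess_type(c(0, 1, 1, NA))                 # "dichotomous"
guess_type(c(1, 2, 3, 4))                  # "continuous"
guess_type(factor(c("1", "2", "3", "4")))  # "categorical"
```

Under this rule, a numeric 0/1 column such as Male is dichotomous, age is continuous, and extent is categorical only because it was converted to a factor in the setup code.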

For dichotomous variables, all levels are displayed by default. To show only one row, set dichotomous_as = "dichotomous". The level to display is specified with the argument value, for example value = list(Male ~ "1").

summaryTable(data = colon2,
             group = "rx",
             vars = "Male",
             labels = list(age = "Age"),
             dichotomous_as = "dichotomous",
             value = list(Male ~ "1"),
             missing = FALSE)

Characteristic  N¹    Control, N = 315²   N¹    Lev+5FU, N = 304²
Male            315   166 (53%)           304   141 (46%)

¹ N without missing values
² n (%)

By default, the function reports the median and range for continuous variables. Other statistics can be selected with the argument stat_cont.

Statistic type

The statistics to be displayed can be chosen with the arguments stat_cont (options: "median_IQR", "median_range" (default), "mean_sd", "mean_se" and "geomMean_sd") and stat_cat (options: "n_percent" (default), "n" and "n_N").

summaryTable(data = colon2,
             group = "rx",
             stat_cont = "median_IQR",
             stat_cat = "n_N",
             # the column sex was renamed to Male above, so no label is set for sex
             labels = list(age = "Age", extent = "Extent"))

Characteristic  N¹    Control, N = 315²   N¹    Lev+5FU, N = 304²
Male            315                       304
    0                 149/315                   163/304
    1                 166/315                   141/304
Age             315   60.0 (53.0, 68.0)   304   62.0 (52.0, 70.0)
Extent          285                       272
    1                 8/315                     9/304
    2                 38/315                    30/304
    3                 222/315                   224/304
    4                 17/315                    9/304
    Missing           30/315                    32/304

¹ N without missing values
² n/N; Median (Q1, Q3)
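
To make the stat_cont options concrete, the sketch below computes the corresponding quantities in base R on a toy vector. The toy values and variable names are for illustration only. How summaryTable() itself computes quartiles, and which spread it reports next to the geometric mean, is not stated here, so base R defaults are assumed and only the geometric mean itself is shown for "geomMean_sd".

```r
x <- c(2, 4, 8)  # toy values, for illustration only

# "median_IQR": median with first and third quartile (base R default type 7)
median_iqr   <- c(median(x), quantile(x, c(0.25, 0.75), names = FALSE))
# "median_range": median with minimum and maximum
median_range <- c(median(x), min(x), max(x))
# "mean_sd": mean with standard deviation
mean_sd      <- c(mean(x), sd(x))
# "mean_se": mean with standard error, SE = SD / sqrt(n)
mean_se      <- c(mean(x), sd(x) / sqrt(length(x)))
# geometric mean: exponential of the mean of the logs (requires x > 0)
geom_mean    <- exp(mean(log(x)))

median_iqr    # 4 3 6
median_range  # 4 2 8
geom_mean     # 4
```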

Tests

By default, neither p-values nor confidence intervals (CI) are displayed. p-values can be added by setting test to TRUE, and CIs by setting ci to TRUE.

The default test for continuous variables is wilcox.test, and fisher.test for categorical variables. This can be changed with test_cont and test_cat, respectively.

The default CI type is wilcox.test for continuous variables and wilson for categorical variables. This can be changed with ci_cont and ci_cat, respectively.

summaryTable(data = colon2, 
             group = "rx", 
             vars = c("age", "extent"), 
             stat_cont = "mean_sd", 
             test = TRUE,
             ci = TRUE,
             labels = list(age = "Age", extent = "Extent")
             )

Characteristic  N¹    Control, N = 315²   95% CI         N¹    Lev+5FU, N = 304²   95% CI         p-value³
Age             315   59.5 (12.0)         [59, 62]       304   59.7 (12.3)         [59, 62]       0.60
Extent          285                                      272                                      0.37
    1                 8 (3%)              [1.2%, 5.1%]         9 (3%)              [1.5%, 5.7%]
    2                 38 (12%)            [8.8%, 16%]          30 (10%)            [6.9%, 14%]
    3                 222 (70%)           [65%, 75%]           224 (74%)           [68%, 78%]
    4                 17 (5%)             [3.3%, 8.7%]         9 (3%)              [1.5%, 5.7%]
    Missing           30 (10%)            [6.6%, 13%]          32 (11%)            [7.4%, 15%]

¹ N without missing values
² Mean (SD); n (%)
³ Wilcoxon rank sum test; Fisher's exact test

Abbreviation: CI = Confidence Interval
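
The categorical intervals in the table above can be checked by hand. The sketch below uses stats::prop.test(), whose confidence interval for a single proportion is a Wilson score interval; with the continuity correction switched on it gives values that agree with the intervals shown for extent = 1. That the wilson option corresponds to exactly this call is an inference from the matching numbers, not something the package states, and the helper name wilson_cc_percent() is hypothetical.

```r
# Hypothetical helper: Wilson score interval with continuity correction,
# returned in percent and rounded to one decimal.
wilson_cc_percent <- function(x, n) {
  ci <- prop.test(x = x, n = n, correct = TRUE)$conf.int
  round(100 * as.numeric(ci), 1)
}

ci_control <- wilson_cc_percent(8, 315)  # extent = 1, Control: 8 of 315
ci_lev5fu  <- wilson_cc_percent(9, 304)  # extent = 1, Lev+5FU: 9 of 304

ci_control  # about 1.2 and 5.1
ci_lev5fu   # about 1.5 and 5.7
```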

Missing values

By default, missing values are shown as a separate category. This can be disabled by setting missing to FALSE.

With missing = TRUE, percentages are automatically added next to the number of missing values. This can be disabled by setting the argument missing_percent to FALSE. As the first output below shows, the category percentages are then computed among the non-missing values only.

summaryTable(data = colon2, 
             group = "rx", 
             vars = "extent", 
             test = TRUE,
             ci = TRUE,
             missing_percent = FALSE,
             labels = list(extent = "Extent")
             )

Characteristic  N¹    Control, N = 315²   95% CI         N¹    Lev+5FU, N = 304²   95% CI         p-value³
Extent          285                                      272                                      0.37
    1                 8 (3%)              [1.3%, 5.7%]         9 (3%)              [1.6%, 6.4%]
    2                 38 (13%)            [9.7%, 18%]          30 (11%)            [7.7%, 16%]
    3                 222 (78%)           [73%, 82%]           224 (82%)           [77%, 87%]
    4                 17 (6%)             [3.6%, 9.6%]         9 (3%)              [1.6%, 6.4%]
    Missing           30                                       32

¹ N without missing values
² n (%)
³ Fisher's exact test

Abbreviation: CI = Confidence Interval


summaryTable(data = colon2, 
             group = "rx", 
             vars = "extent", 
             test = TRUE,
             ci = TRUE,
             missing_percent = TRUE,
             labels = list(extent = "Extent")
             )

Characteristic  N¹    Control, N = 315²   95% CI         N¹    Lev+5FU, N = 304²   95% CI         p-value³
Extent          285                                      272                                      0.37
    1                 8 (3%)              [1.2%, 5.1%]         9 (3%)              [1.5%, 5.7%]
    2                 38 (12%)            [8.8%, 16%]          30 (10%)            [6.9%, 14%]
    3                 222 (70%)           [65%, 75%]           224 (74%)           [68%, 78%]
    4                 17 (5%)             [3.3%, 8.7%]         9 (3%)              [1.5%, 5.7%]
    Missing           30 (10%)            [6.6%, 13%]          32 (11%)            [7.4%, 15%]

¹ N without missing values
² n (%)
³ Fisher's exact test

Abbreviation: CI = Confidence Interval
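
Comparing the two outputs above, the category percentages change with missing_percent: with FALSE they match counts divided by the non-missing total, with TRUE they match counts divided by all patients in the arm. The arithmetic can be checked in base R with the Control-arm counts from those outputs. This only illustrates the denominators implied by the displayed numbers; it does not call summaryTable(), and the variable names are illustrative.

```r
# Control-arm extent counts taken from the outputs above
counts    <- c("1" = 8, "2" = 38, "3" = 222, "4" = 17)
n_missing <- 30

# Denominator including missing values (315), as with missing_percent = TRUE
pct_with_missing    <- round(100 * counts / (sum(counts) + n_missing))
# Denominator excluding missing values (285), as with missing_percent = FALSE
pct_without_missing <- round(100 * counts / sum(counts))
# Share of missing values among all patients in the arm
pct_missing         <- round(100 * n_missing / (sum(counts) + n_missing))

pct_with_missing     # 3 12 70 5
pct_without_missing  # 3 13 78 6
pct_missing          # 10
```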

The results with and without missing values can also be shown side by side by setting missing_percent to "both", as in the call below.

summaryTable(data = colon2,
             group = "rx",
             vars = "extent",
             missing_percent = "both",
             test = TRUE,
             labels = list(extent = "Extent")
             )

                With missing                            Without missing
Characteristic  Control, N = 315¹   Lev+5FU, N = 304¹   Control, N = 315¹   Lev+5FU, N = 304¹   p-value²
Extent                                                                                          0.37
    1           8 (3%)              9 (3%)              8 (3%)              9 (3%)
    2           38 (12%)            30 (10%)            38 (13%)            30 (11%)
    3           222 (70%)           224 (74%)           222 (78%)           224 (82%)
    4           17 (5%)             9 (3%)              17 (6%)             9 (3%)
    Missing     30 (10%)            32 (11%)

¹ n (%)
² Fisher's exact test

Further customization

The number of digits can be customized with the arguments digits_cont and digits_cat. The argument as_flex_table (default: TRUE) converts the gtsummary object to a flextable object, which is better suited to Word output.

Next steps

The argument type will be introduced in a future release to enable more fine-grained customization of the variable types.