summaryTable

The function summaryTable() produces a table of descriptive statistics for continuous, categorical and dichotomous variables. It is based on the function gtsummary::tbl_summary(), with several enhancements and simplifications.

Setup and data

To demonstrate the function, we use the dataset survival::colon.

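library(survival)
library(dplyr) # provides %>%, group_by(), slice(), select(), filter(), mutate() and rename()
data(cancer, package = "survival") # loads the colon dataset

colon1 <- colon %>%
  group_by(id) %>%
  slice(1) %>% # Select the first row within each id group
  ungroup()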

The dataset colon contains 1858 records from 929 patients (two records per patient, one for recurrence and one for death) from one of the first successful trials of adjuvant chemotherapy for colon cancer.
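The record structure can be checked directly. The following is a minimal sketch using only the survival package and base R; the object names (n_records, n_patients, records_per_patient) are introduced here for illustration.

```r
library(survival)
data(cancer, package = "survival")  # loads colon among other datasets

n_records  <- nrow(colon)               # total number of rows
n_patients <- length(unique(colon$id))  # number of distinct patients

# Distribution of the number of rows per patient
records_per_patient <- table(table(colon$id))

n_records
n_patients
records_per_patient
```

This is why the setup code keeps a single row per id before summarising baseline variables.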

For simplicity, we focus here on recurrence only, two treatment groups, and four variables:

the treatment group (rx),
the sex (Male),
the age (age) and
the extent of local spread (extent).

We also set 10% of the values of the variable extent to missing, chosen at random.

set.seed(123)
colon2 <- colon1 %>%
  select(rx, sex, age, extent) %>%
  filter(rx != "Lev") %>%
  mutate(rx = if_else(rx == "Obs", "Control", rx),
         extent = if_else(row_number() %in% sample(row_number(), size = round(0.1 * n())), NA, extent)) %>% 
  rename(Male = sex) %>% 
  mutate(extent = as.factor(extent))

head(colon2)
#> # A tibble: 6 × 4
#>   rx       Male   age extent
#>   <chr>   <dbl> <dbl> <fct> 
#> 1 Lev+5FU     1    43 3     
#> 2 Lev+5FU     1    63 3     
#> 3 Control     0    71 2     
#> 4 Lev+5FU     0    66 3     
#> 5 Control     1    69 3     
#> 6 Lev+5FU     0    57 3
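The missing-value step above can be looked at in isolation. The following base R sketch uses a toy vector (not the trial data) and applies the same idea: draw round(0.1 * n) positions at random and set them to NA. The names x, n_missing and idx are illustrative only.

```r
set.seed(123)

# Toy variable with 619 values, the same length as colon2
x <- rep(1:4, length.out = 619)

# Number of values to blank out: 10% of the rows, rounded
n_missing <- round(0.1 * length(x))

# Draw the positions without replacement and set them to NA
idx <- sample(seq_along(x), size = n_missing)
x[idx] <- NA

sum(is.na(x))  # number of missing values actually introduced
```

With 619 rows, this yields 62 missing values, which matches the "Missing" count reported in the tables below.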

Simple table

By default, the function produces a table with all variables present in the dataset.

summaryTable(data = colon2)

Characteristic	N	N = 619¹
rx	619
Control		315 (51%)
Lev+5FU		304 (49%)
Male	619
0		312 (50%)
1		307 (50%)
age	619	61.0 (18.0, 85.0)
extent	557
1		17 (3%)
2		68 (11%)
3		446 (72%)
4		26 (4%)
Missing		62 (10%)
¹ n (%); Median (Min, Max)

To include only specific variables, pass their names to the argument vars. The argument group stratifies the summary statistics by the given variable.

summaryTable(data = colon2, 
             vars = c("Male", "age", "extent"), 
             group = "rx")

Characteristic	N¹	Control N = 315²	N¹	Lev+5FU N = 304²
Male	315		304
0		149 (47%)		163 (54%)
1		166 (53%)		141 (46%)
age	315	60.0 (18.0, 85.0)	304	62.0 (26.0, 81.0)
extent	285		272
1		8 (3%)		9 (3%)
2		38 (12%)		30 (10%)
3		222 (70%)		224 (74%)
4		17 (5%)		9 (3%)
Missing		30 (10%)		32 (11%)
¹ N without missing values
² n (%); Median (Min, Max)
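The n (%) cells in the stratified table are column percentages within each treatment group. A minimal base R sketch, using the Male counts from the table above, reproduces them. It only illustrates the arithmetic and is not meant to describe how summaryTable() is implemented; the object names are illustrative.

```r
# Counts of Male (0/1) by treatment group, taken from the table above
counts <- matrix(c(149, 166, 163, 141),
                 nrow = 2,
                 dimnames = list(Male = c("0", "1"),
                                 rx   = c("Control", "Lev+5FU")))

# Percentages within each column (treatment group), rounded to integers
pct <- round(100 * prop.table(counts, margin = 2))

# Combine counts and percentages into "n (%)" strings
formatted <- array(sprintf("%d (%d%%)", counts, pct),
                   dim = dim(counts),
                   dimnames = dimnames(counts))
formatted
```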

Displayed name of variables

The displayed name of each variable is

the label if it exists in the dataset, or
the variable name if no label is present in the dataset (which is the case in our example).

In order to customize the displayed name, the argument labels can be used. Please note that the labels need to be entered as a list, as shown below:

summaryTable(data = colon2, 
             group = "rx",
             labels = list(age = "Age", extent = "Extent"))

Characteristic	N¹	Control N = 315²	N¹	Lev+5FU N = 304²
Male	315		304
0		149 (47%)		163 (54%)
1		166 (53%)		141 (46%)
Age	315	60.0 (18.0, 85.0)	304	62.0 (26.0, 81.0)
Extent	285		272
1		8 (3%)		9 (3%)
2		38 (12%)		30 (10%)
3		222 (70%)		224 (74%)
4		17 (5%)		9 (3%)
Missing		30 (10%)		32 (11%)
¹ N without missing values
² n (%); Median (Min, Max)
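Labels stored in the dataset are commonly kept in a "label" attribute on each column, which is the convention gtsummary reads. Assuming summaryTable() follows the same convention (this is an assumption, not confirmed here), the fallback rule described above can be sketched in base R. The helper display_name() and the toy data frame are hypothetical.

```r
# Toy data: only age carries a label attribute
toy <- data.frame(age = c(43, 63, 71), extent = c(3, 3, 2))
attr(toy$age, "label") <- "Age"

# Hypothetical helper: use the label if present, otherwise the variable name
display_name <- function(data, var) {
  lab <- attr(data[[var]], "label", exact = TRUE)
  if (is.null(lab)) var else lab
}

shown <- vapply(names(toy), function(v) display_name(toy, v), character(1))
shown
```

Here age is displayed as "Age" and extent falls back to its variable name.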

Adding number of observations

By default, the number of non-missing observations is added in a separate column. This can be disabled by setting the argument add_n to FALSE.

summaryTable(data = colon2, 
             group = "rx",
             labels = list(rx = "Arm", age = "Age", extent = "Extent"), 
             add_n = FALSE)

Characteristic	Control N = 315¹	Lev+5FU N = 304¹
Male
0	149 (47%)	163 (54%)
1	166 (53%)	141 (46%)
Age	60.0 (18.0, 85.0)	62.0 (26.0, 81.0)
Extent
1	8 (3%)	9 (3%)
2	38 (12%)	30 (10%)
3	222 (70%)	224 (74%)
4	17 (5%)	9 (3%)
Missing	30 (10%)	32 (11%)
¹ n (%); Median (Min, Max)

Overall column

An “overall” column can be added by setting the argument overall to TRUE.

summaryTable(data = colon2, 
             group = "rx",
             overall = TRUE, 
             labels = list(age = "Age", extent = "Extent"))

Characteristic	N¹	Control N = 315²	N¹	Lev+5FU N = 304²	Overall N = 619²
Male	315		304
0		149 (47%)		163 (54%)	312 (50%)
1		166 (53%)		141 (46%)	307 (50%)
Age	315	60.0 (18.0, 85.0)	304	62.0 (26.0, 81.0)	61.0 (18.0, 85.0)
Extent	285		272
1		8 (3%)		9 (3%)	17 (3%)
2		38 (12%)		30 (10%)	68 (11%)
3		222 (70%)		224 (74%)	446 (72%)
4		17 (5%)		9 (3%)	26 (4%)
Missing		30 (10%)		32 (11%)	62 (10%)
¹ N without missing values
² n (%); Median (Min, Max)

Variable types

The function gtsummary::tbl_summary() treats numeric variables with fewer than 10 unique values as categorical by default. This is not the case in summaryTable().

By default, all numeric variables are treated as continuous, unless their only unique values are 0 and 1, in which case they are treated as dichotomous. This can be changed by setting the argument continuous_as to "categorical".
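The default rule described above can be written down as a small base R function. This is a sketch of the rule as stated in the text, not the package's actual code; guess_type() and the toy data frame are hypothetical.

```r
# Hypothetical helper implementing the rule as described in the text:
# numeric with exactly the values 0 and 1 -> dichotomous,
# any other numeric -> continuous, everything else -> categorical
guess_type <- function(x) {
  if (!is.numeric(x)) {
    return("categorical")
  }
  values <- unique(x[!is.na(x)])
  if (setequal(values, c(0, 1))) "dichotomous" else "continuous"
}

toy <- data.frame(
  rx     = c("Control", "Lev+5FU", "Control", "Lev+5FU"),
  Male   = c(1, 0, 0, 1),
  age    = c(43, 63, 71, 66),
  extent = factor(c(3, 3, 2, NA))
)

types <- vapply(toy, guess_type, character(1))
types
```

Note that extent is categorical here only because it was converted to a factor; as a plain numeric column it would be classified as continuous under this rule.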

For dichotomous variables, all levels are displayed by default. To show only one row, set dichotomous_as = "dichotomous". The level to display is specified using the argument value = list(variable ~ "level to show").

summaryTable(data = colon2,
             group = "rx",
             vars = "Male",
             labels = list(age = "Age"), 
             dichotomous_as = "dichotomous", 
             value = list(Male ~ "1"),
             missing = FALSE)

Characteristic	N¹	Control N = 315²	N¹	Lev+5FU N = 304²
Male	315	166 (53%)	304	141 (46%)
¹ N without missing values
² n (%)

By default, the function reports the median and range for continuous variables. Other statistics can be selected with the argument stat_cont.

Statistic type

The statistics to be displayed can be chosen using the arguments stat_cont (options: "median_IQR", "median_range" (default), "mean_sd", "mean_se" and "geomMean_sd") and stat_cat (options: "n_percent" (default), "n" and "n_N").

summaryTable(data = colon2, group = "rx", 
             stat_cont = "median_IQR", 
             stat_cat = "n_N",
             labels = list(age = "Age", extent = "Extent"))

Characteristic	N¹	Control N = 315²	N¹	Lev+5FU N = 304²
Male	315		304
0		149/315		163/304
1		166/315		141/304
Age	315	60.0 (53.0, 68.0)	304	62.0 (52.0, 70.0)
Extent	285		272
1		8/315		9/304
2		38/315		30/304
3		222/315		224/304
4		17/315		9/304
Missing		30/315		32/304
¹ N without missing values
² n/N; Median (Q1, Q3)
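To make the statistic options concrete, the following base R sketch computes the corresponding quantities for a small toy vector. It assumes R's default quantile definition and shows only the geometric mean for the "geomMean_sd" option, since the exact definitions used inside summaryTable() are not stated here. All object names are illustrative.

```r
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)  # toy continuous variable

# Quartiles with R's default quantile type (assumed, not confirmed for the package)
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)

median_iqr   <- sprintf("%.1f (%.1f, %.1f)", median(x), q[1], q[2])
median_range <- sprintf("%.1f (%.1f, %.1f)", median(x), min(x), max(x))
mean_sd      <- sprintf("%.1f (%.1f)", mean(x), sd(x))
mean_se      <- sprintf("%.1f (%.1f)", mean(x), sd(x) / sqrt(length(x)))

# Geometric mean: exponential of the mean of the logs
geo_mean <- exp(mean(log(x)))

# Categorical statistic in the "n_N" style: count over group size
n_N <- sprintf("%d/%d", 38L, 315L)

median_iqr    # "7.0 (4.0, 10.0)"
median_range  # "7.0 (2.0, 15.0)"
n_N           # "38/315"
```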

Tests

By default, no p-values or confidence intervals (CI) are displayed. p-values can be added by setting test to TRUE, and CIs by setting ci to TRUE.

The default test is wilcox.test for continuous variables and fisher.test for categorical variables. These can be changed with test_cont and test_cat, respectively.

The default CI type is wilcox.test for continuous variables and wilson for categorical variables. These can be changed with ci_cont and ci_cat, respectively.

summaryTable(data = colon2, 
             group = "rx", 
             vars = c("age", "extent"), 
             stat_cont = "mean_sd", 
             test = TRUE,
             ci = TRUE,
             labels = list(age = "Age", extent = "Extent")
             )

Characteristic	N¹	Control N = 315²	95% CI	N¹	Lev+5FU N = 304²	95% CI	p-value³
Age	315	59.5 (12.0)	[59, 62]	304	59.7 (12.3)	[59, 62]	0.60
Extent	285			272			0.37
1		8 (3%)	[1.2%, 5.1%]		9 (3%)	[1.5%, 5.7%]
2		38 (12%)	[8.8%, 16%]		30 (10%)	[6.9%, 14%]
3		222 (70%)	[65%, 75%]		224 (74%)	[68%, 78%]
4		17 (5%)	[3.3%, 8.7%]		9 (3%)	[1.5%, 5.7%]
Missing		30 (10%)	[6.6%, 13%]		32 (11%)	[7.4%, 15%]
¹ N without missing values
² Mean (SD); n (%)
³ Wilcoxon rank sum test; Fisher's exact test
Abbreviation: CI = Confidence Interval
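For orientation, the default tests and the Wilson interval can be run directly with base R's stats functions. The sketch below uses the extent counts and the 222/315 proportion from the table above, plus simulated toy data for the Wilcoxon test. It assumes a Wilson interval without continuity correction and passes a larger workspace to fisher.test() as a precaution; whether summaryTable() uses exactly these settings is not confirmed here.

```r
# Extent counts by treatment group (non-missing observations only)
extent_counts <- matrix(c(8, 38, 222, 17,
                          9, 30, 224, 9),
                        nrow = 4,
                        dimnames = list(extent = as.character(1:4),
                                        rx     = c("Control", "Lev+5FU")))

# Fisher's exact test for the association between extent and treatment group
fisher_p <- fisher.test(extent_counts, workspace = 2e6)$p.value

# Wilson score interval for extent = 3 in the Control group (222 of 315)
wilson <- prop.test(x = 222, n = 315, correct = FALSE)$conf.int
round(100 * wilson)  # 65 75, matching the [65%, 75%] shown in the table

# Wilcoxon rank sum test on simulated toy data (not the trial data)
set.seed(1)
a <- rnorm(30, mean = 60, sd = 12)
b <- rnorm(30, mean = 62, sd = 12)
wilcox_p <- wilcox.test(a, b)$p.value
```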

Missing values

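By default, missing values are shown as a separate category. This can be disabled by setting missing to FALSE.

When missing = TRUE, the percentage is shown next to the number of missing values by default. This can be disabled by setting the argument missing_percent to FALSE, in which case the percentages of the remaining categories are computed among the non-missing observations only, as the first table below shows.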

summaryTable(data = colon2, 
             group = "rx", 
             vars = "extent", 
             test = TRUE,
             ci = TRUE,
             missing_percent = FALSE,
             labels = list(extent = "Extent")
             )

Characteristic	N¹	Control N = 315²	95% CI	N¹	Lev+5FU N = 304²	95% CI	p-value³
Extent	285			272			0.37
1		8 (3%)	[1.3%, 5.7%]		9 (3%)	[1.6%, 6.4%]
2		38 (13%)	[9.7%, 18%]		30 (11%)	[7.7%, 16%]
3		222 (78%)	[73%, 82%]		224 (82%)	[77%, 87%]
4		17 (6%)	[3.6%, 9.6%]		9 (3%)	[1.6%, 6.4%]
Missing		30			32
¹ N without missing values
² n (%)
³ Fisher's exact test
Abbreviation: CI = Confidence Interval


summaryTable(data = colon2, 
             group = "rx", 
             vars = "extent", 
             test = TRUE,
             ci = TRUE,
             missing_percent = TRUE,
             labels = list(extent = "Extent")
             )

Characteristic	N¹	Control N = 315²	95% CI	N¹	Lev+5FU N = 304²	95% CI	p-value³
Extent	285			272			0.37
1		8 (3%)	[1.2%, 5.1%]		9 (3%)	[1.5%, 5.7%]
2		38 (12%)	[8.8%, 16%]		30 (10%)	[6.9%, 14%]
3		222 (70%)	[65%, 75%]		224 (74%)	[68%, 78%]
4		17 (5%)	[3.3%, 8.7%]		9 (3%)	[1.5%, 5.7%]
Missing		30 (10%)	[6.6%, 13%]		32 (11%)	[7.4%, 15%]
¹ N without missing values
² n (%)
³ Fisher's exact test
Abbreviation: CI = Confidence Interval
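The two tables above differ only in the denominator used for the percentages. A short base R sketch with the Control-group counts of extent makes the arithmetic explicit; the object names are illustrative.

```r
# Control group: counts per extent level and number of missing values
counts    <- c("1" = 8, "2" = 38, "3" = 222, "4" = 17)
n_missing <- 30
n_total   <- sum(counts) + n_missing  # 315, all patients in the group
n_valid   <- sum(counts)              # 285, non-missing only

# Percentages with missing values included in the denominator
pct_with <- round(100 * counts / n_total)

# Percentages among non-missing observations only
pct_without <- round(100 * counts / n_valid)

pct_with     # 3 12 70 5
pct_without  # 3 13 78 6
```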

The tables with and without missing values can also be placed side by side by setting missing_percent to "both".

summaryTable(data = colon2, 
             group = "rx", 
             vars = "extent", 
             missing_percent = "both", 
             test = TRUE,
             labels = list(extent = "Extent")
             )

	With missing		Without missing
Characteristic	Control N = 315¹	Lev+5FU N = 304¹	Control N = 315¹	Lev+5FU N = 304¹	p-value²
Extent					0.37
1	8 (3%)	9 (3%)	8 (3%)	9 (3%)
2	38 (12%)	30 (10%)	38 (13%)	30 (11%)
3	222 (70%)	224 (74%)	222 (78%)	224 (82%)
4	17 (5%)	9 (3%)	17 (6%)	9 (3%)
Missing	30 (10%)	32 (11%)
¹ n (%)
² Fisher's exact test