This vignette demonstrates the main function of the furniture package: table1. The main arguments of the function are shown below:
table1(.data, ..., splitby, row_wise, test, type, output, format_number, NAkeep, splitby_labels, var_names)
It contains several useful features for summarizing your data:
- With the medians option, you can obtain the median and the first and third quartiles.
- Output can be rendered through knitr::kable.
- Formatting can be adjusted with output, format_output, simple and condense.
- The table can be exported with export = "file_name".
To illustrate, we’ll walk through the main arguments with an example on some fictitious data.
set.seed(84332)
## Create fictitious data containing several types of variables
df <- data.frame(a = sample(1:10000, 10000, replace = TRUE),
                 b = runif(10000) + rnorm(10000),
                 c = factor(sample(c(1, 2, 3, 4, NA), 10000, replace = TRUE)),
                 d = factor(sample(c(0, 1, NA), 10000, replace = TRUE)),
                 e = trunc(rnorm(10000, 20, 5)),
                 f = factor(sample(c(0, 1, NA), 10000, replace = TRUE)))
We will use df to show these main features of table1.
For table1, the ellipsis (the ...) holds the variables to be summarized that are found in your data. Here, we have a through e in df.
table1(df,
       a, b, c, d, e)
##
## |==============================|
## Mean/Count (SD/%)
## Observations 10000
## a
## 4971.4 (2890.5)
## b
## 0.5 (1.0)
## c
## 1 1962 (24.5%)
## 2 2064 (25.8%)
## 3 1945 (24.3%)
## 4 2023 (25.3%)
## d
## 0 3295 (49.8%)
## 1 3315 (50.2%)
## e
## 19.4 (5.0)
## |==============================|
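For intuition about what the Mean/Count (SD/%) column reports, the same kinds of quantities can be computed by hand in base R. This is a minimal sketch on a small made-up data frame (toy, score and group are hypothetical names, not part of the vignette's df); it is not a description of how table1 is implemented.

```r
# Toy data: one numeric variable and one factor with a missing value
toy <- data.frame(
  score = c(10, 12, 15, 11, 17, 13),
  group = factor(c("x", "y", "x", NA, "y", "x"))
)

# Numeric variable: mean and standard deviation
score_summary <- c(mean = mean(toy$score), sd = sd(toy$score))

# Factor: counts per level (table() drops NA by default) and percentages
group_counts <- table(toy$group)
group_pct <- round(100 * prop.table(group_counts), 1)

print(score_summary)
print(group_counts)
print(group_pct)
```

In the output above, the four percentages for c add up to roughly 100, which suggests that missing values are left out of the denominator, much as table() leaves them out by default.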
To get means/counts and SDs/percentages by a stratifying variable, simply use the splitby argument. The splitby value can be a quoted variable name (e.g., "d") or a one-sided formula, as shown below (e.g., ~d).
table1(df,
       a, b, c,
       splitby = ~d)
##
## |============================================|
## d
## 0 1
## Observations 3295 3315
## a
## 4949.8 (2868.8) 4969.0 (2884.4)
## b
## 0.5 (1.0) 0.5 (1.0)
## c
## 1 665 (25.3%) 639 (24.1%)
## 2 652 (24.8%) 677 (25.6%)
## 3 645 (24.6%) 622 (23.5%)
## 4 663 (25.3%) 709 (26.8%)
## |============================================|
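As a rough base R analogue of the split summaries above, the sketch below computes group-wise means and SDs with tapply(), and uses all.vars() to show how a one-sided formula carries a variable name. The data frame toy and its columns grp and val are hypothetical, and nothing here is taken from the package's internals.

```r
# Toy data: a two-level grouping factor and a numeric variable
toy <- data.frame(
  grp = factor(c(0, 0, 0, 1, 1, 1)),
  val = c(1, 2, 3, 4, 6, 8)
)

# A one-sided formula simply carries the name of the grouping variable
split_var <- all.vars(~grp)

# Mean and SD of val within each level of the grouping variable
group_means <- tapply(toy$val, toy[[split_var]], mean)
group_sds   <- tapply(toy$val, toy[[split_var]], sd)

print(rbind(mean = group_means, sd = group_sds))
```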
You can get percentages by rows instead of by columns (i.e., groups) by using the row_wise = TRUE option.
table1(df,
       a, b, c,
       splitby = ~d,
       row_wise = TRUE)
##
## |============================================|
## d
## 0 1
## Observations 3295 3315
## a
## 4949.8 (2868.8) 4969.0 (2884.4)
## b
## 0.5 (1.0) 0.5 (1.0)
## c
## 1 665 (33.9%) 639 (32.6%)
## 2 652 (31.6%) 677 (32.8%)
## 3 645 (33.2%) 622 (32%)
## 4 663 (32.8%) 709 (35%)
## |============================================|
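The difference between the two kinds of percentages can be checked by hand from the counts printed in this vignette. The sketch below copies the counts of c by d from the tables above and the per-level totals of c from the first table; the choice of denominator for the row-wise figures is inferred from the printed numbers, not from the package source.

```r
# Counts of c (rows) by d (columns), copied from the splitby output
counts <- matrix(c(665, 652, 645, 663,
                   639, 677, 622, 709),
                 nrow = 4,
                 dimnames = list(c = c("1", "2", "3", "4"),
                                 d = c("0", "1")))

# Column percentages: each count divided by its column (group) total
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

# Totals per level of c, copied from the first table in the vignette
# (these include rows where d is missing)
level_totals <- c(1962, 2064, 1945, 2023)

# Row-wise percentages: each count divided by the total for its level of c
row_pct <- round(100 * counts / level_totals, 1)

print(col_pct)
print(row_pct)
```

With these denominators, the first row comes out as 25.3 and 24.1 column-wise and 33.9 and 32.6 row-wise, matching the two tables. Because the level totals include observations where d is missing, the row-wise percentages do not add up to 100 across the two displayed groups.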
It is easy to test for bivariate relationships, as is common in many Table 1s, using test = TRUE.
table1(df,
       a, b, c,
       splitby = ~d,
       test = TRUE)
##
## |====================================================|
## d
## 0 1 P-Value
## Observations 3295 3315
## a 0.786
## 4949.8 (2868.8) 4969.0 (2884.4)
## b 0.112
## 0.5 (1.0) 0.5 (1.0)
## c 0.414
## 1 665 (25.3%) 639 (24.1%)
## 2 652 (24.8%) 677 (25.6%)
## 3 645 (24.6%) 622 (23.5%)
## 4 663 (25.3%) 709 (26.8%)
## |====================================================|
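This section does not state which test is run for each variable type. As an illustration only, and assuming a chi-square test of independence is used for the factor c against d (an assumption, not something confirmed here), the counts printed above can be passed to base R's chisq.test().

```r
# Counts of c (rows) by d (columns), copied from the output above
counts <- matrix(c(665, 652, 645, 663,
                   639, 677, 622, 709),
                 nrow = 4,
                 dimnames = list(c = c("1", "2", "3", "4"),
                                 d = c("0", "1")))

# Pearson chi-square test of independence on the 4 x 2 table
chi <- chisq.test(counts)

print(chi$statistic)          # test statistic
print(chi$parameter)          # degrees of freedom: (4 - 1) * (2 - 1) = 3
print(round(chi$p.value, 3))  # p-value
```

Under that assumption the p-value lands close to the 0.414 reported for c in the table.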
By default, only the p-values are shown, but the format_output argument offers other options, such as stars or the test statistics alongside the p-values.
The table can be simplified by producing only percentages for categorical variables. Further, it can be condensed: binary variables show only a reference group’s percentages, and the means and SDs are placed on the same line as the variable name.
table1(df,
       f, a, b, c,
       splitby = ~d,
       test = TRUE,
       type = c("simple", "condensed"))
##
## |====================================================|
## d
## 0 1 P-Value
## Observations 3295 3315
## f: 1 50.4% 50.1% 0.857
## a 4949.8 (2868.8) 4969.0 (2884.4) 0.786
## b 0.5 (1.0) 0.5 (1.0) 0.112
## c 0.414
## 1 25.3% 24.1%
## 2 24.8% 25.6%
## 3 24.6% 23.5%
## 4 25.3% 26.8%
## |====================================================|
If medians and interquartile ranges are desired instead of means and SDs, simply list those variables in the second argument:
table1(df,
       f, a, b, c,
       splitby = ~d,
       test = TRUE,
       type = c("simple", "condensed"),
       second = c("a", "b"))
##
## |====================================================|
## d
## 0 1 P-Value
## Observations 3295 3315
## f: 1 50.4% 50.1% 0.857
## a 4958.0 [4902.0] 5031.0 [4937.5] 0.786
## b 0.5 [1.4] 0.5 [1.4] 0.112
## c 0.414
## 1 25.3% 24.1%
## 2 24.8% 25.6%
## 3 24.6% 23.5%
## 4 25.3% 26.8%
## |====================================================|
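The bracketed values above are single numbers, which is consistent with the interquartile range mentioned in the text. As a small sketch of those quantities in base R (the vector x is made up for the example, and the "median [IQR]" layout is only imitated with sprintf()):

```r
# A small made-up sample
x <- c(2, 4, 4, 5, 7, 9, 10)

med <- median(x)                          # middle value
quartiles <- quantile(x, c(0.25, 0.75))   # first and third quartiles
iqr <- IQR(x)                             # third quartile minus first quartile

# Imitate the "median [IQR]" layout of the table
cell <- sprintf("%.1f [%.1f]", med, iqr)
print(cell)  # "5.0 [4.0]"
```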
Several output types exist for the table (all of the knitr::kable options), including html as shown below.
table1(df,
       a, b, c,
       splitby = ~d,
       test = TRUE,
       output = "html")
| | 0 | 1 | P-Value |
|---|---|---|---|
| Observations | 3295 | 3315 | |
| a | | | 0.786 |
| | 4949.8 (2868.8) | 4969.0 (2884.4) | |
| b | | | 0.112 |
| | 0.5 (1.0) | 0.5 (1.0) | |
| c | | | 0.414 |
| – 1 – | 665 (25.3%) | 639 (24.1%) | |
| – 2 – | 652 (24.8%) | 677 (25.6%) | |
| – 3 – | 645 (24.6%) | 622 (23.5%) | |
| – 4 – | 663 (25.3%) | 709 (26.8%) | |
For some papers you may want to format large numbers with a comma as a thousands separator (e.g., 30,000 vs. 30000). You can do this by using format_number = TRUE.
table1(df,
       a, b, c,
       splitby = ~d,
       test = TRUE,
       format_number = TRUE)
##
## |========================================================|
## d
## 0 1 P-Value
## Observations 3295 3315
## a 0.786
## 4,949.8 (2,868.8) 4,969.0 (2,884.4)
## b 0.112
## 0.5 (1.0) 0.5 (1.0)
## c 0.414
## 1 665 (25.3%) 639 (24.1%)
## 2 652 (24.8%) 677 (25.6%)
## 3 645 (24.6%) 622 (23.5%)
## 4 663 (25.3%) 709 (26.8%)
## |========================================================|
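The same kind of formatting is available in base R through the big.mark argument of format() and formatC(). The sketch below only illustrates the formatting itself; it is not a claim about how table1 produces its output.

```r
# Thousands separator on a whole number
big <- format(30000, big.mark = ",")
print(big)      # "30,000"

# One decimal place plus a thousands separator, as in the table cells
cells <- formatC(c(4949.8, 2868.8, 0.5),
                 format = "f", digits = 1, big.mark = ",")
print(cells)    # "4,949.8" "2,868.8" "0.5"
```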
To explore the missingness in the factor variables, use NAkeep = TRUE, which also reports the counts and percentages of the missing values.
table1(df,
       a, b, c,
       splitby = ~d,
       test = TRUE,
       NAkeep = TRUE)
##
## |====================================================|
## d
## 0 1 P-Value
## Observations 3295 3315
## a 0.786
## 4949.8 (2868.8) 4969.0 (2884.4)
## b 0.112
## 0.5 (1.0) 0.5 (1.0)
## c 0.414
## 1 665 (20.2%) 639 (19.3%)
## 2 652 (19.8%) 677 (20.4%)
## 3 645 (19.6%) 622 (18.8%)
## 4 663 (20.1%) 709 (21.4%)
## NA 670 (20.3%) 668 (20.2%)
## |====================================================|
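Here, c has missing values in both groups, so an NA row appears with its own counts and percentages, and the percentages for the other levels are now computed out of all observations in each group.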
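The effect of keeping missing values in the denominator can be reproduced in base R with the useNA argument of table(). The factor x below is made up for the example.

```r
# A small factor with two missing values
x <- factor(c(1, 2, NA, 2, 1, NA, 1, 2))

# Default: NA is dropped, so percentages are among non-missing values only
without_na <- table(x)
pct_without <- round(100 * prop.table(without_na), 1)

# Keeping NA as its own category changes every percentage
with_na <- table(x, useNA = "ifany")
pct_with <- round(100 * prop.table(with_na), 1)

print(pct_without)
print(pct_with)
```

The table above follows the same pattern: 665 of the 3295 observations in group 0 is 20.2%, whereas 665 of the 2625 non-missing values of c in that group is 25.3%.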
Finally, to make it easier to use with the tidyverse family of packages, table1 can be used in a pipe. In that case it prints the table to the console and invisibly returns the data frame it was given, so the pipeline can continue.
library(tidyverse)
df %>%
  filter(f == 1) %>%
  na.omit() %>%
  table1(a, b, c,
         splitby = ~d,
         test = TRUE,
         type = c("simple", "condensed")) %>%
  ggplot(aes(x = b, y = a, group = d)) +
    geom_point(aes(color = d), alpha = .25) +
    geom_smooth(aes(color = d), method = "lm", se = FALSE) +
    scale_color_manual(values = c("dodgerblue3", "chartreuse4"), name = "Group") +
    theme_bw()
##
## |====================================================|
## d
## 0 1 P-Value
## Observations 879 903
## a 4964.1 (2872.3) 4968.2 (2922.8) 0.976
## b 0.5 (1.0) 0.5 (1.1) 0.556
## c 0.485
## 1 25.1% 24.9%
## 2 24.6% 23.4%
## 3 23.5% 21.8%
## 4 26.7% 29.9%
## |====================================================|
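The "print, then hand the data back" behavior described above is a common R pattern built on invisible(). The sketch below shows the pattern with a hypothetical helper, summarize_and_pass(), which is not part of the furniture package.

```r
# Hypothetical helper: prints a summary, then returns its input unchanged
summarize_and_pass <- function(.data) {
  print(summary(.data))  # side effect: something is shown in the console
  invisible(.data)       # return value: the original data, not auto-printed
}

toy <- data.frame(a = 1:5, b = c(2.5, 3.1, 4.8, 5.0, 6.2))

# The summary is printed; the returned object is still the data frame,
# so it could be passed on to a further step such as a plot
passed <- summarize_and_pass(toy)
print(identical(passed, toy))  # TRUE
```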
The var_names argument lets you rename the variables.
table1(df,
       a, b, c,
       splitby = ~d,
       test = TRUE,
       var_names = c("A", "B", "C"))
##
## |====================================================|
## d
## 0 1 P-Value
## Observations 3295 3315
## A 0.786
## 4949.8 (2868.8) 4969.0 (2884.4)
## B 0.112
## 0.5 (1.0) 0.5 (1.0)
## C 0.414
## 1 665 (25.3%) 639 (24.1%)
## 2 652 (24.8%) 677 (25.6%)
## 3 645 (24.6%) 622 (23.5%)
## 4 663 (25.3%) 709 (26.8%)
## |====================================================|
This is particularly useful when you adjust a variable within the function:
table1(df,
       factor(ifelse(a > 1, 1, 0)), b, c,
       splitby = ~d,
       test = TRUE,
       var_names = c("A", "B", "C"))
##
## |============================================|
## d
## 0 1 P-Value
## Observations 3295 3315
## A 1
## 0 0 (0%) 1 (0%)
## 1 3295 (100%) 3314 (100%)
## B 0.112
## 0.5 (1.0) 0.5 (1.0)
## C 0.414
## 1 665 (25.3%) 639 (24.1%)
## 2 652 (24.8%) 677 (25.6%)
## 3 645 (24.6%) 622 (23.5%)
## 4 663 (25.3%) 709 (26.8%)
## |============================================|
Here we changed a to a factor within the function. In order for the name to look better, we can use the var_names argument; otherwise it would be named something like factor.ifelse.a...
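How table1 builds that default label is not stated here. As an illustration of where a name of this shape can come from, base R's make.names() converts an expression written as text into a syntactically valid name by replacing every invalid character with a dot. Whether table1 uses this function is an assumption.

```r
# The expression from the call above, as text
expr_text <- "factor(ifelse(a > 1, 1, 0))"

# Parentheses, spaces, commas and ">" are all replaced by "."
mangled <- make.names(expr_text)
print(mangled)  # "factor.ifelse.a...1..1..0.."
```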
As a final note, the "table1" object can be coerced to a data.frame very easily:
tab1 <- table1(df,
               a, b, c,
               splitby = ~d,
               test = TRUE)
data.frame(tab1)
## Table1.. Table1.0 Table1.1 Table1.P.Value Splitby
## 1 Observations 3295 3315 d
## 2 a 0.786 d
## 3 4949.8 (2868.8) 4969.0 (2884.4) d
## 4 b 0.112 d
## 5 0.5 (1.0) 0.5 (1.0) d
## 6 c 0.414 d
## 7 1 665 (25.3%) 639 (24.1%) d
## 8 2 652 (24.8%) 677 (25.6%) d
## 9 3 645 (24.6%) 622 (23.5%) d
## 10 4 663 (25.3%) 709 (26.8%) d
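Once the table is an ordinary data frame, the usual base R tools apply to it. The sketch below uses a small hand-built stand-in (tab_df, with hypothetical column names and a few cells copied from the output above) and writes it to a temporary CSV file.

```r
# Stand-in for a coerced table: every cell is text
tab_df <- data.frame(
  Variable = c("Observations", "a", "b"),
  Group0   = c("3295", "4949.8 (2868.8)", "0.5 (1.0)"),
  Group1   = c("3315", "4969.0 (2884.4)", "0.5 (1.0)"),
  stringsAsFactors = FALSE
)

# Write to a temporary file and read it back
out_file <- tempfile(fileext = ".csv")
write.csv(tab_df, out_file, row.names = FALSE)
round_trip <- read.csv(out_file, stringsAsFactors = FALSE)

print(round_trip)
```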
table1 can be a valuable addition to the tools used to produce descriptive statistics. Enjoy this valuable piece of furniture!