Using percent_pattern

When working with categorical variables, the percent_pattern argument of crosstable() gives you fine-grained control over what each cell displays.

This vignette reviews the many things you can do with percent_pattern.

Initialization

First, let's add some missing values to the mtcars2 dataset and set crosstable to round percentages to whole numbers:

library(crosstable)
mtcars3 = mtcars2
mtcars3$cyl[1:5] = NA
mtcars3$vs[5:12] = NA

crosstable_options(
  percent_digits=0
)
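To see the counts that the tables below are built from, here is a base-R sketch that applies the same modifications to R's built-in mtcars. It assumes that mtcars2 keeps the rows of mtcars in the same order (the resulting counts do match the tables in this vignette) and labels vs as straight (vs = 1) and vshaped (vs = 0). The name cars is only used for this illustration.

```r
# Apply the vignette's modifications to base R's mtcars
# (assumption: mtcars2 has the same rows, in the same order)
cars <- mtcars
cars$cyl[1:5] <- NA
cars$vs[5:12] <- NA

# Label vs as in the tables below: 1 = straight, 0 = vshaped
cars$vs <- factor(cars$vs, levels = c(1, 0), labels = c("straight", "vshaped"))

# Cross-tabulate, keeping NA as an extra row and column
tab <- table(cyl = cars$cyl, vs = cars$vs, useNA = "ifany")
print(tab)
```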

Default behaviour

By default, crosstable() uses percent_pattern="{n} ({p_row})", so each cell shows its count n followed by its row percentage p_row:

crosstable(mtcars3, cyl, by=vs) %>% as_flextable()

label                variable  Engine
                               straight  vshaped    NA
Number of cylinders  4         7 (88%)   1 (12%)    2
                     6         0 (0%)    1 (100%)   3
                     8         0 (0%)    11 (100%)  2
                     NA        2         2          1

The following sections show how to tweak percent_pattern to display other figures.

NOTE: Missing values are always described with n alone. If you want them to be described like any other value, convert them into an explicit factor level first, for instance with forcats::fct_explicit_na() (deprecated since forcats 1.0.0 in favour of forcats::fct_na_value_to_level()).
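To make the substitution concrete, here is a minimal sketch of how a pattern such as "{n} ({p_row})" turns into "7 (88%)" for 4-cylinder cars. The helpers fill_pattern() and fmt_pct() are hypothetical: they mimic the {name} placeholder syntax with plain gsub() and are not crosstable's own code.

```r
# Hypothetical helper: replace each {name} in `pattern` by values[[name]].
# It only mimics the placeholder syntax; it is not crosstable's implementation.
fill_pattern <- function(pattern, values) {
  for (name in names(values)) {
    pattern <- gsub(paste0("{", name, "}"), as.character(values[[name]]),
                    pattern, fixed = TRUE)
  }
  pattern
}

# Hypothetical helper: percentage rounded to an integer, as with percent_digits=0
fmt_pct <- function(p) paste0(round(100 * p), "%")

# 4-cylinder row of the default table
n_straight <- 7
n_vshaped  <- 1
n_row      <- n_straight + n_vshaped   # 8: the NA column (2 cars) is left out

cells <- c(
  fill_pattern("{n} ({p_row})", list(n = n_straight, p_row = fmt_pct(n_straight / n_row))),
  fill_pattern("{n} ({p_row})", list(n = n_vshaped,  p_row = fmt_pct(n_vshaped / n_row)))
)
print(cells)  # "7 (88%)" "1 (12%)"
```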

Allowed variables

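Here is the list of all the internal variables you can use in a pattern:

n: number of observations in the cell
n_row, n_col: number of observations in the cell's row and column
n_tot: total number of observations
p_row, p_col, p_tot: proportion of the cell relative to n_row, n_col and n_tot respectively
p_row_inf, p_row_sup, p_col_inf, p_col_sup, p_tot_inf, p_tot_sup: lower (_inf) and upper (_sup) bounds of the confidence interval of each proportion

Should you ever need it, you can also use any external variable defined outside of crosstable().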

Here is a simple example:

crosstable(mtcars3, cyl, by=vs, 
           percent_pattern="N={n}/{n_row} -> p={p_row}") %>% 
  as_flextable()

label                variable  Engine
                               straight        vshaped            NA
Number of cylinders  4         N=7/8 -> p=88%  N=1/8 -> p=12%     2
                     6         N=0/1 -> p=0%   N=1/1 -> p=100%    3
                     8         N=0/11 -> p=0%  N=11/11 -> p=100%  2
                     NA        2               2                  1
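The internal variables can be recomputed from the displayed counts. This base-R sketch types the table in by hand, keeps only its non-missing part (as the standard variables do), and rebuilds the straight column of the example.

```r
# Counts from the tables above (rows: cyl, columns: vs);
# the "NA" row and column hold the missing values
tab <- matrix(c(7, 0,  0, 2,    # straight
                1, 1, 11, 2,    # vshaped
                2, 3,  2, 1),   # NA
              nrow = 4,
              dimnames = list(cyl = c("4", "6", "8", "NA"),
                              vs  = c("straight", "vshaped", "NA")))

# The standard variables only use the non-missing part of the table
body  <- tab[c("4", "6", "8"), c("straight", "vshaped")]
n_row <- rowSums(body)   # 8, 1, 11
n_col <- colSums(body)   # 7, 13
n_tot <- sum(body)       # 20

p_row <- body / n_row                # each row divided by its own total
p_col <- sweep(body, 2, n_col, "/")  # each column divided by its own total
p_tot <- body / n_tot                # every cell divided by the grand total

# Rebuild the "straight" column of the example
labels <- unname(sprintf("N=%d/%d -> p=%.0f%%",
                         body[, "straight"], n_row, 100 * p_row[, "straight"]))
print(labels)
```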

Missing values

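As you can see, these internal variables (except n) do not count missing values: for 4-cylinder cars, n_row is 8 rather than 10, because the 2 cars with a missing engine type are left out.

This makes sense in most cases, but if it doesn't suit your needs, you can use the following variables, which account for NA explicitly:

n_row_na, n_col_na, n_tot_na: same as n_row, n_col and n_tot, but counting missing values
p_row_na, p_col_na, p_tot_na: the corresponding proportions
p_row_na_inf, p_row_na_sup, p_col_na_inf, p_col_na_sup, p_tot_na_inf, p_tot_na_sup: bounds of their confidence intervals

(See the Ultimate example section for all of them in action.)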

Note that if you use showNA="no", there will be no difference between the standard variables and the _na variables.
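For comparison, here is a base-R sketch of the _na denominators for one cell (4 cylinders, straight engine), computed from the counts displayed earlier. Each of them adds the missing values back in, which is why p_row_na is 7/10 while p_row is 7/8.

```r
# Counts from the tables above; the "NA" row and column hold missing values
tab <- matrix(c(7, 0,  0, 2,    # straight
                1, 1, 11, 2,    # vshaped
                2, 3,  2, 1),   # NA
              nrow = 4,
              dimnames = list(cyl = c("4", "6", "8", "NA"),
                              vs  = c("straight", "vshaped", "NA")))

n <- tab["4", "straight"]                            # 7

n_row    <- sum(tab["4", c("straight", "vshaped")])  # 8: standard, NA excluded
n_row_na <- sum(tab["4", ])                          # 10: NA column included
n_col_na <- sum(tab[, "straight"])                   # 9: NA row included
n_tot_na <- sum(tab)                                 # 32: every observation

fmt_pct <- function(p) paste0(round(100 * p), "%")
pct <- c(p_row    = fmt_pct(n / n_row),
         p_row_na = fmt_pct(n / n_row_na),
         p_col_na = fmt_pct(n / n_col_na),
         p_tot_na = fmt_pct(n / n_tot_na))
print(pct)
```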

Proportions in totals

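Totals, however, are handled separately: when percent_pattern is a single string, it only applies to the body of the table, and the totals keep their own patterns: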

crosstable(mtcars3, cyl, by=vs, total=TRUE, 
           percent_pattern="N={n}, p={p_row} ({n}/{n_row})") %>% 
  as_flextable()

label                variable  Engine                                      Total
                               straight          vshaped               NA
Number of cylinders  4         N=7, p=88% (7/8)  N=1, p=12% (1/8)      2   10 (37%)
                     6         N=0, p=0% (0/1)   N=1, p=100% (1/1)     3   4 (15%)
                     8         N=0, p=0% (0/11)  N=11, p=100% (11/11)  2   13 (48%)
                     NA        2                 2                     1   5
                     Total     9 (38%)           15 (62%)              8   32 (100%)

Indeed, a single pattern cannot fit the totals: each cell of the Total column sums up its entire row, so a proportion relative to that row would always be 100%.
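A quick base-R check of this reasoning, using the counts from the table above: a Total-column cell relative to its own row is always 100%, whereas the percentages actually displayed in the Total column and the Total row are consistent with shares of the non-missing totals (10/27 and 9/24).

```r
# Counts from the tables above; the "NA" row and column hold missing values
tab <- matrix(c(7, 0,  0, 2,    # straight
                1, 1, 11, 2,    # vshaped
                2, 3,  2, 1),   # NA
              nrow = 4,
              dimnames = list(cyl = c("4", "6", "8", "NA"),
                              vs  = c("straight", "vshaped", "NA")))
fmt_pct <- function(p) paste0(round(100 * p), "%")

total_col <- rowSums(tab)   # Total column: 10, 4, 13, 5
total_row <- colSums(tab)   # Total row: 9, 15, 8

# Relative to its own row, a Total-column cell is always 100%
row_share <- fmt_pct(total_col[["4"]] / sum(tab["4", ]))

# Shares of the non-missing totals instead
known_cyl <- total_col[c("4", "6", "8")]
known_vs  <- total_row[c("straight", "vshaped")]
col_share       <- fmt_pct(known_cyl / sum(known_cyl))  # Total column
row_total_share <- fmt_pct(known_vs / sum(known_vs))    # Total row
print(list(row_share = row_share, col_share = col_share,
           row_total_share = row_total_share))
```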

To control the patterns used in totals, pass a list with the names body (regular cells), total_row (the Total row at the bottom), total_col (the Total column) and total_all (the grand total):

pp = list(body="N={n}, p={p_tot} ({n}/{n_tot})", 
          total_row="N={n} p=({p_col})", 
          total_col="{n}", total_all="Total={n}")
crosstable(mtcars3, cyl, by=vs, total=TRUE, 
           percent_pattern=pp) %>% 
  as_flextable()

label                variable  Engine                                      Total
                               straight           vshaped              NA
Number of cylinders  4         N=7, p=35% (7/20)  N=1, p=5% (1/20)     2   10
                     6         N=0, p=0% (0/20)   N=1, p=5% (1/20)     3   4
                     8         N=0, p=0% (0/20)   N=11, p=55% (11/20)  2   13
                     NA        2                  2                    1   5
                     Total     N=9 p=(38%)        N=15 p=(62%)         8   Total=32
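With p_tot, every body cell is divided by the same denominator n_tot, the 20 cars with neither variable missing. This sketch rebuilds the body cells of the custom pattern above with sprintf() instead of crosstable, from the counts in the table.

```r
# Counts from the tables above; the "NA" row and column hold missing values
tab <- matrix(c(7, 0,  0, 2,    # straight
                1, 1, 11, 2,    # vshaped
                2, 3,  2, 1),   # NA
              nrow = 4,
              dimnames = list(cyl = c("4", "6", "8", "NA"),
                              vs  = c("straight", "vshaped", "NA")))

body  <- tab[c("4", "6", "8"), c("straight", "vshaped")]
n_tot <- sum(body)   # 20 cars with neither cyl nor vs missing

# Same layout as body="N={n}, p={p_tot} ({n}/{n_tot})"
cells <- sprintf("N=%d, p=%.0f%% (%d/%d)", body, 100 * body / n_tot, body, n_tot)
print(matrix(cells, nrow = 3, dimnames = dimnames(body)))
```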

get_percent_pattern()

To easily get a percent_pattern list, you can use the get_percent_pattern() helper:

get_percent_pattern("all")
#> $body
#> [1] "{n} ({p_tot} / {p_row} / {p_col})"
#> 
#> $total_row
#> [1] "{n} ({p_col})"
#> 
#> $total_col
#> [1] "{n} ({p_row})"
#> 
#> $total_all
#> [1] "{n} ({p_tot})"
get_percent_pattern("col", na=TRUE)
#> $body
#> [1] "{n} ({p_col_na})"
#> 
#> $total_row
#> [1] "{n} ({p_col_na})"
#> 
#> $total_col
#> [1] "{n} ({p_row_na})"
#> 
#> $total_all
#> [1] "{n} ({p_tot_na})"

You can also set the result to a variable and modify its members at will. See ?get_percent_pattern for more information.
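For instance, you could keep most of the "all" patterns and change only some of them. This sketch copies the list printed above by hand so that it runs without crosstable (in practice you would start from pp = get_percent_pattern("all")) and uses base R's modifyList() to override two members.

```r
# The list printed by get_percent_pattern("all"), written out by hand
pp <- list(body      = "{n} ({p_tot} / {p_row} / {p_col})",
           total_row = "{n} ({p_col})",
           total_col = "{n} ({p_row})",
           total_all = "{n} ({p_tot})")

# Override only the members you want to change; the others are kept as-is
pp <- modifyList(pp, list(body = "{n} ({p_col})", total_all = "Total={n}"))
str(pp)
```

The resulting pp can then be passed to crosstable() as percent_pattern=pp, as in the previous section.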

Ultimate example

Here is the ultimate example for percent_pattern, using every available variable. Take a close look at all the values and you will likely find the one you need.

ULTIMATE_PATTERN=list(
  body="N={n}
        Cell: p = {p_tot} ({n}/{n_tot}) [{p_tot_inf}; {p_tot_sup}]
        Col: p = {p_col} ({n}/{n_col}) [{p_col_inf}; {p_col_sup}]
        Row: p = {p_row} ({n}/{n_row}) [{p_row_inf}; {p_row_sup}]
        
        Cell (NA): p = {p_tot_na} ({n}/{n_tot_na}) [{p_tot_na_inf}; {p_tot_na_sup}]
        Col (NA): p = {p_col_na} ({n}/{n_col_na}) [{p_col_na_inf}; {p_col_na_sup}]
        Row (NA): p = {p_row_na} ({n}/{n_row_na}) [{p_row_na_inf}; {p_row_na_sup}]",
  total_row="N={n}
             Row: p = {p_row} ({n}/{n_row}) [{p_row_inf}; {p_row_sup}]
             Row (NA): p = {p_row_na} ({n}/{n_row_na}) [{p_row_na_inf}; {p_row_na_sup}]",
  total_col="N={n}
             Col: p = {p_col} ({n}/{n_col}) [{p_col_inf}; {p_col_sup}]
             Col (NA): p = {p_col_na} ({n}/{n_col_na}) [{p_col_na_inf}; {p_col_na_sup}]",
  total_all="N={n}
             P: {p_col} [{p_col_inf}; {p_col_sup}]
             P (NA): {p_col_na} [{p_col_na_inf}; {p_col_na_sup}]"
)

crosstable(mtcars3, cyl, by=vs,
           percent_digits=0, total=TRUE, showNA="always",
           percent_pattern=ULTIMATE_PATTERN) %>% 
  as_flextable() %>% 
  flextable::theme_box()

Output (rows: Number of cylinders; columns: Engine and Total), listed cell by cell as row / column:

4 / straight:
  N=7
  Cell: p = 35% (7/20) [2e+01%; 57%]
  Col: p = 100% (7/7) [65%; 100%]
  Row: p = 88% (7/8) [53%; 98%]

  Cell (NA): p = 22% (7/32) [11%; 39%]
  Col (NA): p = 78% (7/9) [45%; 94%]
  Row (NA): p = 70% (7/10) [40%; 89%]

4 / vshaped:
  N=1
  Cell: p = 5% (1/20) [9e-01%; 24%]
  Col: p = 8% (1/13) [1%; 33%]
  Row: p = 12% (1/8) [2%; 47%]

  Cell (NA): p = 3% (1/32) [1%; 16%]
  Col (NA): p = 7% (1/15) [1%; 30%]
  Row (NA): p = 10% (1/10) [2%; 40%]

4 / NA: 2

4 / Total:
  N=10
  Col: p = 37% (10/27) [22%; 56%]
  Col (NA): p = 31% (10/32) [18%; 49%]

6 / straight:
  N=0
  Cell: p = 0% (0/20) [1e-15%; 16%]
  Col: p = 0% (0/7) [0%; 35%]
  Row: p = 0% (0/1) [0%; 79%]

  Cell (NA): p = 0% (0/32) [0%; 11%]
  Col (NA): p = 0% (0/9) [0%; 30%]
  Row (NA): p = 0% (0/4) [0%; 49%]

6 / vshaped:
  N=1
  Cell: p = 5% (1/20) [9e-01%; 24%]
  Col: p = 8% (1/13) [1%; 33%]
  Row: p = 100% (1/1) [21%; 100%]

  Cell (NA): p = 3% (1/32) [1%; 16%]
  Col (NA): p = 7% (1/15) [1%; 30%]
  Row (NA): p = 25% (1/4) [5%; 70%]

6 / NA: 3

6 / Total:
  N=4
  Col: p = 15% (4/27) [6%; 32%]
  Col (NA): p = 12% (4/32) [5%; 28%]

8 / straight:
  N=0
  Cell: p = 0% (0/20) [1e-15%; 16%]
  Col: p = 0% (0/7) [0%; 35%]
  Row: p = 0% (0/11) [0%; 26%]

  Cell (NA): p = 0% (0/32) [0%; 11%]
  Col (NA): p = 0% (0/9) [0%; 30%]
  Row (NA): p = 0% (0/13) [0%; 23%]

8 / vshaped:
  N=11
  Cell: p = 55% (11/20) [3e+01%; 74%]
  Col: p = 85% (11/13) [58%; 96%]
  Row: p = 100% (11/11) [74%; 100%]

  Cell (NA): p = 34% (11/32) [20%; 52%]
  Col (NA): p = 73% (11/15) [48%; 89%]
  Row (NA): p = 85% (11/13) [58%; 96%]

8 / NA: 2

8 / Total:
  N=13
  Col: p = 48% (13/27) [31%; 66%]
  Col (NA): p = 41% (13/32) [26%; 58%]

NA / straight: 2
NA / vshaped: 2
NA / NA: 1
NA / Total: 5

Total / straight:
  N=9
  Row: p = 38% (9/24) [21%; 57%]
  Row (NA): p = 28% (9/32) [16%; 45%]

Total / vshaped:
  N=15
  Row: p = 62% (15/24) [43%; 79%]
  Row (NA): p = 47% (15/32) [31%; 64%]

Total / NA: 8

Total / Total:
  N=32
  P: 100% [89%; 100%]
  P (NA): 100% [89%; 100%]
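The bracketed intervals (the _inf and _sup variables) are consistent with a 95% Wilson score interval: for instance, 7/8 gives [53%; 98%] and 11/13 gives [58%; 96%]. The sketch below assumes that method based on matching these numbers, not on crosstable's documentation, so treat it as an approximation of what the package computes. wilson_ci() is a hypothetical helper.

```r
# Hypothetical helper: Wilson score interval for a binomial proportion x/n
# (no continuity correction); assumed, not confirmed, to match crosstable
wilson_ci <- function(x, n, conf = 0.95) {
  z      <- qnorm(1 - (1 - conf) / 2)
  p      <- x / n
  denom  <- 1 + z^2 / n
  center <- (p + z^2 / (2 * n)) / denom
  half   <- z / denom * sqrt(p * (1 - p) / n + z^2 / (4 * n^2))
  c(inf = center - half, sup = center + half)
}

# Row proportion of the 4-cylinder, straight-engine cell: 7/8
round(100 * wilson_ci(7, 8))    # inf 53, sup 98
# Col proportion of the 8-cylinder, vshaped cell: 11/13
round(100 * wilson_ci(11, 13))  # inf 58, sup 96
```

With this formula, the lower bound for x = 0 is exactly 0 in theory, so values such as 1e-15% in the table are most likely floating-point residue.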