Once you have built your full specification blueprint and feel comfortable with how the pipeline is executed, you can implement a full multiverse-style analysis. Simply use `run_multiverse(<your expanded grid object>)`:
``` r
library(tidyverse)
library(multitool)

# create some data
the_data <-
  data.frame(
    id  = 1:500,
    iv1 = rnorm(500),
    iv2 = rnorm(500),
    iv3 = rnorm(500),
    mod = rnorm(500),
    dv1 = rnorm(500),
    dv2 = rnorm(500),
    include1 = rbinom(500, size = 1, prob = .1),
    include2 = sample(1:3, size = 500, replace = TRUE),
    include3 = rnorm(500)
  )

# create a pipeline blueprint
full_pipeline <-
  the_data |>
  add_filters(include1 == 0, include2 != 3, include3 > -2.5) |>
  add_variables(var_group = "ivs", iv1, iv2, iv3) |>
  add_variables(var_group = "dvs", dv1, dv2) |>
  add_model("linear model", lm({dvs} ~ {ivs} * mod))

# expand the pipeline
expanded_pipeline <- expand_decisions(full_pipeline)

# Run the multiverse
multiverse_results <- run_multiverse(expanded_pipeline)
multiverse_results
#> # A tibble: 48 × 4
#>    decision specifications   model_fitted     pipeline_code
#>    <chr>    <list>           <list>           <list>
#>  1 1        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  2 2        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  3 3        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  4 4        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  5 5        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  6 6        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  7 7        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  8 8        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#>  9 9        <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 10 10       <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> # ℹ 38 more rows
```
The result is another tibble with various list columns. It will always contain a list column named `specifications` containing all the information you generated in your blueprint. Next, there will be a list column for your fitted models, labelled `model_fitted`.
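Before unnesting anything, you can inspect a single universe directly with ordinary `dplyr` verbs. A minimal sketch (not from the original text), assuming the `decision` identifiers are the character ids shown above:

``` r
# Inspect the specifications for one decision (universe) by
# filtering on its id and unnesting the `specifications` list column.
multiverse_results |>
  filter(decision == "1") |>
  unnest(specifications)
```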
There are two main ways to unpack and examine multitool results. The first is by using `tidyr::unnest()`. Inside the `model_fitted` column, multitool gives us four columns: `model_parameters`, `model_performance`, `model_warnings`, and `model_messages`.
``` r
multiverse_results |> unnest(model_fitted)
#> # A tibble: 48 × 8
#>    decision specifications   model_function model_parameters   model_performance
#>    <chr>    <list>           <chr>          <list>             <list>
#>  1 1        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  2 2        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  3 3        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  4 4        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  5 5        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  6 6        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  7 7        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  8 8        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  9 9        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#> 10 10       <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#> # ℹ 38 more rows
#> # ℹ 3 more variables: model_warnings <list>, model_messages <list>,
#> #   pipeline_code <list>
```
The `model_parameters` column gives you the result of calling `parameters::parameters()` on each model in your grid, which is a `data.frame` of model coefficients and their associated standard errors, confidence intervals, test statistics, and p-values.
``` r
multiverse_results |>
  unnest(model_fitted) |>
  unnest(model_parameters)
#> # A tibble: 192 × 16
#>    decision specifications   model_function parameter   coefficient     se    ci
#>    <chr>    <list>           <chr>          <chr>             <dbl>  <dbl> <dbl>
#>  1 1        <tibble [1 × 3]> lm             (Intercept)    -0.0437  0.0569  0.95
#>  2 1        <tibble [1 × 3]> lm             iv1            -0.0531  0.0574  0.95
#>  3 1        <tibble [1 × 3]> lm             mod            -0.00841 0.0605  0.95
#>  4 1        <tibble [1 × 3]> lm             iv1:mod        -0.0625  0.0593  0.95
#>  5 2        <tibble [1 × 3]> lm             (Intercept)    -0.0565  0.0565  0.95
#>  6 2        <tibble [1 × 3]> lm             iv1            -0.0696  0.0570  0.95
#>  7 2        <tibble [1 × 3]> lm             mod            -0.0478  0.0600  0.95
#>  8 2        <tibble [1 × 3]> lm             iv1:mod        -0.124   0.0588  0.95
#>  9 3        <tibble [1 × 3]> lm             (Intercept)    -0.0491  0.0568  0.95
#> 10 3        <tibble [1 × 3]> lm             iv2             0.0475  0.0628  0.95
#> # ℹ 182 more rows
#> # ℹ 9 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> #   p <dbl>, model_performance <list>, model_warnings <list>,
#> #   model_messages <list>, pipeline_code <list>
```
The `model_performance` column gives fit statistics, such as r2, AIC, and BIC values, computed by running `performance::performance()` on each model in your grid.
``` r
multiverse_results |>
  unnest(model_fitted) |>
  unnest(model_performance)
#> # A tibble: 48 × 14
#>    decision specifications   model_function model_parameters     aic  aicc   bic
#>    <chr>    <list>           <chr>          <list>             <dbl> <dbl> <dbl>
#>  1 1        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  899.  900.  918.
#>  2 2        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  894.  895.  913.
#>  3 3        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  900.  900.  918.
#>  4 4        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  898.  899.  917.
#>  5 5        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  899.  899.  918.
#>  6 6        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  900.  900.  918.
#>  7 7        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  910.  910.  928.
#>  8 8        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  909.  909.  928.
#>  9 9        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  910.  910.  928.
#> 10 10       <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  912.  912.  931.
#> # ℹ 38 more rows
#> # ℹ 7 more variables: r2 <dbl>, r2_adjusted <dbl>, rmse <dbl>, sigma <dbl>,
#> #   model_warnings <list>, model_messages <list>, pipeline_code <list>
```
The `model_messages` and `model_warnings` columns contain information provided by the modeling function. If something went wrong or you need to know something about a particular model, these columns will have captured the messages and warnings printed by the modeling function.
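For example, a quick way to surface only the universes that produced warnings is to count the captured entries per model. A minimal sketch (not from the original text); it assumes each element of `model_warnings` is a data frame with one row per captured warning, so check the structure of your own results first:

``` r
# Keep only decisions whose models emitted at least one warning
# (assumption: empty elements have zero rows).
multiverse_results |>
  unnest(model_fitted) |>
  mutate(n_warnings = purrr::map_int(model_warnings, nrow)) |>
  filter(n_warnings > 0)
```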
I wrote wrappers around the `tidyr::unnest()` workflow. The main function is `reveal()`. Pass a multiverse results object to `reveal()` and tell it which columns to grab by indicating the column name in the `.what` argument:
``` r
multiverse_results |>
  reveal(.what = model_fitted)
#> # A tibble: 48 × 8
#>    decision specifications   model_function model_parameters   model_performance
#>    <chr>    <list>           <chr>          <list>             <list>
#>  1 1        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  2 2        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  3 3        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  4 4        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  5 5        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  6 6        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  7 7        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  8 8        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#>  9 9        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#> 10 10       <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]> <prfrmnc_>
#> # ℹ 38 more rows
#> # ℹ 3 more variables: model_warnings <list>, model_messages <list>,
#> #   pipeline_code <list>
```
If you want to get straight to a specific result, you can specify a sub-list with the `.which` argument:
``` r
multiverse_results |>
  reveal(.what = model_fitted, .which = model_parameters)
#> # A tibble: 192 × 16
#>    decision specifications   model_function parameter   coefficient     se    ci
#>    <chr>    <list>           <chr>          <chr>             <dbl>  <dbl> <dbl>
#>  1 1        <tibble [1 × 3]> lm             (Intercept)    -0.0437  0.0569  0.95
#>  2 1        <tibble [1 × 3]> lm             iv1            -0.0531  0.0574  0.95
#>  3 1        <tibble [1 × 3]> lm             mod            -0.00841 0.0605  0.95
#>  4 1        <tibble [1 × 3]> lm             iv1:mod        -0.0625  0.0593  0.95
#>  5 2        <tibble [1 × 3]> lm             (Intercept)    -0.0565  0.0565  0.95
#>  6 2        <tibble [1 × 3]> lm             iv1            -0.0696  0.0570  0.95
#>  7 2        <tibble [1 × 3]> lm             mod            -0.0478  0.0600  0.95
#>  8 2        <tibble [1 × 3]> lm             iv1:mod        -0.124   0.0588  0.95
#>  9 3        <tibble [1 × 3]> lm             (Intercept)    -0.0491  0.0568  0.95
#> 10 3        <tibble [1 × 3]> lm             iv2             0.0475  0.0628  0.95
#> # ℹ 182 more rows
#> # ℹ 9 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> #   p <dbl>, model_performance <list>, model_warnings <list>,
#> #   model_messages <list>, pipeline_code <list>
```
multitool will run and save anything you put in your pipeline, but most often you will want to look at model parameters and/or performance. To that end, there is a set of `reveal_model_*` convenience functions for getting at the most common multiverse results: `reveal_model_parameters()`, `reveal_model_performance()`, `reveal_model_messages()`, and `reveal_model_warnings()`.
`reveal_model_parameters()` unpacks the model parameters in your multiverse:
``` r
multiverse_results |>
  reveal_model_parameters()
#> # A tibble: 192 × 16
#>    decision specifications   model_function parameter   coefficient     se    ci
#>    <chr>    <list>           <chr>          <chr>             <dbl>  <dbl> <dbl>
#>  1 1        <tibble [1 × 3]> lm             (Intercept)    -0.0437  0.0569  0.95
#>  2 1        <tibble [1 × 3]> lm             iv1            -0.0531  0.0574  0.95
#>  3 1        <tibble [1 × 3]> lm             mod            -0.00841 0.0605  0.95
#>  4 1        <tibble [1 × 3]> lm             iv1:mod        -0.0625  0.0593  0.95
#>  5 2        <tibble [1 × 3]> lm             (Intercept)    -0.0565  0.0565  0.95
#>  6 2        <tibble [1 × 3]> lm             iv1            -0.0696  0.0570  0.95
#>  7 2        <tibble [1 × 3]> lm             mod            -0.0478  0.0600  0.95
#>  8 2        <tibble [1 × 3]> lm             iv1:mod        -0.124   0.0588  0.95
#>  9 3        <tibble [1 × 3]> lm             (Intercept)    -0.0491  0.0568  0.95
#> 10 3        <tibble [1 × 3]> lm             iv2             0.0475  0.0628  0.95
#> # ℹ 182 more rows
#> # ℹ 9 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> #   p <dbl>, model_performance <list>, model_warnings <list>,
#> #   model_messages <list>, pipeline_code <list>
```
`reveal_model_performance()` unpacks the model performance:
``` r
multiverse_results |>
  reveal_model_performance()
#> # A tibble: 48 × 14
#>    decision specifications   model_function model_parameters     aic  aicc   bic
#>    <chr>    <list>           <chr>          <list>             <dbl> <dbl> <dbl>
#>  1 1        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  899.  900.  918.
#>  2 2        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  894.  895.  913.
#>  3 3        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  900.  900.  918.
#>  4 4        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  898.  899.  917.
#>  5 5        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  899.  899.  918.
#>  6 6        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  900.  900.  918.
#>  7 7        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  910.  910.  928.
#>  8 8        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  909.  909.  928.
#>  9 9        <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  910.  910.  928.
#> 10 10       <tibble [1 × 3]> lm             <prmtrs_m [4 × 9]>  912.  912.  931.
#> # ℹ 38 more rows
#> # ℹ 7 more variables: r2 <dbl>, r2_adjusted <dbl>, rmse <dbl>, sigma <dbl>,
#> #   model_warnings <list>, model_messages <list>, pipeline_code <list>
```
You can also choose to expand your decision grid with `.unpack_specs` to see which decisions produced which result. You have two options for unpacking your decisions: `wide` or `long`. If you set `.unpack_specs = 'wide'`, you get one column per decision variable. This is exactly the same as how your decisions appeared in your grid.
``` r
multiverse_results |>
  reveal_model_parameters(.unpack_specs = "wide")
#> # A tibble: 192 × 22
#>    decision ivs   dvs   include1      include2      include3    model model_meta
#>    <chr>    <chr> <chr> <chr>         <chr>         <chr>       <chr> <chr>
#>  1 1        iv1   dv1   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  2 1        iv1   dv1   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  3 1        iv1   dv1   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  4 1        iv1   dv1   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  5 2        iv1   dv2   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  6 2        iv1   dv2   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  7 2        iv1   dv2   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  8 2        iv1   dv2   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#>  9 3        iv2   dv1   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 10 3        iv2   dv1   include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> # ℹ 182 more rows
#> # ℹ 14 more variables: model_function <chr>, parameter <chr>,
#> #   coefficient <dbl>, se <dbl>, ci <dbl>, ci_low <dbl>, ci_high <dbl>,
#> #   t <dbl>, df_error <int>, p <dbl>, model_performance <list>,
#> #   model_warnings <list>, model_messages <list>, pipeline_code <list>
```
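Because the wide format gives one column per decision, ordinary `dplyr` verbs can slice the multiverse down to any universe of interest. A minimal sketch (not from the original text):

``` r
# Pull out just the iv1 -> dv1 models from the unpacked results.
multiverse_results |>
  reveal_model_parameters(.unpack_specs = "wide") |>
  filter(ivs == "iv1", dvs == "dv1")
```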
If you set `.unpack_specs = 'long'`, your decisions get stacked into two columns: `decision_set` and `alternatives`. This format is nice for plotting a particular result from a multiverse analysis across different decision alternatives.
``` r
multiverse_results |>
  reveal_model_performance(.unpack_specs = "long")
#> # A tibble: 288 × 15
#>    decision decision_set alternatives    model_function model_parameters     aic
#>    <chr>    <chr>        <chr>           <chr>          <list>             <dbl>
#>  1 1        ivs          iv1             lm             <prmtrs_m [4 × 9]>  899.
#>  2 1        dvs          dv1             lm             <prmtrs_m [4 × 9]>  899.
#>  3 1        include1     include1 == 0   lm             <prmtrs_m [4 × 9]>  899.
#>  4 1        include2     include2 != 3   lm             <prmtrs_m [4 × 9]>  899.
#>  5 1        include3     include3 > -2.5 lm             <prmtrs_m [4 × 9]>  899.
#>  6 1        model        linear model    lm             <prmtrs_m [4 × 9]>  899.
#>  7 2        ivs          iv1             lm             <prmtrs_m [4 × 9]>  894.
#>  8 2        dvs          dv2             lm             <prmtrs_m [4 × 9]>  894.
#>  9 2        include1     include1 == 0   lm             <prmtrs_m [4 × 9]>  894.
#> 10 2        include2     include2 != 3   lm             <prmtrs_m [4 × 9]>  894.
#> # ℹ 278 more rows
#> # ℹ 9 more variables: aicc <dbl>, bic <dbl>, r2 <dbl>, r2_adjusted <dbl>,
#> #   rmse <dbl>, sigma <dbl>, model_warnings <list>, model_messages <list>,
#> #   pipeline_code <list>
```
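Since the long format pairs every model with each of its decision alternatives, a quick faceted plot shows how a result varies across those alternatives. A minimal sketch (not from the original text); the geom and facet choices are illustrative:

``` r
# Plot r2 for every universe, grouped by decision set and alternative.
multiverse_results |>
  reveal_model_performance(.unpack_specs = "long") |>
  ggplot(aes(x = alternatives, y = r2)) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  facet_wrap(~ decision_set, scales = "free_x")
```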
Unpacking specifications alongside specific results allows us to examine the effects of our pipeline decisions. A powerful way to organize these results is to summarize a specific results column, say the r2 values of our models, over the entire multiverse. `condense()` takes a result column and summarizes it with the `.how` argument, which takes a list in the form of `list(<a name you pick> = <summary function>)`. `.how` will create a column named `<column being condensed>_<summary function name provided>`. In this case, we get `r2_mean` and `r2_median`.
``` r
# model performance r2 summaries
multiverse_results |>
  reveal_model_performance() |>
  condense(r2, list(mean = mean, median = median))
#> # A tibble: 1 × 3
#>   r2_mean r2_median r2_list
#>     <dbl>     <dbl> <list>
#> 1 0.00743   0.00672 <dbl [48]>

# model parameters for our predictor of interest
multiverse_results |>
  reveal_model_parameters() |>
  filter(str_detect(parameter, "iv")) |>
  condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 1 × 3
#>   coefficient_mean coefficient_median coefficient_list
#>              <dbl>              <dbl> <list>
#> 1          -0.0219            -0.0241 <dbl [96]>
```
In the last example, we filtered our multiverse results down to our predictors `iv*` to see what the mean and median effect was (over all combinations of decisions) on our outcomes. However, we had three versions of our predictor and two outcomes, so combining `dplyr::group_by()` with `condense()` might be more informative:
``` r
multiverse_results |>
  reveal_model_parameters(.unpack_specs = "wide") |>
  filter(str_detect(parameter, "iv")) |>
  group_by(ivs, dvs) |>
  condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 6 × 5
#> # Groups:   ivs [3]
#>   ivs   dvs   coefficient_mean coefficient_median coefficient_list
#>   <chr> <chr>            <dbl>              <dbl> <list>
#> 1 iv1   dv1           -0.0638            -0.0578  <dbl [16]>
#> 2 iv1   dv2           -0.0721            -0.0703  <dbl [16]>
#> 3 iv2   dv1           -0.0267            -0.0386  <dbl [16]>
#> 4 iv2   dv2            0.0172             0.0243  <dbl [16]>
#> 5 iv3   dv1            0.00778            0.0191  <dbl [16]>
#> 6 iv3   dv2            0.00639            0.000868 <dbl [16]>
```
If we are interested in all the terms of the model, we can leverage `group_by()` further:
``` r
multiverse_results |>
  reveal_model_parameters(.unpack_specs = "wide") |>
  group_by(parameter, dvs) |>
  condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 16 × 5
#> # Groups:   parameter [8]
#>    parameter   dvs   coefficient_mean coefficient_median coefficient_list
#>    <chr>       <chr>            <dbl>              <dbl> <list>
#>  1 (Intercept) dv1           -0.0262            -0.0292  <dbl [24]>
#>  2 (Intercept) dv2           -0.0515            -0.0512  <dbl [24]>
#>  3 iv1         dv1           -0.0465            -0.0465  <dbl [8]>
#>  4 iv1         dv2           -0.0659            -0.0647  <dbl [8]>
#>  5 iv1:mod     dv1           -0.0812            -0.0848  <dbl [8]>
#>  6 iv1:mod     dv2           -0.0782            -0.0839  <dbl [8]>
#>  7 iv2         dv1            0.0114             0.00699 <dbl [8]>
#>  8 iv2         dv2           -0.0282            -0.0275  <dbl [8]>
#>  9 iv2:mod     dv1           -0.0649            -0.0634  <dbl [8]>
#> 10 iv2:mod     dv2            0.0626             0.0598  <dbl [8]>
#> 11 iv3         dv1            0.0446             0.0431  <dbl [8]>
#> 12 iv3         dv2            0.0174             0.0199  <dbl [8]>
#> 13 iv3:mod     dv1           -0.0290            -0.0249  <dbl [8]>
#> 14 iv3:mod     dv2           -0.00463           -0.00132 <dbl [8]>
#> 15 mod         dv1            0.00372            0.00552 <dbl [24]>
#> 16 mod         dv2           -0.0566            -0.0568  <dbl [24]>
```
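`.how` is not limited to `mean` and `median`; any function that reduces a vector to a single value should work. A minimal sketch (not from the original text), assuming `.how` accepts anonymous functions; here we condense the p-values into the share of significant `iv*` effects per predictor/outcome pair:

``` r
# Share of p-values below .05 for each iv/dv combination.
multiverse_results |>
  reveal_model_parameters(.unpack_specs = "wide") |>
  filter(str_detect(parameter, "iv")) |>
  group_by(ivs, dvs) |>
  condense(p, list(prop_sig = \(x) mean(x < .05)))
```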