| Title: | Analysis Blinding Tools |
| Version: | 1.0.0 |
| Description: | Provides tools for analysis blinding in confirmatory research contexts by masking and scrambling test-relevant aspects of data. Vector-, data frame-, and row-wise operations support blinding for hierarchical and repeated-measures designs. For more details see MacCoun and Perlmutter (2015) <doi:10.1038/526187a> and Dutilh, Sarafoglou, and Wagenmakers (2019) <doi:10.1007/s11229-019-02456-7>. |
| License: | MIT + file LICENSE |
| URL: | https://nthun.github.io/vazul/ |
| BugReports: | https://github.com/nthun/vazul/issues |
| Depends: | R (≥ 4.1.0) |
| Imports: | dplyr, lifecycle, rlang, tidyselect, stats |
| Suggests: | testthat, covr, knitr, quarto |
| VignetteBuilder: | quarto |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-01-24 21:34:30 UTC; thoma |
| Author: | Tamás Nagy |
| Maintainer: | Tamás Nagy <nagytamas.hungary@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-01-28 18:50:02 UTC |
vazul: Analysis Blinding Tools
Description
Provides tools for analysis blinding in confirmatory research contexts by masking and scrambling test-relevant aspects of data. Vector-, data frame-, and row-wise operations support blinding for hierarchical and repeated-measures designs. For more details see MacCoun and Perlmutter (2015) doi:10.1038/526187a and Dutilh, Sarafoglou, and Wagenmakers (2019) doi:10.1007/s11229-019-02456-7.
Author(s)
Maintainer: Tamás Nagy nagytamas.hungary@gmail.com (ORCID)
Authors:
Alexandra Sarafoglou alexandra.sarafoglou@gmail.com (ORCID) [data contributor]
Márton Kovács marton.balazs.kovacs@gmail.com (ORCID)
See Also
Useful links:
- https://nthun.github.io/vazul/
- Report bugs at https://github.com/nthun/vazul/issues
MARP: Many Analysts Religion Project Dataset
Description
A cross-cultural dataset from the Many-Analysts Religion Project (MARP), which investigated the relationship between religiosity and well-being across 24 countries and diverse religious traditions.
Usage
data(marp)
Format
A data frame with 10,535 rows (participants) and 48 variables:
- subject
Unique subject identifier (integer).
- country
Country of residence (character string).
- rel_1
Importance of religion in daily life (0–10 scale).
- rel_2
Frequency of religious service attendance (ordinal).
- rel_3
Self-rated religiosity (0–10 scale).
- rel_4
Belief in God (binary: yes/no).
- rel_5
Prayer frequency (ordinal).
- rel_6
Bible/study frequency (ordinal).
- rel_7
Religious upbringing (binary: yes/no).
- rel_8
Current religious denomination (categorical).
- rel_9
Change in religiosity over lifetime (ordinal).
- cnorm_1
Perceived cultural norm: importance of religious lifestyle for average person in country (0–10).
- cnorm_2
Perceived cultural norm: importance of belief in God for average person in country (0–10).
- wb_gen_1
Overall life satisfaction (1–5 Likert).
- wb_gen_2
Overall happiness (1–5 Likert).
- wb_phys_1
Energy level (1–5).
- wb_phys_2
Sleep quality (1–5).
- wb_phys_3
Appetite (1–5).
- wb_phys_4
Physical pain/discomfort (1–5).
- wb_phys_5
General health (1–5).
- wb_phys_6
Exercise frequency (1–5).
- wb_phys_7
Illness burden (1–5).
- wb_psych_1
Positive affect (1–5).
- wb_psych_2
Negative affect (reverse coded; 1–5).
- wb_psych_3
Meaning in life (1–5).
- wb_psych_4
Purpose in life (1–5).
- wb_psych_5
Hopefulness (1–5).
- wb_psych_6
Anxiety (reverse coded; 1–5).
- wb_soc_1
Social support (1–5).
- wb_soc_2
Loneliness (reverse coded; 1–5).
- wb_soc_3
Community belonging (1–5).
- wb_overall_mean
Mean of all well-being items (numeric).
- wb_phys_mean
Mean of physical well-being items (numeric).
- wb_psych_mean
Mean of psychological well-being items (numeric).
- wb_soc_mean
Mean of social well-being items (numeric).
- age
Age in years (integer).
- gender
Self-reported gender (character: e.g., "Male", "Female", "Other").
- ses
Socioeconomic status composite (numeric).
- education
Highest education level completed (ordinal integer).
- ethnicity
Self-reported ethnicity (character).
- denomination
Religious denomination (character).
- gdp
GDP per capita (PPP, USD) for country (numeric).
- gdp_scaled
Scaled GDP (mean = 0, sd = 1) used in analyses (numeric).
- sample_type
Recruitment method: e.g., "online panel", "student sample" (character).
- compensation
Type of compensation: e.g., "monetary", "entry into lottery" (character).
- attention_check
Score on embedded attention check task (integer).
Source
Hoogeveen, S., Sarafoglou, A., Aczel, B., et al. (2022). A many-analysts approach to the relation between religiosity and well-being. Religion, Brain & Behavior. doi:10.1080/2153599X.2023.2254980
Examples
data(marp)
# Dimensions
dim(marp)
# Quick overview: mean overall well-being per country
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  marp |>
    group_by(country) |>
    summarise(
      mean_wb = mean(wb_overall_mean, na.rm = TRUE),
      .groups = "drop"
    )
}
Mask categorical labels with random labels
Description
Assigns random new labels to each unique value in a character or factor vector. The purpose is to blind data so analysts are not aware of treatment allocation or categorical outcomes. Each unique original value gets a random new label, and the assignment order is randomized to prevent correspondence with the original order.
Usage
mask_labels(x, prefix = "masked_group_")
Arguments
x |
a character or factor vector |
prefix |
character string to use as prefix for masked labels. Default is "masked_group_" |
Value
a vector of the same type as input with masked labels
See Also
mask_variables for masking multiple variables in a data frame,
mask_variables_rowwise for rowwise masking, and
mask_names for masking variable names.
Examples
# Example with character vector
set.seed(123)
treatment <- c("control", "treatment", "control", "treatment")
mask_labels(treatment)
# Example with custom prefix
set.seed(456)
condition <- c("A", "B", "C", "A", "B", "C")
mask_labels(condition, prefix = "group_")
# Example with factor vector
set.seed(789)
ecology <- factor(c("Desperate", "Hopeful", "Desperate", "Hopeful"))
mask_labels(ecology)
# Using with dataset column
data(williams)
set.seed(123)
williams$ecology_masked <- mask_labels(williams$ecology)
head(williams[c("ecology", "ecology_masked")])
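The label-masking idea described above can be sketched in base R. This is an illustration only, not the package's implementation: the helper name sketch_mask_labels() is hypothetical, and the label format (prefix plus a running number) is an assumption.

```r
# Hypothetical helper illustrating the idea behind mask_labels();
# not the package's actual code.
sketch_mask_labels <- function(x, prefix = "masked_group_") {
  originals <- unique(as.character(x))
  originals <- originals[!is.na(originals)]
  # Build the new labels, then shuffle which original value gets which label
  new_labels <- sample(paste0(prefix, seq_along(originals)))
  lookup <- setNames(new_labels, originals)
  masked <- unname(lookup[as.character(x)])
  # Return the same type as the input
  if (is.factor(x)) factor(masked) else masked
}

set.seed(1)
treatment <- c("control", "treatment", "control", "treatment")
masked <- sketch_mask_labels(treatment)
masked
```

Identical original values always receive the same masked label, so group membership is preserved while the meaning of each group is hidden.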
Mask variable names with anonymous labels
Description
Assigns new masked names to selected variables in a data frame. All selected variables are combined into a single set and renamed with a common prefix. To mask different variable groups with different prefixes, call the function separately for each group.
Usage
mask_names(data, ..., prefix)
Arguments
data |
A data frame. |
... |
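Columns to mask using tidyselect semantics. All arguments are combined into a single set. Each can be a tidyselect helper such as starts_with("treat_") or a character vector of column names such as c("treat_1", "treat_2").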
|
prefix |
character string to use as prefix for masked names.
This becomes the base prefix, with numeric suffixes appended.
|
Value
A data frame with the specified variables renamed to masked names.
See Also
mask_labels for masking values in a vector,
mask_variables for masking values in multiple variables, and
mask_variables_rowwise for rowwise value masking.
Examples
df <- data.frame(
  treat_1 = c(1, 2, 3),
  treat_2 = c(4, 5, 6),
  outcome_a = c(7, 8, 9),
  outcome_b = c(10, 11, 12),
  id = 1:3
)
# Mask one set of variables
library(dplyr)
mask_names(df, starts_with("treat_"), prefix = "A_")
# Using character vectors
mask_names(df, c("treat_1", "treat_2"), prefix = "A_")
# Mask multiple sets separately.
# Note that the order of masking matters: mix up the order of the
# prefixes across sets (here "B_" is used before "A_") to ensure
# proper masking.
df |>
  mask_names(starts_with("treat_"), prefix = "B_") |>
  mask_names(starts_with("outcome_"), prefix = "A_")
# Example with the 'williams' dataset
data(williams)
set.seed(42)
williams |>
  mask_names(starts_with("SexUnres"), prefix = "A_") |>
  mask_names(starts_with("Impul"), prefix = "B_") |>
  colnames()
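As a rough base R illustration of name masking, the sketch below renames a set of columns with a common prefix and shuffled numeric suffixes. The helper sketch_mask_names() is hypothetical, and the suffix format is an assumption; the package's own naming scheme may differ.

```r
# Hypothetical helper illustrating the idea behind mask_names();
# not the package's actual code.
sketch_mask_names <- function(data, cols, prefix) {
  stopifnot(is.data.frame(data), all(cols %in% names(data)))
  # Shuffle the suffixes so the new names do not follow the original order
  suffixes <- sample.int(length(cols))
  names(data)[match(cols, names(data))] <- paste0(prefix, suffixes)
  data
}

df <- data.frame(treat_1 = 1:3, treat_2 = 4:6, id = 1:3)
set.seed(1)
masked_df <- sketch_mask_names(df, c("treat_1", "treat_2"), prefix = "A_")
names(masked_df)
```

Only the column names change; the values and the unselected columns are left as they were.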
Mask categorical variables with random labels in a data frame
Description
Applies masked labels to multiple categorical variables in a data frame using
the mask_labels() function. Each variable gets independent random
masked labels by default, or can optionally use the same masked labels
across all selected variables.
Usage
mask_variables(data, ..., across_variables = FALSE)
Arguments
data |
a data frame |
... |
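Columns to mask using tidyselect semantics. Each can be bare column names, a character vector of column names, or a tidyselect helper such as where(is.character).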
Only character and factor columns will be processed. |
across_variables |
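logical. If TRUE, the same masked labels are shared across all selected variables. If FALSE (the default), each variable gets independent random masked labels. |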
Value
A data frame with the specified categorical columns masked. Only character and factor columns can be processed.
See Also
mask_labels for masking a single vector,
mask_variables_rowwise for rowwise masking, and
mask_names for masking variable names.
Examples
# Create example data
df <- data.frame(
  treatment = c("control", "intervention", "control"),
  outcome = c("success", "failure", "success"),
  score = c(1, 2, 3) # numeric, won't be masked
)
set.seed(123)
# Independent masking for each variable (default - uses column names as
# prefixes)
# Using bare names
mask_variables(df, treatment, outcome)
# Or using character vector
mask_variables(df, c("treatment", "outcome"))
set.seed(456)
# Shared masking across variables
mask_variables(df, c("treatment", "outcome"), across_variables = TRUE)
# Using tidyselect helpers
mask_variables(df, where(is.character))
# Example with multiple categorical columns
df2 <- data.frame(
  group = c("A", "B", "A", "B"),
  condition = c("ctrl", "test", "ctrl", "test")
)
set.seed(123)
result <- mask_variables(df2, c("group", "condition"))
print(result)
# Example with williams dataset (multiple categorical columns)
data(williams)
set.seed(456)
# Using bare names (recommended for interactive use)
williams_masked <- mask_variables(williams, subject, ecology)
head(williams_masked[c("subject", "ecology")])
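The difference between independent and shared masking can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the label formats are made up, and the only point is where the lookup table is built (once per column versus once for all selected columns).

```r
df <- data.frame(
  treatment = c("control", "intervention", "control"),
  outcome = c("success", "failure", "success")
)
cols <- c("treatment", "outcome")
set.seed(123)

# Independent masking: one lookup table per column
independent <- df
for (col in cols) {
  values <- unique(df[[col]])
  lookup <- setNames(sample(paste0(col, "_group_", seq_along(values))), values)
  independent[[col]] <- unname(lookup[df[[col]]])
}

# Shared masking: one lookup table built from the values of all columns
all_values <- unique(unlist(df[cols], use.names = FALSE))
shared_lookup <- setNames(
  sample(paste0("masked_group_", seq_along(all_values))),
  all_values
)
shared <- df
shared[cols] <- lapply(df[cols], function(v) unname(shared_lookup[v]))

independent
shared
```

With a shared lookup, a value that appears in several columns maps to the same masked label everywhere, which matters when the columns share a common set of categories.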
Mask categorical labels across multiple columns rowwise in a data frame
Description
For each row, independently mask labels within the selected columns. All selected columns are combined into a single set and processed together. To mask different variable groups separately, call the function multiple times.
Usage
mask_variables_rowwise(data, ..., prefix = "masked_group_")
Arguments
data |
A data frame. |
... |
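Columns to mask using tidyselect semantics. All arguments are combined into a single set. Each can be a tidyselect helper such as starts_with("treat_") or a character vector of column names such as c("treat_1", "treat_2", "treat_3").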
|
prefix |
character string to use as prefix for masked labels. Default is "masked_group_" |
Value
A data frame with labels masked rowwise within the selected columns.
See Also
mask_labels for masking a single vector,
mask_variables for masking multiple variables, and
mask_names for masking variable names.
Examples
df <- data.frame(
  treat_1 = c("control", "treatment", "placebo"),
  treat_2 = c("treatment", "placebo", "control"),
  treat_3 = c("placebo", "control", "treatment"),
  condition_a = c("A", "B", "A"),
  condition_b = c("B", "A", "B"),
  id = 1:3
)
set.seed(1037)
# Mask one set of variables
library(dplyr)
df |> mask_variables_rowwise(starts_with("treat_"))
# Using character vectors
df |> mask_variables_rowwise(c("treat_1", "treat_2", "treat_3"))
# Mask multiple sets separately
df |>
  mask_variables_rowwise(starts_with("treat_")) |>
  mask_variables_rowwise(c("condition_a", "condition_b"))
# Example with custom prefix
df |> mask_variables_rowwise(starts_with("treat_"), prefix = "group_")
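Rowwise masking draws a fresh label assignment for every row, so the same original value can receive different masked labels in different rows. A minimal base R sketch of that idea follows; sketch_mask_rowwise() is a hypothetical helper that assumes character columns and a prefix-plus-number label format, and it is not the package's implementation.

```r
# Hypothetical helper illustrating rowwise masking; assumes character columns.
sketch_mask_rowwise <- function(data, cols, prefix = "masked_group_") {
  m <- as.matrix(data[cols])
  for (i in seq_len(nrow(m))) {
    values <- unique(m[i, ])
    # New random assignment of labels for this row only
    lookup <- setNames(sample(paste0(prefix, seq_along(values))), values)
    m[i, ] <- unname(lookup[m[i, ]])
  }
  data[cols] <- as.data.frame(m)
  data
}

df <- data.frame(
  treat_1 = c("control", "treatment", "placebo"),
  treat_2 = c("treatment", "placebo", "control"),
  treat_3 = c("placebo", "control", "treatment"),
  id = 1:3
)
treat_cols <- c("treat_1", "treat_2", "treat_3")
set.seed(1037)
masked_rows <- sketch_mask_rowwise(df, treat_cols)
masked_rows
```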
Scramble a vector of values
Description
Randomly reorders the values of a vector. The values themselves are unchanged; only their positions are shuffled.
Usage
scramble_values(x)
Arguments
x |
a vector |
Value
the scrambled vector
See Also
scramble_variables for scrambling multiple variables in a data frame, and
scramble_variables_rowwise for rowwise scrambling.
Examples
# Example with character vector
set.seed(123)
x <- letters[1:10]
scramble_values(x)
# Example with numeric vector
nums <- 1:5
scramble_values(nums)
# Scramble a column in the 'williams' dataset
data(williams)
# Simple scrambling of a single column
set.seed(123)
williams$ecology_scrambled <- scramble_values(williams$ecology)
head(williams[c("ecology", "ecology_scrambled")])
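Scrambling a vector amounts to a random permutation of its elements. The base R sketch below shows one way to write this; sketch_scramble() is a hypothetical helper, not the package's code. It permutes indices rather than calling sample(x) directly, because sample() treats a single number n as the range 1:n.

```r
# Hypothetical helper: permute positions, keep the values as they are
sketch_scramble <- function(x) x[sample.int(length(x))]

set.seed(123)
nums <- c(10, 20, 30, 40, 50)
scrambled <- sketch_scramble(nums)
scrambled

# The set of values is preserved, only the order changes
identical(sort(scrambled), sort(nums))

# A length-one numeric vector is returned as is,
# whereas sample(5) would return a permutation of 1:5
sketch_scramble(5)
```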
Scrambling the content of several variables in a data frame
Description
Scramble the values of several selected variables in a data frame simultaneously. Supports independent scrambling, joint scrambling, and within-group scrambling.
Usage
scramble_variables(data, ..., .groups = NULL, together = FALSE)
Arguments
data |
a data frame |
... |
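Columns to scramble using tidyselect semantics. Each can be bare column names, a character vector of column names, column positions, or a tidyselect helper such as starts_with("x").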
|
.groups |
Optional grouping columns. Scrambling will be done within each group.
Supports the same tidyselect syntax as column selection. Grouping columns must not overlap with
the columns selected in the ... argument. |
together |
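logical. If TRUE, the selected variables are scrambled together as a unit, so values from the same row stay together. If FALSE (the default), each variable is scrambled independently. |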
Value
A data frame with the specified columns scrambled. If grouping is specified, scrambling is done within each group.
See Also
scramble_values for scrambling a single vector, and
scramble_variables_rowwise for rowwise scrambling.
Examples
df <- data.frame(
  x = 1:6,
  y = letters[1:6],
  group = c("A", "A", "A", "B", "B", "B")
)
set.seed(123)
# Example without grouping. Variables scrambled across the entire data frame.
# Using bare names
df |> scramble_variables(x, y)
# Or using character vector
df |> scramble_variables(c("x", "y"))
# Example with together = TRUE. Variables scrambled together as a unit per row.
df |> scramble_variables(c("x", "y"), together = TRUE)
# Example with grouping. Variable only scrambled within groups.
df |> scramble_variables("y", .groups = "group")
# Example combining grouping and together parameters
df |> scramble_variables(c("x", "y"), .groups = "group", together = TRUE)
# Example with tidyselect helpers
library(dplyr)
df |> scramble_variables(starts_with("x"))
df |> scramble_variables(where(is.numeric), .groups = "group")
# Example with the 'williams' dataset
data(williams)
williams |> scramble_variables(c("ecology", "age"))
williams |> scramble_variables(1:5)
williams |> scramble_variables(c("ecology", "age"), .groups = "gender")
williams |> scramble_variables(c(1, 2), .groups = 3)
williams |> scramble_variables(c("ecology", "age"), together = TRUE)
williams |> scramble_variables(c("ecology", "age"), .groups = "gender", together = TRUE)
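The joint and within-group modes can be illustrated with base R. The sketch below is not the package's implementation; it only shows the underlying idea: joint scrambling applies one row permutation to all selected columns, and within-group scrambling permutes values separately inside each group.

```r
df <- data.frame(
  x = 1:6,
  y = letters[1:6],
  group = c("A", "A", "A", "B", "B", "B")
)
set.seed(123)

# Joint scrambling: a single permutation keeps each (x, y) pair intact
perm <- sample.int(nrow(df))
together <- df
together[c("x", "y")] <- df[perm, c("x", "y")]

# Within-group scrambling: y is shuffled only among rows of the same group
within_group <- df
within_group$y <- ave(df$y, df$group, FUN = function(v) v[sample.int(length(v))])

together
within_group
```

In the joint result every x still sits next to its original y, whereas in the within-group result the letters a to c stay in group A and d to f stay in group B.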
Scramble values across multiple columns rowwise in a data frame
Description
For each row, shuffle values across the selected columns. All selections passed
via ... are combined into a single set and scrambled together.
To scramble different variable groups separately, call the function multiple times.
Usage
scramble_variables_rowwise(data, ...)
Arguments
data |
A data frame. |
... |
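Columns to scramble using tidyselect semantics. All arguments are combined into a single set. Each can be a tidyselect helper such as starts_with("day_") or a character vector of column names such as c("day_1", "day_2", "day_3").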
If |
Details
Rowwise scrambling moves values between columns, so selected columns must be type-compatible. This function requires all selected columns to have the same class (or be an integer/double mix). For factors, the selected columns must also have identical levels.
Value
A data frame with values scrambled rowwise within the selected columns.
See Also
scramble_values for scrambling a single vector, and
scramble_variables for scrambling multiple variables.
Examples
df <- data.frame(
  day_1 = c(1, 4, 7),
  day_2 = c(2, 5, 8),
  day_3 = c(3, 6, 9),
  score_a = c(10, 40, 70),
  score_b = c(20, 50, 80),
  id = 1:3
)
set.seed(123)
# Scramble one set of variables
library(dplyr)
df |> scramble_variables_rowwise(starts_with("day_"))
# Using character vectors
df |> scramble_variables_rowwise(c("day_1", "day_2", "day_3"))
# Scramble multiple sets separately
df |>
  scramble_variables_rowwise(starts_with("day_")) |>
  scramble_variables_rowwise(c("score_a", "score_b"))
# Multiple selectors are combined into one set (values can move between day_* and score_*)
df |> scramble_variables_rowwise(starts_with("day_"), starts_with("score_"))
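The type-compatibility requirement from the Details section can be made concrete with a base R sketch. sketch_scramble_rowwise() is a hypothetical helper, not the package's code; it handles only atomic, non-factor columns, and an integer/double mix is returned as double.

```r
# Hypothetical helper illustrating rowwise scrambling with a class check
sketch_scramble_rowwise <- function(data, cols) {
  classes <- vapply(data[cols], function(v) class(v)[1], character(1))
  if (any(classes == "factor")) {
    stop("factor columns are not handled in this sketch")
  }
  same_class <- length(unique(classes)) == 1
  numeric_mix <- all(classes %in% c("integer", "numeric"))
  if (!same_class && !numeric_mix) {
    stop("selected columns must share a class (or be an integer/double mix)")
  }
  m <- as.matrix(data[cols])
  for (i in seq_len(nrow(m))) {
    # Shuffle the values of row i across the selected columns
    m[i, ] <- m[i, sample.int(ncol(m))]
  }
  data[cols] <- as.data.frame(m)
  data
}

df <- data.frame(
  day_1 = c(1, 4, 7),
  day_2 = c(2, 5, 8),
  day_3 = c(3, 6, 9),
  label = c("a", "b", "c")
)
day_cols <- c("day_1", "day_2", "day_3")
set.seed(123)
out <- sketch_scramble_rowwise(df, day_cols)
out
```

Each row keeps its own set of values, so row 1 still contains 1, 2, and 3 in some order. Mixing day_1 with the character column label would stop with an error instead of silently coercing the numbers to text.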
Stereotyping of High-Wealth Individuals Across Ecologies
Description
Data from a study by Williams et al. testing whether high-wealth individuals are perceived as having faster life history strategies (e.g., more impulsive, less invested) when associated with "desperate" ecological conditions compared to "hopeful" ones.
Usage
data(williams)
Format
A data frame with 224 rows (one per participant) and 25 variables:
- subject
Unique subject identifier (integer).
- ecology
Experimental condition: "Desperate" or "Hopeful" (character).
- age
Participant's age in years (numeric).
- gender
Self-reported gender: 1 = Male, 2 = Female (numeric); may be recoded as factor.
- duration_in_seconds
Time taken to complete the survey (numeric).
- attention_1
First attention check response: 1 = correct, 0 = incorrect (numeric).
- attention_2
Second attention check response: 1 = correct, 0 = incorrect (numeric).
- SexUnres_1
Perceived sexual unrestrictedness: "likely to have short-term relationships" (1–7 Likert).
- SexUnres_2
"likely to engage in casual sex" (1–7).
- SexUnres_3
"not interested in long-term commitment" (1–7).
- SexUnres_4_r
"faithful to romantic partners" — reverse-coded (1–7).
- SexUnres_5_r
"committed in relationships" — reverse-coded (1–7).
- Impuls_1
"acts without thinking" (1–7).
- Impuls_2_r
"thinks carefully before acting" — reverse-coded (1–7).
- Impul_3_r
"plans ahead" — reverse-coded (1–7).
- Opport_1
"opportunities for long-term planning exist" (1–7).
- Opport_2
"can save money for the future" (1–7).
- Opport_3
"can make career plans" (1–7).
- Opport_4
"can plan for retirement" (1–7).
- Opport_5
"has control over future outcomes" (1–7).
- Opport_6_r
"life is unpredictable" — reverse-coded (1–7).
- InvEdu_1_r
"invests in education" — reverse-coded (1–7).
- InvEdu_2_r
"values academic achievement" — reverse-coded (1–7).
- InvChild_1
"invests time and resources in children" (1–7).
- InvChild_2_r
"neglects parental responsibilities" — reverse-coded (1–7).
Source
Williams, S. A., Galak, J., & Kruger, D. J. (2019). The influence of ecology on social perceptions: When wealth signals faster life history strategies. Evolutionary Behavioral Sciences, 13(4), 313–325. doi:10.1037/ebs0000148
Data based on materials available at: https://osf.io/xyz12 (replace with real link if known)