In the Eriksen flanker task participants are presented with an image
of an odd number of arrows (usually five or seven). Their task is to
indicate the direction (left or right) of the middle arrow as quickly
as possible while ignoring the flanking arrows on either side. There
are two types of stimuli in the task: in the **congruent** condition
(e.g. '<<<<<<<') the middle arrow and the flanking arrows point in
the same direction, whereas in the **incongruent** condition
(e.g. '<<<><<<') the middle arrow points in the opposite direction
to the flanking arrows.
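For illustration, the two stimulus types can be generated with a small helper function (hypothetical, not part of **bayes4psy**):

```r
# a congruent stimulus repeats the same arrow n times, an incongruent
# one flips the middle arrow (illustrative helper, not part of bayes4psy)
make_stimulus <- function(direction = "<", n = 7, congruent = TRUE) {
  arrows <- rep(direction, n)
  if (!congruent) {
    opposite <- if (direction == "<") ">" else "<"
    arrows[ceiling(n / 2)] <- opposite
  }
  paste(arrows, collapse = "")
}

make_stimulus(congruent = TRUE)   # "<<<<<<<"
make_stimulus(congruent = FALSE)  # "<<<><<<"
```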

Because participants have to consciously ignore and inhibit the misleading information provided by the flanking arrows, performance in the incongruent condition is robustly worse than in the congruent condition, both in terms of longer reaction times and a higher proportion of errors. The difference in reaction times and error rates between the congruent and incongruent conditions is a measure of a subject's ability to focus attention and inhibit distracting stimuli.
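As a minimal sketch, the flanker effect for reaction times is just a difference of condition means (the numbers below are made up, purely for illustration):

```r
# toy trial data: reaction times per condition (illustrative values)
trials <- data.frame(
  condition = c("congruent", "congruent", "incongruent", "incongruent"),
  rt = c(0.40, 0.44, 0.55, 0.59)
)

# flanker effect: mean incongruent RT minus mean congruent RT
flanker_effect <- mean(trials$rt[trials$condition == "incongruent"]) -
  mean(trials$rt[trials$condition == "congruent"])
flanker_effect  # 0.15
```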

In the illustration below we compare reaction times and error rates when solving the flanker task between the control group (healthy subjects) and the test group (subjects suffering from a certain medical condition).

First, we load package **bayes4psy** and package
**dplyr** for data wrangling. Second, we load the data and
split them into control and test groups. For reaction time analysis we
use only data where the response to the stimuli was correct:

```
# libs
library(bayes4psy)
library(dplyr)

# load data
data <- flanker

# analyse only correct answers, split into control and test groups
control_rt <- data %>% filter(result == "correct" &
                              group == "control")

test_rt <- data %>% filter(result == "correct" &
                           group == "test")
```

The model requires subjects to be indexed from \(1\) to \(n\). Control group subject indexes range from 22 to 45, so we map them to the range 1 to 24.

```
# control group subject indexes range from 22 to 45, map them to 1..24
# test group indexes are OK
control_rt$subject <- control_rt$subject - 21
```

Now we are ready to fit the Bayesian reaction time model for both groups. The model function requires two parameters – a vector of reaction times \(t\) and a vector of subject indexes \(s\). Note that, due to vignette limitations, all fits are built using only one chain; running multiple chains in parallel is usually more efficient. To speed up vignette building we also greatly reduced the number of iterations, so use an appropriate number of iterations when executing actual analyses!

```
# prior
uniform_prior <- b_prior(family="uniform", pars=c(0, 3))

# attach priors to relevant parameters
priors <- list(c("mu_m", uniform_prior))

# fit
rt_control_fit <- b_reaction_time(t=control_rt$rt,
                                  s=control_rt$subject,
                                  priors=priors,
                                  chains=1, iter=200, warmup=100)

rt_test_fit <- b_reaction_time(t=test_rt$rt,
                               s=test_rt$subject,
                               priors=priors,
                               chains=1, iter=200, warmup=100)
```

Before we interpret the results, we check MCMC diagnostics and model fit.

```
# plot trace
plot_trace(rt_control_fit)
```

`plot_trace(rt_test_fit)`

The traceplots give us no cause for concern regarding MCMC convergence and mixing. Note that the first 100 iterations (shaded gray) are used for warmup (tuning of the MCMC algorithm) and are discarded, while the remaining 100 iterations are used for sampling.

```
# the commands below are commented out for the sake of brevity
#print(rt_control_fit)
#print(rt_test_fit)
```

The output above provides further MCMC diagnostics, which again give
us no cause for concern (only the output of one fit is shown for the
sake of brevity). The convergence diagnostic **Rhat** is practically
1 for all parameters and there is little auto-correlation (possibly
even some negative auto-correlation) – effective sample sizes
(**n_eff**) are of the order of the number of samples taken and
Monte Carlo standard errors (**se_mean**) are relatively small.

What counts as a good-enough effective sample size depends on our goal. If we are interested in posterior quantities such as the more extreme percentiles, the effective sample size should be 10,000 or higher, if possible. If we are only interested in estimating the mean, 100 effective samples are in most cases enough for a practically negligible Monte Carlo error.
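As a rough sketch of why 100 effective samples can suffice for estimating a mean: the Monte Carlo standard error of a posterior mean scales as the posterior standard deviation divided by the square root of the effective sample size (the values below are illustrative, not taken from the fits):

```r
# Monte Carlo standard error of a posterior mean: sd / sqrt(n_eff).
# With n_eff = 100 and a posterior sd of 0.05 s, the error on the
# estimated mean is about 0.005 s -- negligible for most RT questions.
mcse <- function(posterior_sd, n_eff) posterior_sd / sqrt(n_eff)

mcse(0.05, 100)    # 0.005
mcse(0.05, 10000)  # 0.0005
```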

We can increase the effective sample size by increasing the amount of
MCMC iterations with the **iter** parameter. In our case we
can achieve an effective sample size of 10,000 by setting
**iter** to 4000. Because the MCMC diagnostics give us no
reason for concern, we can leave the **warmup** parameter
at its default value of 1000.

```
rt_control_fit <- b_reaction_time(t=control_rt$rt,
                                  s=control_rt$subject,
                                  iter=4000)

rt_test_fit <- b_reaction_time(t=test_rt$rt,
                               s=test_rt$subject,
                               iter=4000)
```

Because we did not explicitly define any priors this time, flat (improper) priors were put on all of the model's parameters. In some cases, flat priors are a statement that we have no prior knowledge (in some sense) about the experiment's results. In general, even flat priors can express a preference for a certain region of parameter space. In practice, we almost always have some prior information and we should incorporate it into the modelling process.

Next, we check whether the model fits the data well by using the
**plot** function. If we set the **subjects**
parameter to **FALSE**, we will get a less detailed group
level fit. The data are visualized as a blue region while the fit is
visualized with a black line. In this case the model fits the underlying
data well.

```
# check fits
plot(rt_control_fit)
```

`plot(rt_test_fit)`

We now use the **compare_means** function to compare
reaction times between healthy (control) and unhealthy (test) subjects.
In the example below we use a rope (region of practical equivalence)
interval of 0.01 s, meaning that differences smaller than 1/100 of a
second are treated as equal. The **compare_means** function
provides a user-friendly output of the comparison and also returns the
results in the form of a **data.frame**.
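To make the comparison concrete, here is a toy sketch of the probabilities that **compare_means** reports, computed on simulated posterior draws (the means and standard deviations below are made up for illustration; the actual function works on the fitted models):

```r
# toy posterior draws of the two group means (illustrative, not the fits)
set.seed(42)
mu1 <- rnorm(1000, mean = 0.50, sd = 0.03)  # toy "control" posterior
mu2 <- rnorm(1000, mean = 0.60, sd = 0.03)  # toy "test" posterior

# compare the draws pairwise: smaller, larger, or within the rope
d <- mu1 - mu2
rope <- 0.01
p_smaller <- mean(d < -rope)
p_larger  <- mean(d >  rope)
p_equal   <- mean(abs(d) <= rope)

c(p_smaller, p_larger, p_equal)
```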

```
# set rope (region of practical equivalence) interval to +/- 10ms
rope <- 0.01

# which mean is smaller/larger
rt_control_test <- compare_means(rt_control_fit, fit2=rt_test_fit, rope=rope)
```

```
##
## ---------- Group 1 vs Group 2 ----------
## Probabilities:
## - Group 1 < Group 2: 0.97 +/- 0.01714
## - Group 1 > Group 2: 0.02 +/- 0.01407
## - Equal: 0.01 +/- 0.01000
## 95% HDI:
## - Group 1 - Group 2: [-0.20, -0.02]
```

The **compare_means** function output contains the
probability that one group has shorter reaction times than the other,
the probability that both groups are equal (if a rope interval is
provided) and the 95% HDI (highest density interval) for the difference
between groups. Based on the output we are quite certain (97% +/- 1.7%)
that the healthy group's (**rt_control_fit**) expected
reaction times are lower than the unhealthy group's
(**rt_test_fit**).
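For intuition, a 95% HDI can be estimated from posterior draws as the narrowest interval containing 95% of them. The helper below is an illustrative sample-based sketch, not the package's internal implementation:

```r
# slide a window covering 95% of the sorted draws and keep the
# shortest one -- that is the (sample-based) 95% HDI
hdi <- function(samples, prob = 0.95) {
  sorted <- sort(samples)
  n <- length(sorted)
  window <- floor(prob * n)
  widths <- sorted[(window + 1):n] - sorted[1:(n - window)]
  lo <- which.min(widths)
  c(sorted[lo], sorted[lo + window])
}

set.seed(1)
hdi(rnorm(10000))  # approximately c(-1.96, 1.96) for a standard normal
```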

We can also visualize this difference by using the
**plot_means_difference** function. The
**plot_means** function is an alternative for comparing
**rt_control_fit** and **rt_test_fit** – the
function visualizes the parameters that determine the means of each
model.

```
# difference plot
plot_means_difference(rt_control_fit, fit2=rt_test_fit, rope=rope)
```

The graph above visualizes the difference between
**rt_control_fit** and **rt_test_fit**. The
histogram visualizes the distribution of the difference, the vertical
blue line denotes the mean, the black band at the bottom marks the 95%
HDI and the gray band marks the rope interval. Since the whole 95%
HDI of the difference is negative and lies outside of the rope
interval, we can conclude that healthy subjects are most likely faster
than unhealthy ones.

```
# visual comparison of mean difference
plot_means(rt_control_fit, fit2=rt_test_fit)
```

Above you can see a visualization of the means for
**rt_control_fit** and **rt_test_fit**. Group
1 shows the means for the healthy subjects and group 2 for the
unhealthy subjects.

Based on our analysis we can claim with high probability (97% +/- 1.7%) that healthy subjects have faster reaction times when solving the flanker task than unhealthy subjects. Next, we analyze whether the same applies to success rates.

Information about the subjects' success is stored as correct/incorrect. However, the Bayesian success rate model requires binary inputs (0/1), so we have to transform the data. Also, as in the reaction time example, we have to correct the indexes of control group subjects.

```
# map correct/incorrect/timeout to 1 (success) or 0 (fail)
data$result_numeric <- 0
data[data$result == "correct", ]$result_numeric <- 1

# split into control and test groups
control_sr <- data %>% filter(group == "control")
test_sr <- data %>% filter(group == "test")

# control group subject indexes range from 22 to 45, map them to 1..24
# test group indexes are OK
control_sr$subject <- control_sr$subject - 21
```

Since the only prior information about the success rate of
participants was the fact that it lies between 0 and 1,
we used a beta distribution to put a uniform prior on the [0, 1]
interval (we put a \(\mathrm{Beta}(1, 1)\) prior
on the \(p\) parameter). We execute the
model fitting by running the **b_success_rate** function
with appropriate input data.
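As a quick sanity check that a beta distribution with both shape parameters set to 1 is indeed flat on [0, 1]:

```r
# Beta(1, 1) has constant density 1 on [0, 1], i.e. it reduces
# to the uniform distribution on that interval
x <- seq(0, 1, by = 0.1)
all.equal(dbeta(x, 1, 1), dunif(x))  # TRUE
```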

```
# priors
p_prior <- b_prior(family="beta", pars=c(1, 1))

# attach priors to relevant parameters
priors <- list(c("p", p_prior))

# fit
sr_control_fit <- b_success_rate(r=control_sr$result_numeric,
                                 s=control_sr$subject,
                                 priors=priors,
                                 chains=1, iter=200, warmup=100)
```

```
##
## SAMPLING FOR MODEL 'success_rate' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000184 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.84 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1: three stages of adaptation as currently configured.
## Chain 1: Reducing each adaptation stage to 15%/75%/10% of
## Chain 1: the given number of warmup iterations:
## Chain 1: init_buffer = 15
## Chain 1: adapt_window = 75
## Chain 1: term_buffer = 10
## Chain 1:
## Chain 1: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 1: Iteration: 20 / 200 [ 10%] (Warmup)
## Chain 1: Iteration: 40 / 200 [ 20%] (Warmup)
## Chain 1: Iteration: 60 / 200 [ 30%] (Warmup)
## Chain 1: Iteration: 80 / 200 [ 40%] (Warmup)
## Chain 1: Iteration: 100 / 200 [ 50%] (Warmup)
## Chain 1: Iteration: 101 / 200 [ 50%] (Sampling)
## Chain 1: Iteration: 120 / 200 [ 60%] (Sampling)
## Chain 1: Iteration: 140 / 200 [ 70%] (Sampling)
## Chain 1: Iteration: 160 / 200 [ 80%] (Sampling)
## Chain 1: Iteration: 180 / 200 [ 90%] (Sampling)
## Chain 1: Iteration: 200 / 200 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.333 seconds (Warm-up)
## Chain 1: 0.274 seconds (Sampling)
## Chain 1: 0.607 seconds (Total)
## Chain 1:
```

```
sr_test_fit <- b_success_rate(r=test_sr$result_numeric,
                              s=test_sr$subject,
                              priors=priors,
                              chains=1, iter=200, warmup=100)
```

```
##
## SAMPLING FOR MODEL 'success_rate' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.00017 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.7 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: WARNING: There aren't enough warmup iterations to fit the
## Chain 1: three stages of adaptation as currently configured.
## Chain 1: Reducing each adaptation stage to 15%/75%/10% of
## Chain 1: the given number of warmup iterations:
## Chain 1: init_buffer = 15
## Chain 1: adapt_window = 75
## Chain 1: term_buffer = 10
## Chain 1:
## Chain 1: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 1: Iteration: 20 / 200 [ 10%] (Warmup)
## Chain 1: Iteration: 40 / 200 [ 20%] (Warmup)
## Chain 1: Iteration: 60 / 200 [ 30%] (Warmup)
## Chain 1: Iteration: 80 / 200 [ 40%] (Warmup)
## Chain 1: Iteration: 100 / 200 [ 50%] (Warmup)
## Chain 1: Iteration: 101 / 200 [ 50%] (Sampling)
## Chain 1: Iteration: 120 / 200 [ 60%] (Sampling)
## Chain 1: Iteration: 140 / 200 [ 70%] (Sampling)
## Chain 1: Iteration: 160 / 200 [ 80%] (Sampling)
## Chain 1: Iteration: 180 / 200 [ 90%] (Sampling)
## Chain 1: Iteration: 200 / 200 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.293 seconds (Warm-up)
## Chain 1: 0.262 seconds (Sampling)
## Chain 1: 0.555 seconds (Total)
## Chain 1:
```

The process for inspecting Bayesian fits is the same as above. When
visually inspecting the quality of the fit (the **plot**
function) we can set the **subjects** parameter to
**FALSE**, which visualizes the fit on the group level.
This offers a quicker, but less detailed method of inspection. Again one
of the commands is commented out for the sake of brevity.

```
# inspect control group fit
plot_trace(sr_control_fit)
```

`plot(sr_control_fit, subjects=FALSE)`

`print(sr_control_fit)`

```
## Inference for Stan model: success_rate.
## 1 chains, each with iter=200; warmup=100; thin=1;
## post-warmup draws per chain=100, total post-warmup draws=100.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## p0 0.96 0.00 0.01 0.95 0.96 0.96 0.96 0.97 99 0.99
## tau 106.87 12.47 61.04 33.41 59.73 91.52 135.67 267.97 24 0.99
## p[1] 0.98 0.00 0.01 0.96 0.98 0.98 0.99 0.99 46 1.01
## p[2] 0.95 0.00 0.02 0.92 0.94 0.95 0.96 0.97 71 1.00
## p[3] 0.98 0.00 0.01 0.95 0.97 0.98 0.99 0.99 52 1.00
## p[4] 0.95 0.00 0.01 0.92 0.95 0.95 0.96 0.97 77 0.99
## p[5] 0.95 0.00 0.01 0.92 0.94 0.95 0.96 0.97 141 0.99
## p[6] 0.97 0.00 0.01 0.95 0.96 0.97 0.97 0.98 126 1.02
## p[7] 0.97 0.00 0.01 0.96 0.97 0.97 0.98 0.98 104 0.99
## p[8] 0.94 0.00 0.02 0.90 0.93 0.94 0.95 0.97 95 1.01
## p[9] 0.98 0.00 0.01 0.96 0.97 0.98 0.98 0.99 81 0.99
## p[10] 0.95 0.00 0.01 0.92 0.94 0.95 0.96 0.97 131 0.99
## p[11] 0.95 0.00 0.01 0.92 0.94 0.95 0.96 0.97 177 1.02
## p[12] 0.93 0.00 0.02 0.90 0.92 0.94 0.95 0.96 47 1.05
## p[13] 0.98 0.00 0.01 0.97 0.98 0.98 0.99 0.99 39 0.99
## p[14] 0.97 0.00 0.01 0.95 0.96 0.97 0.98 0.99 132 1.01
## p[15] 0.98 0.00 0.01 0.96 0.97 0.98 0.98 0.99 71 0.99
## p[16] 0.94 0.00 0.01 0.91 0.93 0.94 0.95 0.96 70 0.99
## p[17] 0.97 0.00 0.01 0.95 0.96 0.97 0.97 0.98 200 0.99
## p[18] 0.96 0.00 0.01 0.93 0.96 0.96 0.97 0.98 85 1.02
## p[19] 0.97 0.00 0.01 0.94 0.96 0.97 0.97 0.98 89 1.00
## p[20] 0.98 0.00 0.01 0.96 0.97 0.98 0.99 0.99 85 1.01
## p[21] 0.92 0.00 0.02 0.88 0.92 0.93 0.94 0.95 99 0.99
## p[22] 0.97 0.00 0.01 0.95 0.96 0.97 0.98 0.98 137 0.99
## p[23] 0.96 0.00 0.01 0.94 0.96 0.96 0.97 0.98 171 0.99
## p[24] 0.98 0.00 0.01 0.97 0.98 0.98 0.99 1.00 90 1.01
## lp__ -706.57 1.01 4.78 -715.76 -709.83 -705.69 -703.48 -699.14 23 1.06
##
## Samples were drawn using NUTS(diag_e) at Fri Sep 29 14:32:29 2023.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
```

```
# inspect test group fit
plot_trace(sr_test_fit)
```

`plot(sr_test_fit, subjects=FALSE)`

`#print(sr_test_fit)`

Since diagnostic functions show no pressing issues and the fits look
good we can proceed with the actual comparison between the two fitted
models. We will again estimate the difference between two groups with
**compare_means**.

```
# which one is higher
sr_control_test <- compare_means(sr_control_fit, fit2=sr_test_fit)
```

```
##
## ---------- Group 1 vs Group 2 ----------
## Probabilities:
## - Group 1 < Group 2: 0.57 +/- 0.04976
## - Group 1 > Group 2: 0.43 +/- 0.04976
## 95% HDI:
## - Group 1 - Group 2: [-0.02, 0.02]
```

As we can see, the success rates of the two groups are not that different. Since the probability that the healthy group is more successful is only 43% (+/- 5%) and the 95% HDI of the difference ([-0.02, 0.02]) includes 0, we cannot claim inequality. We can visualize this result by using the **plot_means_difference** function.

```
# difference plot
plot_means_difference(sr_control_fit, fit2=sr_test_fit)
```

Above you can see the visualization of the difference between
**sr_control_fit** and **sr_test_fit**. The
histogram visualizes the distribution of the difference, the vertical
blue line denotes the mean difference and the black band at the bottom
marks the 95% HDI. Since the 95% HDI of the difference includes the
value of 0 we cannot claim inequality. If we had used a rope interval
and the whole 95% HDI lay inside the rope interval, we could claim
equality.