Version: | 0.9.7 |
Title: | Reinforcement Learning Tools for Two-Alternative Forced Choice Tasks |
Description: | Tools for building Rescorla-Wagner Models for Two-Alternative Forced Choice tasks, commonly employed in psychological research. Most concepts and ideas within this R package are referenced from Sutton and Barto (2018) <ISBN:9780262039246>. The package allows for the intuitive definition of RL models using simple if-else statements, and three basic models built into this R package are referenced from Niv et al. (2012) <doi:10.1523/JNEUROSCI.5498-10.2012>. Our approach to constructing and evaluating these computational models is informed by the guidelines proposed in Wilson & Collins (2019) <doi:10.7554/eLife.49547>. Example datasets included with the package are sourced from the work of Mason et al. (2024) <doi:10.3758/s13423-023-02415-x>. |
Maintainer: | YuKi <hmz1969a@gmail.com> |
URL: | https://yuki-961004.github.io/binaryRL/ |
BugReports: | https://github.com/yuki-961004/binaryRL/issues |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | TRUE |
ByteCompile: | TRUE |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 4.0.0) |
Imports: | Rcpp, compiler, future, doFuture, foreach, doRNG, progressr |
LinkingTo: | Rcpp |
Suggests: | stats, GenSA, GA, DEoptim, pso, mlrMBO, mlr, ParamHelpers, smoof, lhs, DiceKriging, rgenoud, cmaes, nloptr |
NeedsCompilation: | yes |
Packaged: | 2025-08-19 10:07:55 UTC; hmz19 |
Author: | YuKi |
Repository: | CRAN |
Date/Publication: | 2025-08-19 23:50:02 UTC |
binaryRL: Reinforcement Learning Tools for Two-Alternative Forced Choice Tasks
Description
Tools for building Rescorla-Wagner Models for Two-Alternative Forced Choice tasks, commonly employed in psychological research. Most concepts and ideas within this R package are referenced from Sutton and Barto (2018) <ISBN:9780262039246>. The package allows for the intuitive definition of RL models using simple if-else statements, and three basic models built into this R package are referenced from Niv et al. (2012) doi:10.1523/JNEUROSCI.5498-10.2012. Our approach to constructing and evaluating these computational models is informed by the guidelines proposed in Wilson & Collins (2019) doi:10.7554/eLife.49547. Example datasets included with the package are sourced from the work of Mason et al. (2024) doi:10.3758/s13423-023-02415-x.
Example Data
- Mason_2024_G1: Group 1 of Mason et al. (2024)
- Mason_2024_G2: Group 2 of Mason et al. (2024)
Steps
- run_m: Step 1: Building reinforcement learning model
- rcv_d: Step 2: Generating fake data for parameter and model recovery
- fit_p: Step 3: Optimizing parameters to fit real data
- rpl_e: Step 4: Replaying the experiment with optimal parameters
Models
- TD: Temporal Difference model
- RSTD: Risk-Sensitive Temporal Difference model
- Utility: Utility (power-law) model
Functions
- func_gamma: Utility Function
- func_eta: Learning Rate
- func_epsilon: Epsilon Related
- func_pi: Upper-Confidence-Bound
- func_tau: Soft-Max
- func_logl: Loss Function
Processes
- optimize_para: Optimizing free parameters
- simulate_list: Simulating fake datasets
- recovery_data: Parameter and model recovery
Summary
- summary.binaryRL: summary(binaryRL.res)
Author(s)
Maintainer: YuKi hmz1969a@gmail.com (ORCID)
See Also
Useful links:
- https://yuki-961004.github.io/binaryRL/
- Report bugs at https://github.com/yuki-961004/binaryRL/issues
Group 1 from Mason et al. (2024)
Description
This dataset originates from Experiment 1 of Mason et al. (2024), titled "Rare and extreme outcomes in risky choice" (doi:10.3758/s13423-023-02415-x). The raw data is publicly available on the Open Science Framework (OSF) at https://osf.io/hy3q4/. For the purposes of this package, we've performed basic cleaning and preprocessing of the original dataset.
Format
A data frame with 45000 rows and 11 columns:
- Subject
Subject ID, an integer (total of 143).
- Block
Block number, an integer (1 to 6).
- Trial
Trial number, an integer (1 to 60).
- L_choice
Left choice, a character indicating the option presented. The possible options are:
A: 100% gain 4.
B: 90% gain 0 and 10% gain 40.
C: 100% lose 4.
D: 90% lose 0 and 10% lose 40.
- R_choice
Right choice, a character indicating the option presented. The possible options are:
A: 100% gain 4.
B: 90% gain 0 and 10% gain 40.
C: 100% lose 4.
D: 90% lose 0 and 10% lose 40.
- L_reward
Reward associated with the left choice.
- R_reward
Reward associated with the right choice.
- Sub_Choose
The chosen option, either L_choice or R_choice.
- Frame
Type of frame, a character string (e.g., "Gain", "Loss", "Catch").
- NetWorth
The participant's net worth at the end of each trial.
- RT
The participant's reaction time (in milliseconds) for each trial.
Examples
# Load the Mason_2024_G1 dataset
data("Mason_2024_G1", package = "binaryRL")
head(Mason_2024_G1)
Group 2 from Mason et al. (2024)
Description
This dataset originates from Experiment 2 of Mason et al. (2024), titled "Rare and extreme outcomes in risky choice" (doi:10.3758/s13423-023-02415-x). The raw data is publicly available on the Open Science Framework (OSF) at https://osf.io/hy3q4/. For the purposes of this package, we've performed basic cleaning and preprocessing of the original dataset.
Format
A data frame with 45000 rows and 11 columns:
- Subject
Subject ID, an integer (total of 143).
- Block
Block number, an integer (1 to 6).
- Trial
Trial number, an integer (1 to 60).
- L_choice
Left choice, a character indicating the option presented. The possible options are:
A: 100% gain 36.
B: 90% gain 40 and 10% gain 0.
C: 100% lose 36.
D: 90% lose 40 and 10% lose 0.
- R_choice
Right choice, a character indicating the option presented. The possible options are:
A: 100% gain 36.
B: 90% gain 40 and 10% gain 0.
C: 100% lose 36.
D: 90% lose 40 and 10% lose 0.
- L_reward
Reward associated with the left choice.
- R_reward
Reward associated with the right choice.
- Sub_Choose
The chosen option, either L_choice or R_choice.
- Frame
Type of frame, a character string (e.g., "Gain", "Loss", "Catch").
- NetWorth
The participant's net worth at the end of each trial.
- RT
The participant's reaction time (in milliseconds) for each trial.
Examples
# Load the Mason_2024_G2 dataset
data("Mason_2024_G2", package = "binaryRL")
head(Mason_2024_G2)
Model: RSTD
Description
V_{new} = V_{old} + \eta_{+} \cdot (R - V_{old}), R > V_{old}
V_{new} = V_{old} + \eta_{-} \cdot (R - V_{old}), R < V_{old}
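As a quick numeric illustration of these two update rules (hypothetical values, not code from the package): a better-than-expected reward is scaled by one learning rate, a worse-than-expected reward by the other.
V_old <- 10
R <- 40                          # reward better than expected
eta_pos <- 0.6; eta_neg <- 0.2   # learning rates for good / bad outcomes
V_new <- if (R > V_old) {
  V_old + eta_pos * (R - V_old)
} else {
  V_old + eta_neg * (R - V_old)
}
V_new                            # 10 + 0.6 * (40 - 10) = 28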
Usage
RSTD(params)
Arguments
params |
[vector] algorithm packages accept only one argument |
Value
loss [numeric]
algorithm packages accept only one return
Examples
## Not run:
RSTD <- function(params) {
res <- binaryRL::run_m(
data = data,
id = id,
eta = c(params[1], params[2]),
tau = c(params[3]),
priors = priors,
n_params = n_params,
n_trials = n_trials,
mode = mode,
policy = policy
)
assign(x = "binaryRL.res", value = res, envir = binaryRL.env)
loss <- switch(EXPR = estimate, "MLE" = -res$ll, "MAP" = -res$lpo)
switch(EXPR = mode, "fit" = loss, "simulate" = res, "replay" = res)
}
## End(Not run)
Model: TD
Description
V_{new} = V_{old} + \eta \cdot (R - V_{old})
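A one-trial numeric illustration of this update rule (hypothetical values, not code from the package):
V_old <- 10; R <- 40; eta <- 0.5
V_new <- V_old + eta * (R - V_old)
V_new   # 10 + 0.5 * (40 - 10) = 25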
Usage
TD(params)
Arguments
params |
[vector] algorithm packages accept only one argument |
Value
loss [numeric]
algorithm packages accept only one return
Examples
## Not run:
TD <- function(params) {
res <- binaryRL::run_m(
data = data,
id = id,
eta = c(params[1]),
tau = c(params[2]),
priors = priors,
n_params = n_params,
n_trials = n_trials,
mode = mode,
policy = policy
)
assign(x = "binaryRL.res", value = res, envir = binaryRL.env)
loss <- switch(EXPR = estimate, "MLE" = -res$ll, "MAP" = -res$lpo)
switch(EXPR = mode, "fit" = loss, "simulate" = res, "replay" = res)
}
## End(Not run)
Model: Utility
Description
U(R) = {R}^{\gamma}
V_{new} = V_{old} + \eta \cdot (U(R) - V_{old})
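A one-trial numeric illustration of this model (hypothetical values, not code from the package): the reward is first transformed into a subjective utility, and the value is then updated toward that utility.
V_old <- 10; R <- 40; eta <- 0.5; gamma <- 0.8
U <- R ^ gamma                       # subjective utility of the reward
V_new <- V_old + eta * (U - V_old)
c(U, V_new)                          # roughly 19.13 and 14.56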
Usage
Utility(params)
Arguments
params |
[vector] algorithm packages accept only one argument |
Value
loss [numeric]
algorithm packages accept only one return
Examples
## Not run:
Utility <- function(params) {
res <- binaryRL::run_m(
data = data,
id = id,
eta = c(params[1]),
gamma = c(params[2]),
tau = c(params[3]),
priors = priors,
n_params = n_params,
n_trials = n_trials,
mode = mode,
policy = policy
)
assign(x = "binaryRL.res", value = res, envir = binaryRL.env)
loss <- switch(EXPR = estimate, "MLE" = -res$ll, "MAP" = -res$lpo)
switch(EXPR = mode, "fit" = loss, "simulate" = res, "replay" = res)
}
## End(Not run)
Step 3: Optimizing parameters to fit real data
Description
This function is designed to fit the optimal parameters of black-box
functions (models) to real-world data. Provided that the black-box
function adheres to the specified interface (demo: TD, RSTD, Utility),
this function can employ the various optimization algorithms detailed
below to find the best-fitting parameters for your model.
The function provides several optimization algorithms:
1. L-BFGS-B (from stats::optim)
2. Simulated Annealing (GenSA::GenSA)
3. Genetic Algorithm (GA::ga)
4. Differential Evolution (DEoptim::DEoptim)
5. Particle Swarm Optimization (pso::psoptim)
6. Bayesian Optimization (mlrMBO::mbo)
7. Covariance Matrix Adapting Evolutionary Strategy (cmaes::cma_es)
8. Nonlinear Optimization (nloptr::nloptr)
For more information, please refer to the homepage of this package: https://yuki-961004.github.io/binaryRL/
Usage
fit_p(
policy = "off",
estimate = "MLE",
data,
id = NULL,
n_trials = NULL,
funcs = NULL,
model_name = c("TD", "RSTD", "Utility"),
fit_model = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1)),
priors = NULL,
tolerance = 0.001,
iteration_i = 10,
iteration_g = 0,
initial_params = NA,
initial_size = 50,
seed = 123,
nc = 1,
algorithm
)
Arguments
policy |
[string] Specifies the learning policy to be used.
This determines how the model updates action values based on observed or
simulated choices. It can be either
default: |
estimate |
[string] Estimation method. Can be either
default: |
data |
[data.frame] This data should include the following mandatory columns:
|
id |
[CharacterVector] A vector specifying the subject ID(s) for which parameters should be fitted. The function will process only the subjects provided in this vector. To fit all subjects, you can either explicitly set the argument as
It is strongly recommended to avoid using simple numeric sequences like
default: |
n_trials |
[integer] Represents the total number of trials a single subject experienced
in the experiment. If this parameter is kept at its default value
of default: |
funcs |
[CharacterVector] A character vector containing the names of all user-defined functions
required for the computation. When parallel computation is enabled
(i.e., nc > 1), user-defined models and their custom functions might not be
automatically accessible within the parallel environment. Therefore, if you
have created your own reinforcement learning model that modifies any of the
package's six default functions
(default functions:
|
model_name |
[List] The names of the fit models, e.g. |
fit_model |
[List] A collection of functions applied to fit models to the data. e.g. |
lower |
[List] The lower bounds of the free parameters for each fit model, e.g. |
upper |
[List] The upper bounds of the free parameters for each fit model, e.g. |
priors |
[List] A list specifying the prior distributions for the model parameters.
This argument is mandatory when using 1. Static MAP Estimation (Non-Hierarchical) This approach is used when you have a strong, pre-defined belief about the parameter priors and do not want the model to update them iteratively.
2. Hierarchical Bayesian Estimation via EM This approach is used to let the model learn the group-level (hierarchical) prior distributions directly from the data.
default: |
tolerance |
[double] Convergence threshold for MAP estimation. If the change in log posterior probability between iterations is smaller than this value, the algorithm is considered to have converged and the program will stop. default: |
iteration_i |
[integer] The number of iterations the optimization algorithm will perform when searching for the best-fitting parameters during the fitting phase. A higher number of iterations may increase the likelihood of finding a global optimum but also increases computation time. default: |
iteration_g |
[integer] The maximum number of iterations for the Expectation-Maximization (EM)
based MAP estimation. The algorithm will stop once this iteration
count is reached, even if the change in the log-posterior value has
not yet fallen below the default: |
initial_params |
[NumericVector] Initial values for the free parameters that the optimization algorithm will
search from. These are primarily relevant when using algorithms that require
an explicit starting point, such as default: |
initial_size |
[integer] This parameter corresponds to the population size in genetic
algorithms ( default: |
seed |
[integer] Random seed. This ensures that the results are reproducible and remain the same each time the function is run. default: |
nc |
[integer] Number of cores to use for parallel processing. Since fitting optimal parameters for each subject is an independent task, parallel computation can significantly speed up the fitting process:
|
algorithm |
[string] Choose an algorithm package from
In addition, any algorithm from the |
Value
The optimal parameters found by the algorithm for each subject,
along with the model fit calculated using these parameters.
This is returned as an object of class binaryRL
containing results
for all subjects with all models.
Note
While both fit_p and rcv_d utilize the same underlying optimize_para
function to find optimal parameters, they play distinct and sequential
roles in the modeling pipeline. The key differences are as follows:
- Purpose and Data Source: rcv_d should always be performed before fit_p. Its primary role is to validate a model's stability by fitting it to synthetic data generated by the model itself. This process, known as parameter recovery, ensures the model is well-behaved. In contrast, fit_p is used in the subsequent stage to fit the validated model to real experimental data.
- Estimation Method: rcv_d does not include an estimate argument. This is because the synthetic data is generated from known "true" parameters, which are drawn from pre-defined distributions (typically uniform for most parameters and exponential for the inverse temperature). Since the ground truth is known, a hierarchical estimation (MAP) is not applicable. The fit_p function, however, requires this argument to handle real data where the true parameters are unknown.
- Policy Setting: In fit_p, the policy setting has different effects: "on-policy" is better for learning choice patterns, while "off-policy" yields more accurate parameter estimates. For rcv_d, the process defaults to an "off-policy" approach because its main objectives are to verify if the true parameters can be accurately recovered and to assess whether competing models are distinguishable, tasks for which off-policy estimation is more suitable.
Examples
## Not run:
comparison <- binaryRL::fit_p(
data = binaryRL::Mason_2024_G2,
id = unique(binaryRL::Mason_2024_G2$Subject),
#+-----------------------------------------------------------------------------+#
#|----------------------------- black-box function ----------------------------|#
#funcs = c("your_funcs"),
policy = c("off", "on"),
fit_model = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
model_name = c("TD", "RSTD", "Utility"),
#|--------------------------------- estimate ----------------------------------|#
estimate = c("MLE", "MAP"),
#|------------------------------------ MLE ------------------------------------|#
lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
upper = list(c(1, 10), c(1, 1, 10), c(1, 1, 10)),
#|------------------------------------ MAP ------------------------------------|#
priors = list(
list(
eta = function(x) {stats::dunif(x, min = 0, max = 1, log = TRUE)},
tau = function(x) {stats::dexp(x, rate = 1, log = TRUE)}
),
list(
eta = function(x) {stats::dunif(x, min = 0, max = 1, log = TRUE)},
eta = function(x) {stats::dunif(x, min = 0, max = 1, log = TRUE)},
tau = function(x) {stats::dexp(x, rate = 1, log = TRUE)}
),
list(
eta = function(x) {stats::dunif(x, min = 0, max = 1, log = TRUE)},
gamma = function(x) {stats::dunif(x, min = 0, max = 1, log = TRUE)},
tau = function(x) {stats::dexp(x, rate = 1, log = TRUE)}
)
),
#|----------------------------- iteration number ------------------------------|#
iteration_i = 10,
iteration_g = 10,
#|-------------------------------- algorithms ---------------------------------|#
nc = 1, # <nc > 1>: parallel computation across subjects
# Base R Optimization
algorithm = "L-BFGS-B" # Gradient-Based (stats)
#|-----------------------------------------------------------------------------|#
# Specialized External Optimization
#algorithm = "GenSA" # Simulated Annealing (GenSA)
#algorithm = "GA" # Genetic Algorithm (GA)
#algorithm = "DEoptim" # Differential Evolution (DEoptim)
#algorithm = "PSO" # Particle Swarm Optimization (pso)
#algorithm = "Bayesian" # Bayesian Optimization (mlrMBO)
#algorithm = "CMA-ES" # Covariance Matrix Adapting (cmaes)
#|-----------------------------------------------------------------------------|#
# Optimization Library (nloptr)
#algorithm = c("NLOPT_GN_MLSL", "NLOPT_LN_BOBYQA")
#|-------------------------------- algorithms ---------------------------------|#
#################################################################################
)
result <- dplyr::bind_rows(comparison)
# Ensure the output directory exists before writing
if (!dir.exists("../OUTPUT")) {
dir.create("../OUTPUT", recursive = TRUE)
}
write.csv(result, "../OUTPUT/result_comparison.csv", row.names = FALSE)
## End(Not run)
Function: Epsilon Related
Description
The exploration strategy parameters are threshold, epsilon, and lambda.
- Epsilon-first: Used when only threshold is set. Subjects choose randomly for trials less than threshold and by value for trials greater than threshold.
- Epsilon-greedy: Used if threshold is at its default (1) and epsilon is set. Subjects explore with probability epsilon throughout the experiment.
- Epsilon-decreasing: Used if threshold is at its default (1) and lambda is set. In this strategy, the probability of random choice (exploration) decreases as trials increase. The parameter lambda controls the rate at which this probability declines with each trial.
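A quick numeric sketch of the epsilon-decreasing strategy, assuming the decay form used in the func_epsilon example below (the probability of a random choice on trial i):
lambda <- 0.1
i <- c(1, 10, 50, 100)
p_explore <- 1 / (1 + lambda * i)    # exploration probability shrinks over trials
round(p_explore, 3)                  # roughly 0.909 0.500 0.167 0.091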
Usage
func_epsilon(
i,
L_freq,
R_freq,
L_pick,
R_pick,
L_value,
R_value,
var1 = NA,
var2 = NA,
threshold = 1,
epsilon = NA,
lambda = NA,
alpha,
beta
)
Arguments
i |
The current row number. |
L_freq |
The frequency of left option appearance |
R_freq |
The frequency of right option appearance |
L_pick |
The number of times left option was picked |
R_pick |
The number of times right option was picked |
L_value |
The value of the left option |
R_value |
The value of the right option |
var1 |
[character] Column name of extra variable 1. If your model uses more than just reward and expected value, and you need other information, such as whether the choice frame is Gain or Loss, then you can input the 'Frame' column as var1 into the model.
|
var2 |
[character] Column name of extra variable 2. If one additional variable, var1, does not meet your needs, you can add another additional variable, var2, into your model.
|
threshold |
[integer]
Controls the initial exploration phase in the epsilon-first strategy.
This is the number of early trials where the subject makes purely random
choices, as they haven't yet learned the options' values. For example,
|
epsilon |
[numeric]
A parameter used in the epsilon-greedy exploration strategy. It
defines the probability of making a completely random choice, as opposed
to choosing based on the relative values of the left and right options.
For example, if
|
lambda |
[vector]
A numeric value that controls the decay rate of exploration probability
in the epsilon-decreasing strategy. A higher
|
alpha |
[vector] Extra parameters that may be used in functions. |
beta |
[vector] Extra parameters that may be used in functions. |
Value
A numeric value, either 0 or 1. 0 indicates no exploration (choice based on value), and 1 indicates exploration (random choice) for that trial.
Note
When customizing these functions, please ensure that you do not modify
the arguments. Instead, only modify the if-else
statements or
the internal logic to adapt the function to your needs.
Examples
## Not run:
func_epsilon <- function(
# Trial number
i,
# Number of times this option has appeared
L_freq,
R_freq,
# Number of times this option has been chosen
L_pick,
R_pick,
# Current value of this option
L_value,
R_value,
# Extra variables
var1 = NA,
var2 = NA,
# Free Parameters
threshold = 1,
epsilon = NA,
lambda = NA,
# Extra parameters
alpha,
beta
){
set.seed(i)
# Epsilon-First: random choosing before a certain trial number
if (i <= threshold) {
try <- 1
} else if (i > threshold & is.na(epsilon) & is.na(lambda)) {
try <- 0
# Epsilon-Greedy: random choosing throughout the experiment with probability epsilon
} else if (i > threshold & !(is.na(epsilon)) & is.na(lambda)){
try <- sample(
c(1, 0),
prob = c(epsilon, 1 - epsilon),
size = 1
)
# Epsilon-Decreasing: probability of random choosing decreases as trials increase
} else if (i > threshold & is.na(epsilon) & !(is.na(lambda))) {
try <- sample(
c(1, 0),
prob = c(
1 / (1 + lambda * i),
lambda * i / (1 + lambda * i)
),
size = 1
)
}
else {
try <- "ERROR"
}
return(try)
}
## End(Not run)
Function: Learning Rate
Description
The structure of eta depends on the model type:
- Temporal Difference (TD) model: eta is a single numeric value representing the learning rate.
- Risk-Sensitive Temporal Difference (RSTD) model: eta is a numeric vector of length two, where eta[1] represents the learning rate for "good" outcomes, which means the reward is higher than the expected value, and eta[2] represents the learning rate for "bad" outcomes, which means the reward is lower than the expected value.
Usage
func_eta(
i,
L_freq,
R_freq,
L_pick,
R_pick,
L_value,
R_value,
var1 = NA,
var2 = NA,
value,
utility,
reward,
occurrence,
eta,
alpha,
beta
)
Arguments
i |
The current row number. |
L_freq |
The frequency of left option appearance |
R_freq |
The frequency of right option appearance |
L_pick |
The number of times left option was picked |
R_pick |
The number of times right option was picked |
L_value |
The value of the left option |
R_value |
The value of the right option |
var1 |
[character] Column name of extra variable 1. If your model uses more than just reward and expected value, and you need other information, such as whether the choice frame is Gain or Loss, then you can input the 'Frame' column as var1 into the model.
|
var2 |
[character] Column name of extra variable 2. If one additional variable, var1, does not meet your needs, you can add another additional variable, var2, into your model.
|
value |
The expected value of the stimulus in the subject's mind at this point in time. |
utility |
The subjective value that the subject assigns to the objective reward. |
reward |
The objective reward received by the subject after selecting a stimulus. |
occurrence |
The number of times the same stimulus has been chosen. |
eta |
[vector]
Parameters used in the Learning Rate Function,
The structure of
|
alpha |
[vector] Extra parameters that may be used in functions. |
beta |
[vector] Extra parameters that may be used in functions. |
Value
learning rate eta
Note
When customizing these functions, please ensure that you do not modify
the arguments. Instead, only modify the if-else
statements or
the internal logic to adapt the function to your needs.
Examples
## Not run:
func_eta <- function(
# Trial number
i,
# Number of times this option has appeared
L_freq,
R_freq,
# Number of times this option has been chosen
L_pick,
R_pick,
# Current value of this option
L_value,
R_value,
# Extra variables
var1 = NA,
var2 = NA,
# Expected value for this stimulus
value,
# Subjective utility
utility,
# Reward observed after choice
reward,
# Occurrence count for this stimulus
occurrence,
# Free Parameter
eta,
# Extra parameters
alpha,
beta
){
################################# [ TD ] ####################################
if (length(eta) == 1) {
eta <- as.numeric(eta)
}
################################ [ RSTD ] ###################################
else if (length(eta) == 2 & utility < value) {
eta <- eta[1]
}
else if (length(eta) == 2 & utility >= value) {
eta <- eta[2]
}
################################ [ ERROR ] ##################################
else {
eta <- "ERROR" # Error check
}
return(eta)
}
## End(Not run)
Function: Utility Function
Description
This function represents an exponent used in calculating utility from reward. Its application varies depending on the specific model:
- Stevens' Power Law: Here, utility is calculated by raising the reward to the power of gamma. This describes how the subjective value (utility) of a reward changes non-linearly with its objective magnitude.
- Kahneman's Prospect Theory: This theory applies exponents differently for gains and losses, and introduces a loss aversion coefficient. For positive rewards (gains), utility is the reward raised to the power of gamma[1]. For negative rewards (losses), utility is calculated by first multiplying the reward by beta, and then raising this product to the power of gamma[2]. Here, beta acts as a loss aversion parameter, accounting for the greater psychological impact of losses compared to equivalent gains.
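A minimal sketch of the two utility forms described above (illustration only, not the package implementation): the power-law form mirrors the built-in func_gamma example further down, while the loss branch uses a common sign-handled convention for the prospect-theory variant.
utility_power <- function(reward, gamma) {
  # Stevens' Power Law with sign handling, as in the built-in example
  sign(reward) * (abs(reward) ^ gamma)
}
utility_prospect <- function(reward, gamma, beta) {
  # gamma[1] for gains; gamma[2] plus loss-aversion weight beta for losses
  if (reward >= 0) {
    reward ^ gamma[1]
  } else {
    -beta * (abs(reward) ^ gamma[2])
  }
}
utility_power(reward = -40, gamma = 0.8)                        # about -19.1
utility_prospect(reward = -40, gamma = c(0.8, 0.8), beta = 2)   # about -38.3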
Usage
func_gamma(
i,
L_freq,
R_freq,
L_pick,
R_pick,
L_value,
R_value,
var1 = NA,
var2 = NA,
value,
utility,
reward,
occurrence,
gamma = 1,
alpha,
beta
)
Arguments
i |
The current row number. |
L_freq |
The frequency of left option appearance |
R_freq |
The frequency of right option appearance |
L_pick |
The number of times left option was picked |
R_pick |
The number of times left option was picked |
L_value |
The value of the left option |
R_value |
The value of the right option |
var1 |
[character] Column name of extra variable 1. If your model uses more than just reward and expected value, and you need other information, such as whether the choice frame is Gain or Loss, then you can input the 'Frame' column as var1 into the model.
|
var2 |
[character] Column name of extra variable 2. If one additional variable, var1, does not meet your needs, you can add another additional variable, var2, into your model.
|
value |
The expected value of the stimulus in the subject's mind at this point in time. |
utility |
The subjective value that the subject assigns to the objective reward. |
reward |
The objective reward received by the subject after selecting a stimulus. |
occurrence |
The number of times the same stimulus has been chosen. |
gamma |
[vector]
This parameter represents the exponent in utility functions,
|
alpha |
[vector] Extra parameters that may be used in functions. |
beta |
[vector] Extra parameters that may be used in functions. |
Value
A list containing the exponent gamma and the computed utility
Note
When customizing these functions, please ensure that you do not modify
the arguments. Instead, only modify the if-else
statements or
the internal logic to adapt the function to your needs.
Examples
## Not run:
func_gamma <- function(
# Trial number
i,
# Number of times this option has appeared
L_freq,
R_freq,
# Number of times this option has been chosen
L_pick,
R_pick,
# Current value of this option
L_value,
R_value,
# Extra variables
var1 = NA,
var2 = NA,
# Expected value for this stimulus
value,
# Subjective utility
utility,
# Reward observed after choice
reward,
# Occurrence count for this stimulus
occurrence,
# Free Parameter
gamma = 1,
# Extra parameters
alpha,
beta
){
############################## [ Utility ] ##################################
if (length(gamma) == 1) {
gamma <- as.numeric(gamma)
utility <- sign(reward) * (abs(reward) ^ gamma)
}
############################### [ Error ] ###################################
else {
utility <- "ERROR"
}
return(list(gamma, utility))
}
## End(Not run)
Function: Loss Function
Description
This loss function reflects the similarity between human choices
and RL model predictions. If a human selects the left option and the
RL model predicts a high probability for the left option, then
P_{L} is close to 1, so \log P_{L} approaches 0, causing the first term to
approach 0. Since the human chose the left option, B_{R} is 0, making the
second term zero as well. Therefore, the more consistent the RL model's
predictions are with the human choices, the closer this LL value is to 0;
conversely, it approaches negative infinity.
LL = \sum B_{L} \times \log P_{L} + \sum B_{R} \times \log P_{R}
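A tiny worked example of this sum (hypothetical numbers): suppose the participant chose left on trial 1 and right on trial 2, and the model assigned P_L = 0.8 and P_L = 0.3 on those trials.
B_L <- c(1, 0); B_R <- c(0, 1)       # observed choices (left / right)
P_L <- c(0.8, 0.3); P_R <- 1 - P_L   # model's choice probabilities
LL <- sum(B_L * log(P_L)) + sum(B_R * log(P_R))
LL                                   # log(0.8) + log(0.7), roughly -0.58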
Usage
func_logl(
i,
L_freq,
R_freq,
L_pick,
R_pick,
L_value,
R_value,
L_dir,
R_dir,
L_prob,
R_prob,
var1 = NA,
var2 = NA,
LR,
try,
value,
utility,
reward,
occurrence,
alpha,
beta
)
Arguments
i |
The current row number. |
L_freq |
The frequency of left option appearance |
R_freq |
The frequency of right option appearance |
L_pick |
The number of times left option was picked |
R_pick |
The number of times right option was picked |
L_value |
The value of the left option |
R_value |
The value of the right option |
L_dir |
Whether the participant chose the left option. |
R_dir |
Whether the participant chose the right option. |
L_prob |
The probability that the model assigns to choosing the left option. |
R_prob |
The probability that the model assigns to choosing the right option. |
var1 |
[character] Column name of extra variable 1. If your model uses more than just reward and expected value, and you need other information, such as whether the choice frame is Gain or Loss, then you can input the 'Frame' column as var1 into the model.
|
var2 |
[character] Column name of extra variable 2. If one additional variable, var1, does not meet your needs, you can add another additional variable, var2, into your model.
|
LR |
Are you calculating the probability for the left option or the right option? |
try |
If the choice was random, the value is 1; If the choice was based on value, the value is 0. |
value |
The expected value of the stimulus in the subject's mind at this point in time. |
utility |
The subjective value that the subject assigns to the objective reward. |
reward |
The objective reward received by the subject after selecting a stimulus. |
occurrence |
The number of times the same stimulus has been chosen. |
alpha |
[vector] Extra parameters that may be used in functions. |
beta |
[vector] Extra parameters that may be used in functions. |
Value
log-likelihood
Note
When customizing these functions, please ensure that you do not modify
the arguments. Instead, only modify the if-else
statements or
the internal logic to adapt the function to your needs.
Examples
## Not run:
func_logl <- function(
# Trial number
i,
# Number of times this option has appeared
L_freq,
R_freq,
# Number of times this option has been chosen
L_pick,
R_pick,
# Current value of this option
L_value,
R_value,
# Did the participant choose this option? (1 = yes, 0 = no)
L_dir,
R_dir,
# Probability the model assigns to choosing each option
L_prob,
R_prob,
# Extra variables
var1 = NA,
var2 = NA,
# Whether calculating probability for left or right choice
LR,
# Is it a random choosing trial?
try,
# Extra parameters
alpha,
beta
){
logl <- switch(
EXPR = LR,
"L" = L_dir * log(L_prob),
"R" = R_dir * log(R_prob)
)
return(logl)
}
## End(Not run)
Function: Upper-Confidence-Bound
Description
Unlike epsilon-greedy, which explores indiscriminately, UCB is a more
intelligent exploration strategy. It biases the value of each action based
on how often it has been selected. For options chosen fewer times, or those
with high uncertainty, a larger "uncertainty bonus" is added to their
estimated value. This increases their selection probability, effectively
encouraging the exploration of potentially optimal, yet unexplored actions.
A higher pi indicates a greater bias (uncertainty bonus) given to
less-chosen options.
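A quick numeric illustration, using the bonus form from the func_pi example below: the uncertainty bonus shrinks as an option is picked more often, while an unpicked option receives a very large bonus.
pi <- 0.1
picks <- c(0, 1, 5, 20)
bias <- pi * sqrt(log(picks + exp(1)) / (picks + 1e-10))
round(bias, 3)   # roughly 10000, 0.115, 0.064, 0.040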
Usage
func_pi(
i,
L_freq,
R_freq,
L_pick,
R_pick,
L_value,
R_value,
var1,
var2,
LR,
pi,
alpha,
beta
)
Arguments
i |
The current row number. |
L_freq |
The frequency of left option appearance |
R_freq |
The frequency of right option appearance |
L_pick |
The number of times left option was picked |
R_pick |
The number of times left option was picked |
L_value |
The value of the left option |
R_value |
The value of the right option |
var1 |
[character] Column name of extra variable 1. If your model uses more than just reward and expected value, and you need other information, such as whether the choice frame is Gain or Loss, then you can input the 'Frame' column as var1 into the model.
|
var2 |
[character] Column name of extra variable 2. If one additional variable, var1, does not meet your needs, you can add another additional variable, var2, into your model.
|
LR |
Are you calculating the probability for the left option or the right option? |
pi |
[vector]
Parameter used in the Upper-Confidence-Bound (UCB) action selection
formula.
|
alpha |
[vector] Extra parameters that may be used in functions. |
beta |
[vector] Extra parameters that may be used in functions. |
Value
The probability of choosing this option
Note
When customizing these functions, please ensure that you do not modify
the arguments. Instead, only modify the if-else
statements or
the internal logic to adapt the function to your needs.
Examples
## Not run:
func_pi <- function(
# Trial number
i,
# Number of times this option has appeared
L_freq,
R_freq,
# Number of times this option has been chosen
L_pick,
R_pick,
# Current value of this option
L_value,
R_value,
# Extra variables
var1 = NA,
var2 = NA,
# Whether calculating probability for left or right choice
LR,
# Free parameter
pi = 0.1,
# Extra parameters
alpha,
beta
){
############################ [ at least 1 ] #################################
if (is.na(x = pi)) {
if (L_pick == 0 & R_pick == 0) {
bias <- 0
}
else if (LR == "L" & L_pick == 0 & R_pick > 0) {
bias <- 1e+4
}
else if (LR == "R" & R_pick == 0 & L_pick > 0) {
bias <- 1e+4
}
else {
bias <- 0
}
}
############################ [ bias value ] #################################
else if (!(is.na(x = pi)) & LR == "L") {
bias <- pi * sqrt(log(L_pick + exp(1)) / (L_pick + 1e-10))
}
else if (!(is.na(x = pi)) & LR == "R") {
bias <- pi * sqrt(log(R_pick + exp(1)) / (R_pick + 1e-10))
}
############################## [ error ] ####################################
else {
bias <- "ERROR"
}
return(bias)
}
## End(Not run)
Function: Soft-Max Function
Description
The softmax function describes a probabilistic choice rule: options with
higher subjective values are chosen with greater probability, rather than
deterministically. The probability of choosing the higher-valued option
increases with the parameter tau.
A higher tau indicates greater sensitivity to value differences, making
choices more deterministic.
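A quick numeric illustration of this rule (hypothetical values), including the lapse correction used in the func_tau example below:
tau <- c(0.5, 1, 5); lapse <- 0.02
V_L <- 3; V_R <- 1
p_L <- 1 / (1 + exp(-(V_L - V_R) * tau))   # softmax choice probability
p_L <- (1 - lapse) * p_L + lapse / 2       # mix in the lapse rate
round(p_L, 3)                              # roughly 0.726 0.873 0.990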
Usage
func_tau(
i,
L_freq,
R_freq,
L_pick,
R_pick,
L_value,
R_value,
var1 = NA,
var2 = NA,
LR,
try,
tau,
lapse,
alpha,
beta
)
Arguments
i |
[numeric] The current row number. |
L_freq |
[numeric] The frequency of left option appearance |
R_freq |
[numeric] The frequency of right option appearance |
L_pick |
[numeric] The number of times left option was picked |
R_pick |
[numeric] The number of times right option was picked |
L_value |
[numeric] The value of the left option with bias (if pi != 0) |
R_value |
[numeric] The value of the right option with bias (if pi != 0) |
var1 |
[character] Column name of extra variable 1. If your model uses more than just reward and expected value, and you need other information, such as whether the choice frame is Gain or Loss, then you can input the 'Frame' column as var1 into the model.
|
var2 |
[character] Column name of extra variable 2. If one additional variable, var1, does not meet your needs, you can add another additional variable, var2, into your model. default: |
LR |
[character] Are you calculating the probability for the left option or the right option?
|
try |
[numeric] If the choice was random, the value is 1; If the choice was based on value, the value is 0. |
tau |
[vector] Parameters used in the Soft-Max Function.
|
lapse |
[numeric] A numeric value between 0 and 1, representing the lapse rate. You can interpret this parameter as the probability of the agent "slipping" or making a random choice, irrespective of the learned action values. This accounts for moments of inattention or motor errors. In this sense, it represents the minimum probability with which any given option will be selected. It is a free parameter that acknowledges that individuals do not always make decisions with full concentration throughout an experiment. From a modeling perspective, the lapse rate is crucial for preventing the
log-likelihood calculation from returning -Inf when a chosen option is assigned a probability of 0.
With the default lapse = 0.02, this ensures each option has a minimum selection probability of 1 percent in TAFC tasks. |
alpha |
[vector] Extra parameters that may be used in functions. |
beta |
[vector] Extra parameters that may be used in functions. |
Value
The probability of choosing this option
Note
When customizing these functions, please ensure that you do not modify
the arguments. Instead, only modify the if-else
statements or
the internal logic to adapt the function to your needs.
Examples
## Not run:
func_tau <- function(
# Trial number
i,
# Number of times this option has appeared
L_freq,
R_freq,
# Number of times this option has been chosen
L_pick,
R_pick,
# Current value of this option
L_value,
R_value,
# Extra variables
var1 = NA,
var2 = NA,
# Whether calculating probability for left or right choice
LR,
# Is it a random choosing trial?
try,
# Free parameters
tau,
# Lapse rate
lapse,
# Extra parameters
alpha,
beta
){
############################### [ random ] ##################################
if (try == 1) {
prob <- 0.5
}
############################# [ greedy-max ] ################################
else if (try == 0 & LR == "L" & is.na(tau)) {
if (L_value == R_value) {
prob <- 0.5
}
else if (L_value > R_value) {
prob <- 1
}
else if (L_value < R_value) {
prob <- 0
}
}
else if (try == 0 & LR == "R" & is.na(tau)) {
if (L_value == R_value) {
prob <- 0.5
}
else if (R_value > L_value) {
prob <- 1
}
else if (R_value < L_value) {
prob <- 0
}
}
############################### [ soft-max ] ################################
else if (try == 0 & LR == "L" & !(is.na(tau))) {
prob <- 1 / (1 + exp(-(L_value - R_value) * tau))
}
else if (try == 0 & LR == "R" & !(is.na(tau))) {
prob <- 1 / (1 + exp(-(R_value - L_value) * tau))
}
################################ [ error ] ##################################
else {
prob <- "ERROR"
}
################################ [ lapse ] ##################################
prob <- (1 - lapse) * prob + (lapse / 2)
return(prob)
}
## End(Not run)
Process: Optimizing Parameters
Description
This is an internal helper function for fit_p. Its primary purpose
is to provide a unified interface for users to interact with various
optimization algorithm packages. It adapts the inputs and outputs
to be compatible with eight distinct algorithms, ensuring a seamless
experience regardless of the underlying solver used.
The function provides several optimization algorithms:
1. L-BFGS-B (from stats::optim)
2. Simulated Annealing (GenSA::GenSA)
3. Genetic Algorithm (GA::ga)
4. Differential Evolution (DEoptim::DEoptim)
5. Particle Swarm Optimization (pso::psoptim)
6. Bayesian Optimization (mlrMBO::mbo)
7. Covariance Matrix Adapting Evolutionary Strategy (cmaes::cma_es)
8. Nonlinear Optimization (nloptr::nloptr)
For more information, please refer to the homepage of this package: https://yuki-961004.github.io/binaryRL/
Usage
optimize_para(
policy = "off",
estimate = "MLE",
data,
id,
n_trials,
n_params,
obj_func,
lower,
upper,
priors = NULL,
initial_params = NA,
initial_size = 50,
iteration = 10,
seed = 123,
algorithm
)
Arguments
policy |
[character] Specifies the learning policy to be used.
This determines how the model updates action values based on observed or
simulated choices. It can be either
default: |
estimate |
[character] Estimation method. Can be either
default: |
data |
[data.frame] This data should include the following mandatory columns:
|
id |
[character] Specifies the ID of the subject whose optimal parameters will be fitted. This parameter accepts either string or numeric values. The provided ID must correspond to an existing subject identifier within the raw dataset provided to the function. |
n_trials |
[integer] The total number of trials in your experiment. |
n_params |
[integer] The number of free parameters in your model. |
obj_func |
[function] The objective function that the optimization algorithm package accepts.
This function must strictly take only one argument, |
lower |
[vector] Lower bounds of free parameters |
upper |
[vector] Upper bounds of free parameters |
priors |
[list]
A list specifying the prior distributions for the model parameters.
This argument is mandatory when using 1. Static MAP Estimation (Non-Hierarchical) This approach is used when you have a strong, pre-defined belief about the parameter priors and do not want the model to update them iteratively.
2. Hierarchical Bayesian Estimation via EM This approach is used to let the model learn the group-level (hierarchical) prior distributions directly from the data.
default: |
initial_params |
[vector] Initial values for the free parameters that the optimization algorithm will
search from. These are primarily relevant when using algorithms that require
an explicit starting point, such as default: |
initial_size |
[integer] This parameter corresponds to the population size in genetic
algorithms ( default: |
iteration |
[integer] The number of iterations the optimization algorithm will perform when searching for the best-fitting parameters during the fitting phase. A higher number of iterations may increase the likelihood of finding a global optimum but also increases computation time. |
seed |
[integer] Random seed. This ensures that the results are reproducible and remain the same each time the function is run. default: |
algorithm |
[character] Choose an algorithm package from
In addition, any algorithm from the |
Value
the result of binaryRL with optimal parameters
Examples
## Not run:
binaryRL.res <- binaryRL::optimize_para(
data = binaryRL::Mason_2024_G2,
id = 1,
obj_func = binaryRL::RSTD,
n_params = 3,
n_trials = 360,
lower = c(0, 0, 0),
upper = c(1, 1, 1),
iteration = 10,
seed = 123,
algorithm = "L-BFGS-B" # Gradient-Based (stats)
#algorithm = "GenSA" # Simulated Annealing (GenSA)
#algorithm = "GA" # Genetic Algorithm (GA)
#algorithm = "DEoptim" # Differential Evolution (DEoptim)
#algorithm = "PSO" # Particle Swarm Optimization (pso)
#algorithm = "Bayesian" # Bayesian Optimization (mlrMBO)
#algorithm = "CMA-ES" # Covariance Matrix Adapting (cmaes)
#algorithm = c("NLOPT_GN_MLSL", "NLOPT_LN_BOBYQA")
)
summary(binaryRL.res)
## End(Not run)
Step 2: Generating fake data for parameter and model recovery
Description
This function is designed for model and parameter recovery of user-created
(black-box) models, provided they conform to the specified interface
(demo: TD, RSTD, Utility).
The process involves generating synthetic datasets. First, parameters
are randomly sampled within a defined range. These parameters are then
used to simulate artificial datasets.
Subsequently, all candidate models are used to fit these simulated datasets. Model recoverability is assessed if a synthetic dataset generated by Model A is consistently best fitted by Model A itself.
Furthermore, the function allows users to evaluate parameter recoverability. If, for instance, a synthetic dataset generated by Model A was based on parameters like 0.3 and 0.7, and Model A then recovers optimal parameters close to 0.3 and 0.7 from this data, it indicates that the parameters of Model A are recoverable.
The function provides several optimization algorithms:
1. L-BFGS-B (from stats::optim)
2. Simulated Annealing (GenSA::GenSA)
3. Genetic Algorithm (GA::ga)
4. Differential Evolution (DEoptim::DEoptim)
5. Particle Swarm Optimization (pso::psoptim)
6. Bayesian Optimization (mlrMBO::mbo)
7. Covariance Matrix Adapting Evolutionary Strategy (cmaes::cma_es)
8. Nonlinear Optimization (nloptr::nloptr)
For more information, please refer to the homepage of this package: https://yuki-961004.github.io/binaryRL/
Usage
rcv_d(
policy = "off",
data,
id = NULL,
n_trials = NULL,
funcs = NULL,
model_names = c("TD", "RSTD", "Utility"),
simulate_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
simulate_lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
simulate_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1)),
fit_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
fit_lower = list(c(0, 0), c(0, 0, 0), c(0, 0, 0)),
fit_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 10)),
iteration_s = 10,
iteration_f = 10,
initial_params = NA,
initial_size = 50,
seed = 1,
nc = 1,
algorithm
)
Arguments
policy |
[string] Specifies the learning policy to be used.
This determines how the model updates action values based on observed or
simulated choices. It can be either
default: |
data |
[data.frame] This data should include the following mandatory columns:
|
id |
[CharacterVector] Specifies which subject's data to use. In parameter and model recovery analyses, the specific subject ID is often irrelevant. Although the experimental trial order might have some randomness for each subject, the sequence of reward feedback is typically pseudo-random. The default value for this argument is default: |
n_trials |
[integer] Represents the total number of trials a single subject experienced
in the experiment. If this parameter is kept at its default value
of default: |
funcs |
[CharacterVector] A character vector containing the names of all user-defined functions
required for the computation. When parallel computation is enabled
(i.e., nc > 1), user-defined models and their custom functions might not be
automatically accessible within the parallel environment. Therefore, if you
have created your own reinforcement learning model that modifies any of the
package's six default functions
(default functions:
|
model_names |
[List] The names of the fit models, e.g. |
simulate_models |
[List] A collection of functions used to generate simulated data. |
simulate_lower |
[List] The lower bounds for simulate models e.g. |
simulate_upper |
[List] The upper bounds for simulate models e.g. |
fit_models |
[List] A collection of functions applied to fit models to the data. e.g. |
fit_lower |
[List] The lower bounds of the free parameters for each fit model, e.g. |
fit_upper |
[List] The upper bounds of the free parameters for each fit model, e.g. |
iteration_s |
[integer] This parameter determines how many simulated datasets are created for subsequent model and parameter recovery analyses. default: |
iteration_f |
[integer] The number of iterations the optimization algorithm will perform when searching for the best-fitting parameters during the fitting phase. A higher number of iterations may increase the likelihood of finding a global optimum but also increases computation time. default: |
initial_params |
[NumericVector] Initial values for the free parameters that the optimization algorithm will
search from. These are primarily relevant when using algorithms that require
an explicit starting point, such as default: |
initial_size |
[integer] This parameter corresponds to the population size in genetic
algorithms ( default: |
seed |
[integer] Random seed. This ensures that the results are reproducible and remain the same each time the function is run. default: |
nc |
[integer] Number of cores to use for parallel processing. Since fitting optimal parameters for each subject is an independent task, parallel computation can significantly speed up the fitting process:
|
algorithm |
[string] Choose an algorithm package from
In addition, any algorithm from the |
Value
A list where each element is a data.frame. Each data.frame within this list records the results of fitting synthetic data (generated by Model A) with Model B.
Note
While both fit_p and rcv_d utilize the same underlying optimize_para
function to find optimal parameters, they play distinct and sequential
roles in the modeling pipeline. The key differences are as follows:
- Purpose and Data Source: rcv_d should always be performed before fit_p. Its primary role is to validate a model's stability by fitting it to synthetic data generated by the model itself. This process, known as parameter recovery, ensures the model is well-behaved. In contrast, fit_p is used in the subsequent stage to fit the validated model to real experimental data.
- Estimation Method: rcv_d does not include an estimate argument. This is because the synthetic data is generated from known "true" parameters, which are drawn from pre-defined distributions (typically uniform for most parameters and exponential for the inverse temperature). Since the ground truth is known, a hierarchical estimation (MAP) is not applicable. The fit_p function, however, requires this argument to handle real data where the true parameters are unknown.
- Policy Setting: In fit_p, the policy setting has different effects: "on-policy" is better for learning choice patterns, while "off-policy" yields more accurate parameter estimates. For rcv_d, the process defaults to an "off-policy" approach because its main objectives are to verify if the true parameters can be accurately recovered and to assess whether competing models are distinguishable, tasks for which off-policy estimation is more suitable.
Examples
## Not run:
recovery <- binaryRL::rcv_d(
data = binaryRL::Mason_2024_G2,
#+-----------------------------------------------------------------------------+#
#|----------------------------- black-box function ----------------------------|#
#funcs = c("your_funcs"),
model_names = c("TD", "RSTD", "Utility"),
simulate_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
simulate_lower = list(c(0, 1), c(0, 0, 1), c(0, 0, 1)),
simulate_upper = list(c(1, 1), c(1, 1, 1), c(1, 1, 1)),
fit_models = list(binaryRL::TD, binaryRL::RSTD, binaryRL::Utility),
fit_lower = list(c(0, 1), c(0, 0, 1), c(0, 0, 1)),
fit_upper = list(c(1, 5), c(1, 1, 5), c(1, 1, 5)),
#|----------------------------- iteration number ------------------------------|#
iteration_s = 100,
iteration_f = 100,
#|-------------------------------- algorithms ---------------------------------|#
nc = 1, # <nc > 1>: parallel computation across subjects
# Base R Optimization
algorithm = "L-BFGS-B" # Gradient-Based (stats)
#|-----------------------------------------------------------------------------|#
# Specialized External Optimization
#algorithm = "GenSA" # Simulated Annealing (GenSA)
#algorithm = "GA" # Genetic Algorithm (GA)
#algorithm = "DEoptim" # Differential Evolution (DEoptim)
#algorithm = "PSO" # Particle Swarm Optimization (pso)
#algorithm = "Bayesian" # Bayesian Optimization (mlrMBO)
#algorithm = "CMA-ES" # Covariance Matrix Adapting (cmaes)
#|-----------------------------------------------------------------------------|#
# Optimization Library (nloptr)
#algorithm = c("NLOPT_GN_MLSL", "NLOPT_LN_BOBYQA")
#|-------------------------------- algorithms ---------------------------------|#
#+#############################################################################+#
)
result <- dplyr::bind_rows(recovery) %>%
dplyr::select(simulate_model, fit_model, iteration, everything())
# Ensure the output directory exists
if (!dir.exists("../OUTPUT")) {
dir.create("../OUTPUT", recursive = TRUE)
}
write.csv(result, file = "../OUTPUT/result_recovery.csv", row.names = FALSE)
## End(Not run)
Process: Recovering Fake Data
Description
This function processes the synthetic datasets generated by
simulate_list(). For each of these simulated datasets, it then fits
every model specified within the fit_model list. In essence, it
iteratively calls the optimize_para() function for each generated object.
The fitting procedure is analogous to that performed by fit_p, and it
similarly leverages parallel computation across subjects to significantly
accelerate the parameter estimation process.
Usage
recovery_data(
list,
id = 1,
n_trials,
n_params,
funcs = NULL,
policy,
model_name,
fit_model,
lower,
upper,
initial_params = NA,
initial_size = 50,
iteration = 10,
seed = 123,
nc = 1,
algorithm
)
Arguments
list |
[list] A list generated by function |
id |
[vector] Specifies which subject's data to use. In parameter and model recovery analyses, the specific subject ID is often irrelevant. Although the experimental trial order might have some randomness for each subject, the sequence of reward feedback is typically pseudo-random. The default value for this argument is default: |
n_trials |
[integer] The total number of trials in your experiment. |
n_params |
[integer] The number of free parameters in your model. |
funcs |
[character] A character vector containing the names of all user-defined functions required for the computation. When parallel computation is enabled (i.e., 'nc > 1'), user-defined models and their custom functions might not be automatically accessible within the parallel environment. Therefore, if you have created your own reinforcement learning model
that modifies any of the package's six default functions
(default functions:
|
policy |
[character] Specifies the learning policy to be used.
This determines how the model updates action values based on observed or
simulated choices. It can be either
default: |
model_name |
[character] The name of your model |
fit_model |
[function] fit model |
lower |
[vector] Lower bounds of free parameters |
upper |
[vector] Upper bounds of free parameters |
initial_params |
[numeric] Initial values for the free parameters that the optimization algorithm will
search from. These are primarily relevant when using algorithms that require
an explicit starting point, such as default: |
initial_size |
[integer] This parameter corresponds to the population size in genetic
algorithms ( default: |
iteration |
[integer] The number of iterations the optimization algorithm will perform when searching for the best-fitting parameters during the fitting phase. A higher number of iterations may increase the likelihood of finding a global optimum but also increases computation time. |
seed |
[integer] Random seed. This ensures that the results are reproducible and remain the same each time the function is run. default: |
nc |
[integer] Number of cores to use for parallel processing. Since fitting optimal parameters for each subject is an independent task, parallel computation can significantly speed up the fitting process:
default: |
algorithm |
[character]
Choose an algorithm package from
In addition, any algorithm from the |
Value
a data frame for parameter recovery and model recovery
Examples
## Not run:
binaryRL.res <- binaryRL::optimize_para(
data = Mason_2024_G2,
id = 1,
n_params = 3,
n_trials = 360,
obj_func = binaryRL::RSTD,
lower = c(0, 0, 0),
upper = c(1, 1, 10),
iteration = 100,
algorithm = "L-BFGS-B"
)
summary(binaryRL.res)
## End(Not run)
Step 4: Replaying the experiment with optimal parameters
Description
After completing Step 3 using fit_p()
to obtain the optimal parameters
for each subject and saving the resulting CSV locally, this function
allows you to load that result dataset. It then applies these optimal
parameters back into the reinforcement learning model, effectively
simulating how the "robot" (the model) would make its choices.
Based on this generated dataset, you can then analyze the robot's data in the same manner as you would analyze human behavioral data. If a particular model's fitted data can successfully reproduce the experimental effects observed in human subjects, it strongly suggests that this model is a good and valid representation of the process.
Usage
rpl_e(
data,
id = NULL,
result,
model,
model_name,
param_prefix = "param_",
n_trials = NULL
)
Arguments
data |
[data.frame] This data should include the following mandatory columns:
|
id |
[CharacterVector] A vector specifying the subject ID(s) for which parameters should be fitted. The function will process only the subjects provided in this vector. To fit all subjects, you can either explicitly set the argument as
It is strongly recommended to avoid using simple numeric sequences like
default: |
result |
[data.frame] Output data generated by the |
model |
[Function] A model function to be applied in evaluating the experimental effect. |
model_name |
[string] A character string specifying the name of the model to extract from the result. |
param_prefix |
[string] A prefix string used to identify parameter columns in the result data default: |
n_trials |
[integer] Represents the total number of trials a single subject experienced
in the experiment. If this parameter is kept at its default value
of default: |
Value
A list, where each element is a data.frame representing one subject's
results. Each data.frame includes the value update history for each option,
the learning rate (eta
), utility function (gamma
), and other
relevant information used in each update.
Examples
## Not run:
replay_list <- list()
replay_list[[1]] <- dplyr::bind_rows(
  binaryRL::rpl_e(
    data = binaryRL::Mason_2024_G2,
    result = read.csv("../OUTPUT/result_comparison.csv"),
    model = binaryRL::TD,
    model_name = "TD",
    param_prefix = "param_"
  )
)
replay_list[[2]] <- dplyr::bind_rows(
  binaryRL::rpl_e(
    data = binaryRL::Mason_2024_G2,
    result = read.csv("../OUTPUT/result_comparison.csv"),
    model = binaryRL::RSTD,
    model_name = "RSTD",
    param_prefix = "param_"
  )
)
replay_list[[3]] <- dplyr::bind_rows(
  binaryRL::rpl_e(
    data = binaryRL::Mason_2024_G2,
    result = read.csv("../OUTPUT/result_comparison.csv"),
    model = binaryRL::Utility,
    model_name = "Utility",
    param_prefix = "param_"
  )
)
## End(Not run)
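As a hypothetical follow-up, you could check how often each replayed model reproduces the human choice. The sketch below is not part of the package; it assumes the replayed data frames retain the default Sub_Choose and Rob_Choose columns described in run_m().
## Not run:
# Proportion of trials on which each model's choice matches the subject's choice
# (column names assume the run_m() defaults "Sub_Choose" and "Rob_Choose").
sapply(replay_list, function(df) {
  mean(df$Rob_Choose == df$Sub_Choose, na.rm = TRUE)
})
## End(Not run)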
Step 1: Building reinforcement learning model
Description
This function is designed to construct and customize reinforcement learning models.
Items for model construction:
-
Data Input and Specification: You must provide the raw dataset for analysis. Crucially, you need to inform the run_m function of the corresponding column names within your dataset (e.g., Mason_2024_G1, Mason_2024_G2). Because the model itself re-plays the task like a game, it is critical that your dataset includes rewards for both the option the human chose and the unchosen option. -
Customizable RL Models: This function allows you to define and adjust the number of free parameters to create various reinforcement learning models.
-
Value Function:
-
Learning Rate: By adjusting the number of eta values, you can construct basic reinforcement learning models such as Temporal Difference (TD) and Risk-Sensitive Temporal Difference (RSTD). You can also adjust func_eta directly to define your own custom learning rate function. -
Utility Function: You can directly adjust the form of func_gamma to incorporate the principles of Kahneman's Prospect Theory. Currently, the built-in func_gamma takes only the form of a power function, consistent with Stevens' Power Law.
-
-
Exploration–Exploitation Trade-off:
-
Initial Values: This involves setting the initial expected value for each option when it hasn't been chosen yet. A higher initial value encourages exploration.
-
Epsilon: Adjusting the
threshold
,epsilon
andlambda
parameters can lead to exploration strategies such as epsilon-first, epsilon-greedy, or epsilon-decreasing. -
Upper-Confidence-Bound: Adjusting pi controls the degree of exploration by scaling the uncertainty bonus given to less-explored options. -
Soft-Max: The inverse temperature parameter tau controls the agent's sensitivity to value differences: a higher tau places greater emphasis on value differences, leading to more exploitation, while a smaller tau indicates a greater tendency toward exploration.
-
-
-
Objective Function Format for Optimization: Once your model is defined in run_m, it must be structured as an objective function that accepts params as input and returns a loss value (typically logL). This format ensures compatibility with the algorithm package, which uses it to estimate the optimal parameters. For examples of the standard objective function format, see TD, RSTD, and Utility; a hypothetical sketch is also given after this list.
For more information, please refer to the homepage of this package: https://yuki-961004.github.io/binaryRL/
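Below is a minimal, hypothetical sketch of the "params in, loss out" format described above. It is not the package's actual TD/RSTD/Utility source: the field used to read the log-likelihood from the fitted object (res$ll) and the surrounding variables (data, id, n_trials) are assumptions made purely for illustration.
# Hypothetical objective function wrapping run_m().
# Assumes `data`, `id`, and `n_trials` exist in the calling environment, and
# that the fitted object exposes a log-likelihood field (here `res$ll`, which
# is an assumption -- check the structure of the fitted binaryRL object).
my_rstd_objective <- function(params) {
  res <- binaryRL::run_m(
    mode = "fit",
    data = data,
    id = id,
    eta = c(params[1], params[2]),  # two learning rates (RSTD-style)
    tau = params[3],                # soft-max inverse temperature
    n_params = 3,
    n_trials = n_trials
  )
  # Return a scalar loss. Here the negative log-likelihood is used; if your
  # optimizer maximizes rather than minimizes, return the log-likelihood directly.
  -res$ll
}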
Usage
run_m(
name = NA,
mode = c("simulate", "fit", "replay"),
policy = c("on", "off"),
data,
id,
n_params,
n_trials,
gamma = 1,
eta,
initial_value = NA_real_,
threshold = 1,
epsilon = NA,
lambda = NA,
pi = NA,
tau = NA,
lapse = 0.02,
alpha = NA,
beta = NA,
priors = NULL,
util_func = func_gamma,
rate_func = func_eta,
expl_func = func_epsilon,
bias_func = func_pi,
prob_func = func_tau,
loss_func = func_logl,
sub = "Subject",
time_line = c("Block", "Trial"),
L_choice = "L_choice",
R_choice = "R_choice",
L_reward = "L_reward",
R_reward = "R_reward",
sub_choose = "Sub_Choose",
rob_choose = "Rob_Choose",
raw_cols = NULL,
var1 = NA_character_,
var2 = NA_character_,
seed = 123,
digits_1 = 2,
digits_2 = 5,
engine = "cpp"
)
Arguments
name |
[string] The name of your RL model |
mode |
[string] This parameter controls the function's operational mode. It has three possible values ("simulate", "fit", and "replay"), each typically associated with a specific step function.
In most cases, you won't need to modify this parameter directly, as suitable default values are set for different contexts. |
policy |
[string] Specifies the learning policy to be used.
This determines how the model updates action values based on observed or
simulated choices. It can be either "on" or "off".
|
data |
[data.frame] This data should include the following mandatory columns:
|
id |
[string] Which subject is going to be analyzed. The value should correspond to an entry in the "sub" column, which must contain the subject IDs. e.g. id = 18 |
n_params |
[integer] The number of free parameters in your model. |
n_trials |
[integer] The total number of trials in your experiment. |
gamma |
[NumericVector] Note: This should not be confused with the discount rate parameter (also named gamma) found in Temporal Difference (TD) models; the Rescorla-Wagner model does not include a discount rate. Here, gamma is the parameter of the utility function (by default, the exponent of the power function implemented in func_gamma).
default: gamma = 1 |
eta |
[NumericVector] Parameters used in the Learning Rate Function. The structure of eta depends on the model: TD uses a single learning rate, whereas RSTD uses two (typically one for better-than-expected and one for worse-than-expected outcomes).
TD: e.g. eta = 0.5 RSTD: e.g. eta = c(0.321, 0.765) |
initial_value |
[double] Subject's initial expected value for each stimulus's reward. If this value
is not set default: |
threshold |
[integer] Controls the initial exploration phase in the epsilon-first strategy.
This is the number of early trials where the subject makes purely random
choices, as they haven't yet learned the options' values. For example, a threshold of 10 means the first 10 trials are chosen completely at random.
default: threshold = 1 epsilon-first: |
epsilon |
[NumericVector] A parameter used in the epsilon-greedy exploration strategy. It
defines the probability of making a completely random choice, as opposed
to choosing based on the relative values of the left and right options.
For example, if epsilon = 0.1, the agent chooses completely at random on 10 percent of trials.
epsilon-greedy: |
lambda |
[NumericVector] A numeric value that controls the decay rate of exploration probability
in the epsilon-decreasing strategy. A higher lambda causes the exploration probability to decay more quickly across trials.
epsilon-decreasing: |
pi |
[NumericVector] Parameter used in the Upper-Confidence-Bound (UCB) action selection
formula.
default: |
tau |
[NumericVector] Parameters used in the Soft-Max Function.
default: |
lapse |
[double] A numeric value between 0 and 1, representing the lapse rate. You can interpret this parameter as the probability of the agent "slipping" or making a random choice, irrespective of the learned action values. This accounts for moments of inattention or motor errors. In this sense, it represents the minimum probability with which any given option will be selected. It is a free parameter that acknowledges that individuals do not always make decisions with full concentration throughout an experiment. From a modeling perspective, the lapse rate is crucial for preventing the
log-likelihood calculation from returning -Inf, which would otherwise occur whenever a choice probability of exactly zero is passed to log().
default: lapse = 0.02. This ensures each option has a minimum selection probability of 1 percent in TAFC tasks (an illustrative sketch follows this list of arguments). |
alpha |
[NumericVector] Extra parameters that may be used in functions. |
beta |
[NumericVector] Extra parameters that may be used in functions. |
priors |
[list] A list specifying the prior distributions for the model parameters.
This argument is mandatory when using estimation methods that rely on priors (e.g., MAP estimation). default: priors = NULL |
util_func |
[Function] Utility Function; see func_gamma. |
rate_func |
[Function] Learning Rate Function; see func_eta. |
expl_func |
[Function] Exploration Strategy Function; see func_epsilon. |
bias_func |
[Function] Upper-Confidence-Bound Function; see func_pi. |
prob_func |
[Function] Soft-Max Function; see func_tau. |
loss_func |
[Function] Loss Function; see func_logl. |
sub |
[string] Column name of the subject ID. default: sub = "Subject" |
time_line |
[CharacterVector] A vector specifying the columns that define the temporal sequence of the experiment. This argument describes how the experiment is structured, for example whether it is organized into a "Block" column (with breaks in between) and a "Trial" column for the multiple trials within each block. default: time_line = c("Block", "Trial") |
L_choice |
[string] Column name of the left choice. default: L_choice = "L_choice" |
R_choice |
[string] Column name of the right choice. default: R_choice = "R_choice" |
L_reward |
[string] Column name of the reward for the left choice. default: L_reward = "L_reward" |
R_reward |
[string] Column name of the reward for the right choice. default: R_reward = "R_reward" |
sub_choose |
[string] Column name of the choices made by the subject. default: sub_choose = "Sub_Choose" |
rob_choose |
[string] Column name of the choices made by the model, which you can usually ignore. default: rob_choose = "Rob_Choose" |
raw_cols |
[CharacterVector] Defaults to raw_cols = NULL. |
var1 |
[string] Column name of extra variable 1. If your model uses more than just reward and expected value, and you need other information, such as whether the choice frame is Gain or Loss, then you can pass the 'Frame' column as var1 into the model. default: var1 = NA_character_ |
var2 |
[string] Column name of extra variable 2. If one additional variable, var1, does not meet your needs, you can add another additional variable, var2, to your model. default: var2 = NA_character_ |
seed |
[integer] Random seed. This ensures that the results are reproducible and remain the same each time the function is run. default: seed = 123 |
digits_1 |
[integer] The number of decimal places to retain for columns related to the value function. default: digits_1 = 2 |
digits_2 |
[integer] The number of decimal places to retain for columns related to the selection (choice) function. default: digits_2 = 5 |
engine |
[string] The computational engine used to run the model. default: engine = "cpp" |
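As referenced in the lapse entry above, the sketch below illustrates how a lapse rate is commonly combined with a soft-max rule so that no choice probability can reach zero. It is illustrative only and is not the package's func_tau implementation; the function name and the exact mixing formula are assumptions.
# Illustrative only: soft-max choice rule mixed with a lapse rate.
# With lapse = 0.02 in a two-alternative task, each option's probability is
# bounded below by lapse / 2 = 0.01, so log(p) can never be -Inf.
softmax_with_lapse <- function(v_left, v_right, tau, lapse = 0.02) {
  p_left <- 1 / (1 + exp(-tau * (v_left - v_right)))  # plain soft-max
  lapse / 2 + (1 - lapse) * p_left                    # mix with uniform lapse
}
softmax_with_lapse(v_left = 0, v_right = 10, tau = 5)  # ~0.01, the floor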
Value
A list of class binaryRL
containing the results of the model fitting.
Examples
data <- binaryRL::Mason_2024_G2
binaryRL.res <- binaryRL::run_m(
mode = "replay",
data = data,
id = 18,
eta = c(0.321, 0.765),
tau = 0.5,
n_params = 3,
n_trials = 360
)
summary(binaryRL.res)
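To make the role of eta concrete (see the eta argument above), the hypothetical comparison below shows how the number of supplied learning rates selects the model family. The parameter values are arbitrary, and the n_params counts assume one tau plus the listed eta values.
# Reuses `data` defined in the example above.
# TD-style model: a single learning rate
td.res <- binaryRL::run_m(
  mode = "replay",
  data = data,
  id = 18,
  eta = 0.5,                 # one eta -> TD
  tau = 0.5,
  n_params = 2,              # eta + tau
  n_trials = 360
)
# RSTD-style model: two learning rates
rstd.res <- binaryRL::run_m(
  mode = "replay",
  data = data,
  id = 18,
  eta = c(0.321, 0.765),     # two etas -> RSTD
  tau = 0.5,
  n_params = 3,              # eta1 + eta2 + tau
  n_trials = 360
)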
Process: Simulating Fake Data
Description
This function is responsible for generating synthetic (fake) data using random numbers. For all parameters except the last one, their values are drawn from a uniform distribution within their respective specified ranges.
The last parameter, representing the temperature (tau) in the soft-max function, is drawn from an exponential distribution. If its upper bound is set to 1, tau is sampled from Exp(1) (an exponential distribution with a rate parameter of 1). If its lower bound is set to 1, then after tau is randomly generated, it is shifted to the right by adding 1 (i.e., tau + 1), establishing a minimum value.
Usage
simulate_list(
data,
id = 1,
obj_func,
n_params,
n_trials,
lower,
upper,
iteration = 10,
seed = 123
)
Arguments
data |
[data.frame] This data should include the following mandatory columns:
|
id |
[vector] Specifies which subject's data to use. In parameter and model recovery analyses, the specific subject ID is often irrelevant. Although the experimental trial order might have some randomness for each subject, the sequence of reward feedback is typically pseudo-random. default: id = 1 |
obj_func |
[function] The objective function that the optimization algorithm package accepts.
This function must strictly take only one argument, params, and return a loss value (typically logL). |
n_params |
[integer] The number of free parameters in your model. |
n_trials |
[integer] The total number of trials in your experiment. |
lower |
[vector] Lower bounds of free parameters |
upper |
[vector] Upper bounds of free parameters |
iteration |
[integer] This parameter determines how many simulated datasets are created for subsequent model and parameter recovery analyses. |
seed |
[integer] Random seed. This ensures that the results are reproducible and remain the same each time the function is run. default: |
Value
A list of fake datasets generated from randomly drawn free parameters.
Examples
## Not run:
list_simulated <- binaryRL::simulate_list(
data = binaryRL::Mason_2024_G2,
obj_func = binaryRL::RSTD,
n_params = 3,
n_trials = 360,
lower = c(0, 0, 1),
upper = c(1, 1, 1),
iteration = 100
)
df_recovery <- binaryRL::recovery_data(
list = list_simulated,
fit_model = binaryRL::RSTD,
model_name = "RSTD",
n_params = 3,
n_trials = 360,
lower = c(0, 0, 1),
upper = c(1, 1, 5),
iteration = 100,
nc = 1,
algorithm = "L-BFGS-B"
)
## End(Not run)
S3 Summary Method for binaryRL Objects
Description
S3 summary method for objects of class binaryRL.
Usage
## S3 method for class 'binaryRL'
summary(object, ...)
Arguments
object |
An object of class binaryRL, e.g. the result returned by run_m(). |
... |
Additional arguments passed to or from other methods. |
Value
A summary of the binaryRL result.
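Examples
A minimal usage sketch, reusing the fitted object from the run_m() example in Step 1 (the parameter values are illustrative only):
binaryRL.res <- binaryRL::run_m(
  mode = "replay",
  data = binaryRL::Mason_2024_G2,
  id = 18,
  eta = c(0.321, 0.765),
  tau = 0.5,
  n_params = 3,
  n_trials = 360
)
# Print a summary of the fitted binaryRL object
summary(binaryRL.res)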