A package for causal meta-analysis estimation of a binary response and binary treatment using aggregated data, based on Berenfeld et al. (https://arxiv.org/abs/2505.20168). The output is the aggregated estimate and its standard deviation, the package allows for simple and fast computation of different measures such as Risk Difference (RD), Risk Ratio (RR), Odds Ratio (OR) and Survival Ratio (SR).
The package consists of the function camea
. This
function can take the measures Risk Difference, Risk Ratio and log-Risk
Ratio, Odds Ratio and log-Odds Ratio, Survival Ratio and log-Survival
Ratio.
The data can be input in the following formats: vectors, data.frame, matrix or sparse matrix.
The output of the function is a list of two elements: the first element is a dataframe that has, for each study, its related estimate and standard error; the second element of the list is the aggregated causal-meta analysis estimate and its standard error.
If plot = TRUE
the function camea
prints a
forestplot that displays the estimate of interest for each study and the
aggregated estimate(s). All estimates are shown with their associated
95%-level CI.
If random.effects = TRUE
, the output and plot will
include both the causal meta-analysis estimate and the random-effects
meta-analysis estimate.
If log.scale = TRUE
, the meta-analysis will be performed
on the logarithmic transformation of the chosen measure.
The latest release of the package can be installed through CRAN (soon):
The development version can be installed from github
Another installation possibility is to clone the repo, and then within the r-package folder run
require(CaMeA)
# Example 1: With vectors of total numbers and events in treated and control
# Generate data
n <- 10
treated_total <- sample.int(1000, n)
control_total <- sample.int(1000, n)
treated_events <- rbinom(n,treated_total,0.5)
control_events <- rbinom(n,control_total,0.5)
# Choose measure: RD
measure <- "RD"
# Apply function
result <- camea(measure = measure, ai = treated_events, n1i = treated_total,
ci = control_events, n2i = control_total)
# Example 2: With contingency tables entries in data.frame format
# Generate data
n <- 10
treated_total <- sample.int(1000, n)
control_total <- sample.int(1000, n)
treated_events <- rbinom(n,treated_total,0.5)
control_events <- rbinom(n,control_total,0.5)
treated_negatives <- treated_total - treated_events
control_negatives <- control_total - control_events
dat <- data.frame(treated_events,control_events,treated_negatives,control_negatives)
# Choose measure: log-OR
measure <- "OR"
# Apply function
result <- camea(measure = measure, ai = treated_events, bi = treated_negatives,
ci = control_events, di = control_negatives, data = dat, log.scale = TRUE)
If plot = TRUE
it is possible to print the forestplot of
the meta-analysis.
# apply the same function but print the plot
result <- camea(measure = measure, ai = treated_events, n1i = treated_total,
ci = control_events, n2i = control_total, log.scale = TRUE, plot = TRUE)
We consider a set of \(K \in \mathbb{N}\) randomized controlled trials measuring the same binary outcome \(Y \in \{0,1\}\) and the same binary treatment \(A \in \{0,1\}\). We assume that each study provides a table of the positive and negative outcomes among both the treated and untreated groups, as follows:
Y=1 | Y=0 | |
---|---|---|
A=1 | \(n_{11}(k)\) | \(n_{01}(k)\) |
A=0 | \(n_{10}(k)\) | \(n_{00}(k)\) |
where \(n_{ay}(k)\) denotes the number of individuals in the study \(k \in [K]\) with treatment status \(A = a\) and observed outcome \(Y=y\).
Let \(Y(a)\) be the counterfactual outcome under \(A = a\) and \(H \in [K]\) be the study variable.
Let \(\hat{\psi}_a (k) = n_{a1}(k) / n_a(k)\) be the natural estimator of \(\psi_a (k) = \mathbb{E}[Y(a)|H=k]\) with \(a \in \{0,1\}\). Let \(\hat{\psi}_a = \sum_{k=1}^K \hat{\pi}_k \hat{\psi}_a(k)\) the estimator of \(\psi_a = \mathbb{E}[Y(a)]\) where \(\pi_k = \mathbb{P}(H = k)\) and \(\hat{\pi}_k = n_a(k)/n_k\).
The estimator is asymptotically normal, converging to the following distribution:
\[\begin{equation*} \sqrt{n} \: ((\hat{\psi}_1,\hat{\psi}_0) - (\psi_1,\psi_0)) \xrightarrow{L} N\biggr{(}0, \begin{bmatrix} 0 & \sigma^2_1 \\ \sigma^2_0 & 0 \end{bmatrix} \biggr{)} \end{equation*}\]
where the asymptotic variances are given by:
\[\begin{equation*} \sigma^2_a = \sum_{k=1}^K \pi_k \frac{\psi_a (k) (1 - \psi_a (k))}{e^a(k) (1 - e(k))^{1-a}} + \sum_{k=1}^K \pi_k \psi_a(k)^2 - \psi_a^2 \end{equation*}\]
with \(e(k) = \mathbb{P}(A = 1 | H = k)\) and \(a \in \{ 0,1 \}\).
Then, by applying the delta method, we find that for every first-order continuously differentiable contrast \(\phi \: : \: (\psi_1,\psi_0) \mapsto \phi(\psi_1,\psi_0)\) we have:
\[\begin{equation*} \sqrt{n} \: (\phi(\hat{\psi}_1,\hat{\psi}_0) - \phi(\psi_1,\psi_0)) \xrightarrow{L} N ( 0, \eta^2 ) \end{equation*}\]
where
\[\begin{equation*} \eta^2 = \sigma_1^2 \: \partial_1 \phi(\psi_1,\psi_0)^2 + \sigma_0^2 \: \partial_0 \phi(\psi_1,\psi_0)^2 \end{equation*}\]
and \(\partial_a \phi(\psi_1,\psi_0)\) denotes the partial derivative of \(\phi\) with respect to \(\psi_a\).
We will use the above approximation to estimate the variance of the measures:
We also provide estimates and confidence intervals for the Risk Ratio, Odds Ratio, and Survival Ratio on the logarithmic scale. In that case, the required partial derivatives for \(a \in {0,1}\) are given by:
\[\begin{equation*} \partial_a \log \phi(\psi_1,\psi_0) = \frac{1}{\phi(\psi_1,\psi_0)} \partial_a \phi(\psi_1,\psi_0) \end{equation*}\]
If the selected effect measure is Risk Difference (RD), Risk Ratio
(RR) or Odds Ratio (OR) and the option
random.effects = TRUE
is set, the output and the forest
plot will display two summary estimates: the primary causal
meta-analysis estimate and the conventional random-effects estimate.
The random-effects estimate is calculated via the package
metafor
by W. Viechtbauer (2010), which is a well known
tool for conducting meta-analysis on R.
It is important to notice that the package metafor
only
provides estimates of the Risk Ratio and Odds Ratio on the logarithmic
scale. Our package automatically converts these log-scale estimates back
to the natural scale by applying the exponential function to both the
point estimate and the bounds of the confidence interval.
The following excerpt from the camea
function
illustrates how we integrate metafor
for our
computation:
# Create forest plot with single study results
metafor::forest(x = results$thetai, ci.lb = results$ci.lb, ci.ub = results$ci.ub,
header = c("Study",paste(measure_name,"[95% CI]")), top=2,
ylim=c(-2, n_study+2), refline = refline, cex = 0.8)
# Computation of random-effects analysis and display in forest plot, if
# option "random.effects = TRUE"
dat <- metafor::escalc(measure=measure, ai=ai_vals, n1i=n1i_vals, ci=ci_vals,
n2i=n2i_vals, data=dat)
metafor::addpoly(res_random_effects, row = -1, mlab="Random-effects model", cex = 0.8)
# Display causal-meta analysis in forest plot
res <- metafor::rma(yi = result$estimate, sei = result$se, method="FE")
metafor::addpoly(res, row = 0, mlab="Causal meta-analysis", cex = 0.8)
Berenfeld, C., Boughdiri, A., Colnet, B., van Amsterdam, W. A. C., Bellet, A., Khellaf, R., Scornet, E., & Josse, J. (2025). Causal Meta-Analysis: Rethinking the Foundations of Evidence-Based Medicine. arXiv:2505.20168.
Viechtbauer, W. (2010). Conducting Meta-Analyses in R with the metafor Package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03