Causal Meta Analysis for Aggregated Data

Description

A package for causal meta-analysis estimation of a binary response and binary treatment using aggregated data, based on Berenfeld et al. (https://arxiv.org/abs/2505.20168). The output is the aggregated estimate and its standard deviation, the package allows for simple and fast computation of different measures such as Risk Difference (RD), Risk Ratio (RR), Odds Ratio (OR) and Survival Ratio (SR).

The package consists of the function camea. This function can take the measures Risk Difference, Risk Ratio and log-Risk Ratio, Odds Ratio and log-Odds Ratio, Survival Ratio and log-Survival Ratio.

The data can be input in the following formats: vectors, data.frame, matrix or sparse matrix.

The output of the function is a list of two elements: the first element is a dataframe that has, for each study, its related estimate and standard error; the second element of the list is the aggregated causal-meta analysis estimate and its standard error.

If plot = TRUE the function camea prints a forestplot that displays the estimate of interest for each study and the aggregated estimate(s). All estimates are shown with their associated 95%-level CI.

If random.effects = TRUE, the output and plot will include both the causal meta-analysis estimate and the random-effects meta-analysis estimate.

If log.scale = TRUE, the meta-analysis will be performed on the logarithmic transformation of the chosen measure.

How to install

The latest release of the package can be installed through CRAN (soon):

install.packages("CaMeA")

The development version can be installed from github

devtools::install_github("cberenfeld/camea", build_vignettes = TRUE)

Another installation possibility is to clone the repo, and then within the r-package folder run

Rscript build_package.R

Usage Example

require(CaMeA)

# Example 1: With vectors of total numbers and events in treated and control

# Generate data
n <- 10

treated_total <- sample.int(1000, n)
control_total <- sample.int(1000, n)
treated_events <- rbinom(n,treated_total,0.5)
control_events <- rbinom(n,control_total,0.5)

# Choose measure: RD
measure <- "RD"

# Apply function
result <- camea(measure = measure, ai = treated_events, n1i = treated_total,
                    ci = control_events, n2i = control_total)

# Example 2: With contingency tables entries in data.frame format

# Generate data
n <- 10

treated_total <- sample.int(1000, n)
control_total <- sample.int(1000, n)
treated_events <- rbinom(n,treated_total,0.5)
control_events <- rbinom(n,control_total,0.5)
treated_negatives <- treated_total - treated_events
control_negatives <- control_total - control_events

dat <- data.frame(treated_events,control_events,treated_negatives,control_negatives)

# Choose measure: log-OR
measure <- "OR"

# Apply function
result <- camea(measure = measure, ai = treated_events, bi = treated_negatives,
                    ci = control_events, di = control_negatives, data = dat, log.scale = TRUE)

Plot

If plot = TRUE it is possible to print the forestplot of the meta-analysis.

# apply the same function but print the plot
result <- camea(measure = measure, ai = treated_events, n1i = treated_total,
                    ci = control_events, n2i = control_total, log.scale = TRUE, plot = TRUE)

Notation

We consider a set of \(K \in \mathbb{N}\) randomized controlled trials measuring the same binary outcome \(Y \in \{0,1\}\) and the same binary treatment \(A \in \{0,1\}\). We assume that each study provides a table of the positive and negative outcomes among both the treated and untreated groups, as follows:

	Y=1	Y=0
A=1	\(n_{11}(k)\)	\(n_{01}(k)\)
A=0	\(n_{10}(k)\)	\(n_{00}(k)\)

where \(n_{ay}(k)\) denotes the number of individuals in the study \(k \in [K]\) with treatment status \(A = a\) and observed outcome \(Y=y\).

Let \(Y(a)\) be the counterfactual outcome under \(A = a\) and \(H \in [K]\) be the study variable.

Computation of confidence intervals

Let \(\hat{\psi}_a (k) = n_{a1}(k) / n_a(k)\) be the natural estimator of \(\psi_a (k) = \mathbb{E}[Y(a)|H=k]\) with \(a \in \{0,1\}\). Let \(\hat{\psi}_a = \sum_{k=1}^K \hat{\pi}_k \hat{\psi}_a(k)\) the estimator of \(\psi_a = \mathbb{E}[Y(a)]\) where \(\pi_k = \mathbb{P}(H = k)\) and \(\hat{\pi}_k = n_a(k)/n_k\).

The estimator is asymptotically normal, converging to the following distribution:

\[\begin{equation*} \sqrt{n} \: ((\hat{\psi}_1,\hat{\psi}_0) - (\psi_1,\psi_0)) \xrightarrow{L} N\biggr{(}0, \begin{bmatrix} 0 & \sigma^2_1 \\ \sigma^2_0 & 0 \end{bmatrix} \biggr{)} \end{equation*}\]

where the asymptotic variances are given by:

\[\begin{equation*} \sigma^2_a = \sum_{k=1}^K \pi_k \frac{\psi_a (k) (1 - \psi_a (k))}{e^a(k) (1 - e(k))^{1-a}} + \sum_{k=1}^K \pi_k \psi_a(k)^2 - \psi_a^2 \end{equation*}\]

with \(e(k) = \mathbb{P}(A = 1 | H = k)\) and \(a \in \{ 0,1 \}\).

Then, by applying the delta method, we find that for every first-order continuously differentiable contrast \(\phi \: : \: (\psi_1,\psi_0) \mapsto \phi(\psi_1,\psi_0)\) we have:

\[\begin{equation*} \sqrt{n} \: (\phi(\hat{\psi}_1,\hat{\psi}_0) - \phi(\psi_1,\psi_0)) \xrightarrow{L} N ( 0, \eta^2 ) \end{equation*}\]

where

\[\begin{equation*} \eta^2 = \sigma_1^2 \: \partial_1 \phi(\psi_1,\psi_0)^2 + \sigma_0^2 \: \partial_0 \phi(\psi_1,\psi_0)^2 \end{equation*}\]

and \(\partial_a \phi(\psi_1,\psi_0)\) denotes the partial derivative of \(\phi\) with respect to \(\psi_a\).

We will use the above approximation to estimate the variance of the measures:

Risk Difference: \(\phi(\psi_1,\psi_0) = \psi_1 - \psi_0\)
Risk Ratio: \(\phi(\psi_1,\psi_0) = \frac{\psi_1}{\psi_0}\)
Odds Ratio: \(\phi(\psi_1,\psi_0) = \frac{\psi_1}{1-\psi_1} \frac{1-\psi_0}{\psi_0}\)
Survival Ratio: \(\phi(\psi_1,\psi_0) = \frac{1- \psi_1}{1 - \psi_0}\).

We also provide estimates and confidence intervals for the Risk Ratio, Odds Ratio, and Survival Ratio on the logarithmic scale. In that case, the required partial derivatives for \(a \in {0,1}\) are given by:

\[\begin{equation*} \partial_a \log \phi(\psi_1,\psi_0) = \frac{1}{\phi(\psi_1,\psi_0)} \partial_a \phi(\psi_1,\psi_0) \end{equation*}\]

Use of metafor package

If the selected effect measure is Risk Difference (RD), Risk Ratio (RR) or Odds Ratio (OR) and the option random.effects = TRUE is set, the output and the forest plot will display two summary estimates: the primary causal meta-analysis estimate and the conventional random-effects estimate.

The random-effects estimate is calculated via the package metafor by W. Viechtbauer (2010), which is a well known tool for conducting meta-analysis on R.

It is important to notice that the package metafor only provides estimates of the Risk Ratio and Odds Ratio on the logarithmic scale. Our package automatically converts these log-scale estimates back to the natural scale by applying the exponential function to both the point estimate and the bounds of the confidence interval.

The following excerpt from the camea function illustrates how we integrate metafor for our computation:

# Create forest plot with single study results
metafor::forest(x = results$thetai, ci.lb = results$ci.lb, ci.ub = results$ci.ub,
                header = c("Study",paste(measure_name,"[95% CI]")), top=2,
                ylim=c(-2, n_study+2), refline = refline, cex = 0.8)

# Computation of random-effects analysis and display in forest plot, if
# option "random.effects = TRUE"
dat <- metafor::escalc(measure=measure, ai=ai_vals, n1i=n1i_vals, ci=ci_vals,
                       n2i=n2i_vals, data=dat)
metafor::addpoly(res_random_effects, row = -1, mlab="Random-effects model", cex = 0.8)

# Display causal-meta analysis in forest plot
res <- metafor::rma(yi = result$estimate, sei = result$se, method="FE")
metafor::addpoly(res, row = 0, mlab="Causal meta-analysis", cex = 0.8)