Type: | Package |
Title: | Estimating Bivariate Dependency from Marginal Data |
Version: | 3.0.0 |
Description: | Provides statistical methods for estimating bivariate dependency (correlation) from marginal summary statistics across multiple studies. The package supports three modules: (1) bivariate correlation estimation for binary outcomes, (2) bivariate correlation estimation for continuous outcomes, and (3) estimation of component-wise means and variances under a conditional two-component Gaussian mixture model for a continuous variable stratified by a binary class label. These methods enable privacy-preserving joint estimation when individual-level data are unavailable. The approaches are detailed in Shang, Tsao, and Zhang (2025a) <doi:10.48550/arXiv.2505.03995> and Shang, Tsao, and Zhang (2025b) <doi:10.48550/arXiv.2508.02057>. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R (≥ 3.5.0) |
Imports: | stats |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-10-16 06:10:26 UTC; shanglongwen |
Author: | Longwen Shang [aut, cre], Min Tsao [aut], Xuekui Zhang [aut] |
Maintainer: | Longwen Shang <shanglongwen0918@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-10-16 20:20:17 UTC |
Example Data: Binary Variables
Description
Simulated dataset for testing the cor_bin() function.
Usage
data(bin_example)
Format
A data frame with 3 columns:
- ni
Sample size per study
- xi
Count of first binary variable
- yi
Count of second binary variable
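As a rough illustration of this layout, the sketch below simulates individual-level binary pairs for a few hypothetical studies and keeps only the per-study counts. The cell probabilities, study sizes, and the name sim_bin are assumptions made for illustration; this is not a description of how bin_example itself was generated.

```r
set.seed(42)

# Hypothetical joint cell probabilities, ordered as
# (X, Y) = (1, 1), (1, 0), (0, 1), (0, 0)
cell_probs <- c(0.30, 0.10, 0.20, 0.40)
sizes <- c(50, 80, 120, 60)  # hypothetical study sizes

sim_bin <- do.call(rbind, lapply(sizes, function(n) {
  # Draw the four cell counts, then keep only the marginal counts
  cells <- as.vector(rmultinom(1, size = n, prob = cell_probs))
  data.frame(
    ni = n,
    xi = cells[1] + cells[2],  # number of subjects with X = 1
    yi = cells[1] + cells[3]   # number of subjects with Y = 1
  )
}))

sim_bin
```

The joint count (the first cell) is discarded on purpose: only ni, xi, and yi survive, which is the information a marginal-summary method has to work with.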
Example Data: Continuous Variables
Description
Simulated dataset for testing the cor_cont() function.
Usage
data(cont_example)
Format
A data frame with 5 columns:
- Sample_Size
Sample size for each study.
- Mean_X
Sample mean of variable X.
- Mean_Y
Sample mean of variable Y.
- Variance_X
Sample variance of variable X.
- Variance_Y
Sample variance of variable Y.
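For readers preparing their own input, the following sketch reduces raw paired observations to the five columns listed above. The helper summarise_study, the study sizes, and the simulated correlation are hypothetical; the use of var() (denominator n - 1) is also an assumption about the variance convention, not something this page states.

```r
set.seed(7)

# Hypothetical helper: reduce one study's raw (x, y) pairs to marginal summaries
summarise_study <- function(x, y) {
  stopifnot(length(x) == length(y))
  data.frame(
    Sample_Size = length(x),
    Mean_X = mean(x),
    Mean_Y = mean(y),
    Variance_X = var(x),  # sample variance with denominator n - 1 (assumed convention)
    Variance_Y = var(y)
  )
}

# Three hypothetical studies with correlated normal pairs (true correlation 0.5)
sizes <- c(40, 75, 120)
sim_cont <- do.call(rbind, lapply(sizes, function(n) {
  x <- rnorm(n)
  y <- 0.5 * x + sqrt(1 - 0.5^2) * rnorm(n)
  summarise_study(x, y)
}))

sim_cont
```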
Estimate the Joint Distribution of Two Binary Variables from Marginal Summaries
Description
Performs maximum likelihood estimation (MLE) of the joint distribution of two binary variables using only marginal summary data from multiple studies.
Usage
cor_bin(ni, xi, yi, ci_method = c("none", "normal", "lr"))
Arguments
ni |
Numeric vector. Sample sizes for each dataset. |
xi |
Numeric vector. Count of observations where variable 1 equals 1. |
yi |
Numeric vector. Count of observations where variable 2 equals 1. |
ci_method |
Character string. Method for confidence interval computation.
Options are "none", "normal", or "lr", as listed in Usage. |
Value
A named list with point estimates, variance, standard error, and confidence interval (if requested).
- p1_hat
Estimated marginal probability for variable 1.
- p2_hat
Estimated marginal probability for variable 2.
- p11_hat
Estimated joint probability.
- var_hat
Estimated variance of p11_hat.
- sd_hat
Standard error of p11_hat.
- ci
Confidence interval for p11_hat, if requested.
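The three probability estimates can be turned into a correlation using the standard formula for two Bernoulli variables, (p11 - p1 p2) / sqrt(p1 (1 - p1) p2 (1 - p2)). The sketch below is a minimal illustration; phi_from_probs is a hypothetical helper, not a function exported by the package, and the input values are made up rather than taken from cor_bin() output.

```r
# Pearson (phi) correlation of two Bernoulli variables from their marginal
# probabilities p1, p2 and the joint probability p11 = P(X = 1, Y = 1)
phi_from_probs <- function(p1, p2, p11) {
  stopifnot(p1 > 0, p1 < 1, p2 > 0, p2 < 1)
  (p11 - p1 * p2) / sqrt(p1 * (1 - p1) * p2 * (1 - p2))
}

# Hypothetical estimates, not output from cor_bin()
phi_from_probs(p1 = 0.4, p2 = 0.5, p11 = 0.3)  # about 0.408
```

With a fitted object, the same call would presumably take p1_hat, p2_hat, and p11_hat from the returned list.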
Examples
data(bin_example)
cor_bin(bin_example$ni, bin_example$xi, bin_example$yi, ci_method = "lr")
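To see why marginal counts carry information about the joint probability, note that if the pairs within a study are independent draws from a 2 x 2 table, the observed counts (xi, yi) result from a multinomial whose (1, 1) cell count is unobserved. The likelihood of the margins is then the multinomial probability summed over every feasible value of that hidden count. The sketch below writes this out under that assumption; loglik_study and loglik_all are hypothetical helpers, the counts are made up, and none of this is presented as the internals of cor_bin().

```r
# Log-probability of marginal counts (x, y) in a study of size n, summing the
# multinomial probability over the unobserved count k of (1, 1) pairs
loglik_study <- function(n, x, y, p1, p2, p11) {
  p10 <- p1 - p11
  p01 <- p2 - p11
  p00 <- 1 - p1 - p2 + p11
  stopifnot(p11 > 0, p10 > 0, p01 > 0, p00 > 0)  # interior of the parameter space only
  k <- max(0, x + y - n):min(x, y)               # feasible values of the hidden count
  lp <- lgamma(n + 1) - lgamma(k + 1) - lgamma(x - k + 1) -
    lgamma(y - k + 1) - lgamma(n - x - y + k + 1) +
    k * log(p11) + (x - k) * log(p10) + (y - k) * log(p01) +
    (n - x - y + k) * log(p00)
  m <- max(lp)
  m + log(sum(exp(lp - m)))                      # log-sum-exp for numerical stability
}

# Total log-likelihood across independent studies; par = c(p1, p2, p11)
loglik_all <- function(par, ni, xi, yi) {
  sum(mapply(loglik_study, ni, xi, yi,
             MoreArgs = list(p1 = par[1], p2 = par[2], p11 = par[3])))
}

# Hypothetical counts from three studies, evaluated at a hypothetical parameter value
loglik_all(c(0.4, 0.5, 0.3),
           ni = c(50, 80, 120), xi = c(21, 30, 50), yi = c(24, 41, 63))
```

Maximizing a function of this kind over (p1, p2, p11) is one way to obtain a maximum likelihood estimate from margins alone; a general-purpose optimizer such as optim() could be used for that, subject to the interior constraint checked above.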
Estimate the Bivariate Normal Distribution from Marginal Summaries
Description
Estimates the correlation coefficient \rho (and the marginal means and SDs) of two normally distributed variables using summary-level data from multiple independent studies.
Usage
cor_cont(
n,
xbar,
ybar,
s2x = NULL,
s2y = NULL,
method = c("proposed", "weighted"),
ci_method = c("none", "normal", "lr")
)
Arguments
n |
Numeric vector. Sample size of each study. |
xbar , ybar |
Numeric vectors. Sample means of the two variables. |
s2x , s2y |
Numeric vectors. Sample variances; required for method = "proposed". |
method |
Character. Either "proposed" or "weighted". |
ci_method |
Confidence interval type: "none", "normal", or "lr". |
Value
A list with elements:
- mu_x, mu_y: estimated marginal means
- sigma_x, sigma_y: estimated SDs
- rho: estimated correlation
- se: standard error of rho (proposed only)
- ci: confidence interval for rho (if requested)
Examples
data(cont_example)
# Example with full summaries
cor_cont(cont_example$Sample_Size, cont_example$Mean_X, cont_example$Mean_Y,
cont_example$Variance_X, cont_example$Variance_Y, method = "proposed", ci_method = "lr")
# Only means + n, weighted mean method
cor_cont(cont_example$Sample_Size, cont_example$Mean_X, cont_example$Mean_Y, method = "weighted")
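The reason study-level means can reveal a correlation at all is that, for independent pairs within a study, Cov(xbar_i, ybar_i) = \rho \sigma_x \sigma_y / n_i, so the rescaled deviations sqrt(n_i) (xbar_i - \mu_x) and sqrt(n_i) (ybar_i - \mu_y) have correlation \rho across studies. The sketch below checks this on simulated data with a simple moment estimator. All settings (number of studies, sizes, true correlation) are hypothetical, and this estimator is an illustration of the idea only; it is not claimed to be the "weighted" or "proposed" method of cor_cont().

```r
set.seed(1)

K <- 2000                                 # hypothetical number of studies
n <- sample(30:100, K, replace = TRUE)    # hypothetical study sizes
rho_true <- 0.6                           # hypothetical true correlation

# Simulate each study's raw pairs and keep only the two sample means
means <- t(vapply(n, function(ni) {
  x <- rnorm(ni)
  y <- rho_true * x + sqrt(1 - rho_true^2) * rnorm(ni)
  c(mean(x), mean(y))
}, numeric(2)))
xbar <- means[, 1]
ybar <- means[, 2]

# Size-weighted estimates of the marginal means
mu_x <- sum(n * xbar) / sum(n)
mu_y <- sum(n * ybar) / sum(n)

# Correlation of the rescaled deviations across studies
rho_mom <- cor(sqrt(n) * (xbar - mu_x), sqrt(n) * (ybar - mu_y))
rho_mom
```

Because cor() is scale invariant, this particular sketch needs only the means and sample sizes, not the variances.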
Estimate Parameters in a Two-Component Gaussian Mixture Using Study-Level Summaries
Description
Estimates group-specific means and standard deviations (\mu_1, \mu_0, \sigma_1, \sigma_0) in a two-component normal mixture model based on aggregate data across multiple studies. The continuous variable X is assumed to follow a Gaussian mixture conditional on a binary group indicator Y \in \{0,1\}, with each study reporting only summary-level statistics.
Usage
est_mixture(ni, xbar, mi, s2 = NULL, method = c("gmm", "naive"))
Arguments
ni |
Integer vector of sample sizes per study. |
xbar |
Numeric vector of sample means per study. |
mi |
Integer vector of group 1 counts per study. |
s2 |
Numeric vector of sample variances per study. Required if method = "gmm". |
method |
Estimation method to use. One of "gmm" or "naive". |
Details
Two estimation methods are available:
- "naive": Likelihood-based estimator using only sample means.
- "gmm": Generalized method of moments (GMM) estimator using sample means and variances.
Value
A named list containing:
- mu1_hat, mu0_hat
Estimated means of the two groups.
- sigma1_hat, sigma0_hat
Estimated standard deviations.
- se
Standard errors of the parameter estimates (NA if method = "naive").
- ci
List of 95% confidence intervals for each parameter (NULL if method = "naive").
- method
A character string indicating the method used.
Examples
# Load example dataset included in the package
data(mixture_example)
# Estimate using GMM (recommended) with full summary statistics
est_mixture(
ni = mixture_example$ni,
xbar = mixture_example$xbar,
s2 = mixture_example$s2,
mi = mixture_example$mi,
method = "gmm"
)
# Estimate using naive likelihood method (only means used)
est_mixture(
ni = mixture_example$ni,
xbar = mixture_example$xbar,
mi = mixture_example$mi,
method = "naive"
)
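The group means are identifiable from study-level data because, given the group-1 count, the expected study mean is a weighted average of the two group means: E[xbar_i | mi] = \mu_0 + (\mu_1 - \mu_0) mi / ni. The sketch below illustrates this with a size-weighted linear fit on simulated summaries. The true parameter values, the number of studies, and the assumption that the group-1 share varies across studies are all hypothetical, and this regression is a simple illustration, not the package's "naive" or "gmm" estimator.

```r
set.seed(123)

K <- 500                                   # hypothetical number of studies
ni <- sample(50:200, K, replace = TRUE)    # hypothetical study sizes
share <- runif(K, 0.2, 0.8)                # assumed: group-1 share differs by study
mi <- rbinom(K, size = ni, prob = share)   # group-1 count per study

# Hypothetical true parameters
mu1 <- 2; mu0 <- 0; sigma1 <- 1; sigma0 <- 1.5

# Study-level mean of X from mi group-1 and (ni - mi) group-0 members
xbar <- vapply(seq_len(K), function(i) {
  x <- c(rnorm(mi[i], mu1, sigma1), rnorm(ni[i] - mi[i], mu0, sigma0))
  mean(x)
}, numeric(1))

# Intercept estimates mu0; intercept plus slope estimates mu1.
# weights = ni is an approximation to inverse-variance weighting.
fit <- lm(xbar ~ I(mi / ni), weights = ni)
mu0_hat <- unname(coef(fit)[1])
mu1_hat <- unname(sum(coef(fit)))

c(mu1_hat = mu1_hat, mu0_hat = mu0_hat)
```

This fit uses only the means, so it says nothing about \sigma_1 and \sigma_0; recovering those requires the per-study variances, which is the extra information the "gmm" method is described as using.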
Example Data: Mixture Model Summaries
Description
Simulated dataset for testing the est_mixture() function. Each row corresponds to one study providing summary-level data from a two-component normal mixture.
Usage
data(mixture_example)
Format
A data frame with 4 columns:
- ni
Sample size for each study.
- mi
Count of group 1 individuals in each study.
- xbar
Sample mean of the outcome variable.
- s2
Sample variance of the outcome variable.