| Type: | Package | 
| Title: | Privacy-Preserving Distributed Algorithms | 
| Version: | 1.2.8 | 
| Date: | 2025-03-10 | 
| Description: | A collection of privacy-preserving distributed algorithms for conducting multi-site data analyses. The regression analyses can be linear regression for continuous outcome, logistic regression for binary outcome, Cox proportional hazard regression for time-to event outcome, Poisson regression for count outcome, or multi-categorical regression for nominal or ordinal outcome. The PDA algorithm runs on a lead site and only requires summary statistics from collaborating sites, with one or few iterations. The package can be used together with the online system (https://pda-ota.pdamethods.org/) for safe and convenient collaboration. For more information, please visit our software websites: https://github.com/Penncil/pda, and https://pdamethods.org/. | 
| Maintainer: | Yiwen Lu <yiwenlu@sas.upenn.edu> | 
| License: | Apache License 2.0 | 
| Suggests: | imager, lme4 | 
| Depends: | R (≥ 4.1.0) | 
| Imports: | Rcpp (≥ 0.12.19), stats, httr, rvest, jsonlite, data.table, survival, minqa, glmnet, MASS, numDeriv, metafor, ordinal, plyr | 
| LinkingTo: | Rcpp, RcppArmadillo | 
| RoxygenNote: | 7.2.3 | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| NeedsCompilation: | yes | 
| Packaged: | 2025-03-10 14:59:12 UTC; cjiajie | 
| Author: | Chongliang Luo [aut], Rui Duan [aut], Mackenzie Edmondson [aut], Jiayi Tong [aut], Xiaokang Liu [aut], Kenneth Locke [aut], Yiwen Lu [cre], Yong Chen [aut], Penn Computing Inference Learning (PennCIL) lab [cph] | 
| Repository: | CRAN | 
| Date/Publication: | 2025-03-10 15:30:01 UTC | 
ADAP derivatives
Description
ADAP derivatives
Usage
ADAP.derive(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
list(site=config$site_id, site_size = nrow(ipdata), logL_D1=logL_D1, logL_D2=logL_D2)
ADAP surrogate estimation
Description
ADAP surrogate estimation
Usage
ADAP.estimate(ipdata,control,config)
Arguments
| ipdata | local data in data frame | 
| control | PDA control | 
| config | cloud configuration | 
Details
step-3: construct and solve surrogate objective function at the master/lead site
Value
list(btilde = sol$par, Htilde = sol$hessian, site=control$mysite, site_size=nrow(ipdata))
ADAP initialize
Description
ADAP initialize
Usage
ADAP.initialize(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
init
ADAP simulated data
Description
A simulated data set for ADAP demonstration
Usage
ADAP_data
Format
A list containing the following elements:
- sites
- site id, 300 'site1', 300 'site2', 300 'site3' 
- status
- binary outcome of length 900 
- x
- 900 by 49 matrix generated by standard normal distribution, representing the covariates 
PDA DLM estimation
Description
PDA DLM estimation
Usage
DLM.estimate(ipdata=NULL,control,config)
Arguments
| ipdata | no need | 
| control | PDA control | 
| config | cloud configuration | 
Details
DLM estimation: (1) Linear model, (2) Linear model with fixed effects, (3) Linear model with random effects (Linear mixed model)
Value
list(bhat, sebhat, sigmahat, uhat, seuhat)
DLM initialize
Description
DLM initialize
Usage
DLM.initialize(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
init
References
Yixin Chen, et al. (2006) Regression cubes with lossless compression and aggregation. 
IEEE Transactions on Knowledge and Data Engineering, 18(12), pp.1585-1599. 
(DLMM) Chongliang Luo, et al. (2020) Lossless Distributed Linear Mixed Model with Application to Integration of Heterogeneous Healthcare Data.  
medRxiv, doi:10.1101/2020.11.16.20230730. 
DPQL derive
Description
DPQL derive
Usage
DPQL.derive(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Details
This step calculated the intermediate aggregated data (XtWX, XtWY, and YtWY) for each site. May need to be iterated several times until prespecified rounds are met.
Value
list(SiX, SiXY, SiY, ni)
References
Chongliang Luo, et al. (2021) dPQL: a lossless distributed algorithm for generalized linear mixed model 
with application to privacy-preserving hospital profiling. medRxiv, doi:10.1101/2021.05.03.21256561. 
Chongliang Luo, et al. (2020) Lossless Distributed Linear Mixed Model with Application to Integration of Heterogeneous Healthcare Data.  
medRxiv, doi:10.1101/2020.11.16.20230730. 
PDA DPQL estimation
Description
PDA DPQL estimation
Usage
DPQL.estimate(ipdata=NULL,control,config)
Arguments
| ipdata | no need | 
| control | PDA control | 
| config | cloud configuration | 
Details
DPQL estimation: (iterative) weighted DLMM using AD from all sites
Value
list(risk_factor, risk_factor_heterogeneity, bhat, sebhat, uhat, seuhat, Vhat)
References
Chongliang Luo, et al. (2021) dPQL: a lossless distributed algorithm for generalized linear mixed model 
with application to privacy-preserving hospital profiling. medRxiv, doi:10.1101/2021.05.03.21256561. 
Chongliang Luo, et al. (2020) Lossless Distributed Linear Mixed Model with Application to Integration of Heterogeneous Healthcare Data.  
medRxiv, doi:10.1101/2020.11.16.20230730. 
DPQL initialize
Description
DPQL initialize
Usage
DPQL.initialize(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Details
To initialize, fit glm at each individual site and send the estimated effect size and variances to the lead site. This step may be optional if we just use zero's as initial effect sizes to start the PQL algorithm.
Value
init
Length of Stay data
Description
A simulated data set of hospitalization Length of Stay (LOS) from 3 sites
Usage
LOS
Format
A data frame with 1000 rows and 5 variables:
- site
- site id, 500 'site1', 400 'site2' and 100 'site3' 
- age
- 3 categories, 'young', 'middle', and 'old' 
- sex
- 2 categories, 'M' for male and 'F' for female 
- lab
- lab test results, continuous value ranging from 0 to 100 
- los
- LOS in days, ranging from 1 tp 28. Treated as continuous outcome in DLM 
Generate pda UWZ derivatives
Description
Generate pda UWZ derivatives
Usage
ODAC.derive(ipdata, control, config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Details
Calculate and broadcast 1st and 2nd order derivative at initial bbar for ODAC, this requires 2 substeps: 1st calculate summary stats (U, W, Z), 2nd calculate derivatives (logL_D1, logL_D2)
Value
list(T_all=T_all, b_meta=b_meta, site=control$mysite, site_size = nrow(ipdata), U=U, W=W, Z=Z, logL_D1=logL_D1, logL_D2=logL_D2)
Generate pda UWZ summary statistics before calculating derivatives
Description
Generate pda UWZ summary statistics before calculating derivatives
Usage
ODAC.deriveUWZ(ipdata, control, config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
list(T_all=T_all, b_meta=b_meta, site=control$mysite, site_size = nrow(ipdata), U=U, W=W, Z=Z, logL_D1=logL_D1, logL_D2=logL_D2)
PDA surrogate estimation
Description
PDA surrogate estimation
Usage
ODAC.estimate(ipdata, control, config)
Arguments
| ipdata | local data in data frame | 
| control | pda control | 
| config | cloud config | 
Details
step-4: construct and solve surrogate logL at the master/lead site
Value
list(btilde = sol$par, Htilde = sol$hessian, site=control$mysite, site_size=nrow(ipdata))
ODAC initialize
Description
ODAC initialize
Usage
ODAC.initialize(ipdata, control, config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
list(T_i = T_i, bhat_i = fit_i$coef, Vhat_i = summary(fit_i)$coef[,2]^2, site=control$mysite, site_size= nrow(ipdata))
References
Rui Duan, et al. "Learning from local to global: An efficient distributed algorithm for modeling time-to-event data". Journal of the American Medical Informatics Association, 2020, https://doi.org/10.1093/jamia/ocaa044 Chongliang Luo, et al. "ODACH: A One-shot Distributed Algorithm for Cox model with Heterogeneous Multi-center Data". medRxiv, 2021, https://doi.org/10.1101/2021.04.18.21255694
PDA synthesize surrogate estimates from all sites, optional
Description
PDA synthesize surrogate estimates from all sites, optional
Usage
ODAC.synthesize(ipdata, control, config)
Arguments
| ipdata | local data in data frame | 
| control | pda control | 
| config | cloud config | 
Details
Optional step-4: synthesize all the surrogate est btilde_i from each site, if step-3 from all sites is broadcasted
Value
list(btilde=btilde, Vtilde=Vtilde)
ODACAT derivatives
Description
ODACAT derivatives
Usage
ODACAT.derive(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
list(site=config$site_id, site_size = nrow(ipdata), logL_D1=logL_D1, logL_D2=logL_D2)
PDA surrogate estimation
Description
PDA surrogate estimation
Usage
ODACAT.estimate(ipdata,control,config)
Arguments
| ipdata | local data in data frame | 
| control | PDA control | 
| config | cloud configuration | 
Details
step-3: construct and solve surrogate logL at the master/lead site
Value
list(btilde = sol$par, Htilde = sol$hessian, site=control$mysite, site_size=nrow(ipdata))
ODACAT initialize
Description
ODACAT initialize
Usage
ODACAT.initialize(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
init
PDA synthesize surrogate estimates from all sites, optional
Description
PDA synthesize surrogate estimates from all sites, optional
Usage
ODACAT.synthesize(ipdata,control,config)
Arguments
| ipdata | local data in data frame | 
| control | pda control | 
| config | pda cloud configuration | 
Details
Optional step-4: synthesize all the surrogate est btilde from each site, if step-3 from all sites is broadcasted
Value
list(btilde=btilde, Vtilde=Vtilde)
ODACATH derivatives
Description
ODACATH derivatives
Usage
ODACATH.derive(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
list(site=config$site_id, site_size = n, S_site=S_site, eta=eta_mat[site,])
PDA surrogate estimation
Description
PDA surrogate estimation
Usage
ODACATH.estimate(ipdata,control,config)
Arguments
| ipdata | local data in data frame | 
| control | PDA control | 
| config | cloud configuration | 
Details
step-3: construct and solve surrogate efficient score at the master/lead site
Value
list(btilde=betanew, btilde.se=beta_SE,eta_mat=eta_mat,eta_mat_theta=NULL,site=config$site_id, site_size=n_site)
ODACATH initialize
Description
ODACATH initialize
Usage
ODACATH.initialize(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
init
PDA synthesize surrogate estimates from all sites, optional
Description
PDA synthesize surrogate estimates from all sites, optional
Usage
ODACATH.synthesize(ipdata,control,config)
Arguments
| ipdata | local data in data frame | 
| control | pda control | 
| config | pda cloud configuration | 
Details
Optional step-4: synthesize all the surrogate est btilde from each site, if step-3 from all sites is broadcasted
Value
list(btilde=btilde, Vtilde=Vtilde)
ODACAT simulated data
Description
A simulated data set for ODACAT demonstration
Usage
ODACAT_nominal
Format
A data frame with 300 rows and 5 variables:
- id.site
- site id, 102 'site1', 100 'site2', 98 'site3' 
- outcome
- 3-category outcome, possible values are 1,2,3. Category 3 will be used as reference 
- X1
- the first covariate, continuous 
- X2
- the second covariate, binary 
- X3
- the third covariate, binary 
ODACAT simulated data
Description
A simulated data set for ODACAT demonstration
Usage
ODACAT_ordinal
Format
A data frame with 300 rows and 5 variables:
- id.site
- site id, 105 'site1', 105 'site2', 90 'site3' 
- outcome
- 3-category outcome, possible values are 1,2,3. Category 3 will be used as reference 
- X1
- the first covariate, continuous 
- X2
- the second covariate, binary 
- X3
- the third covariate, binary 
ODAH derivatives
Description
ODAH derivatives
Usage
ODAH.derive(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
derivatives list(site = config$site_id, site_size = nrow(ipdata), logL_D1_zero = logL_D1_zero, logL_D1_count = logL_D1_count, logL_D2_zero = logL_D2_zero, logL_D2_count = logL_D2_count)
PDA surrogate estimation
Description
PDA surrogate estimation
Usage
ODAH.estimate(ipdata,control,config)
Arguments
| ipdata | local data in a list(ipdata, X_count, X_zero) | 
| control | PDA control | 
| config | cloud configuration | 
Details
construct and solve surrogate logL at the master/lead site
Value
list(btilde = sol$par, Htilde = sol$hessian, site=control$mysite, site_size=nrow(ipdata))
ODAH initialize
Description
ODAH initialize
Usage
ODAH.initialize(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
init
References
TBD
ODAL derivatives
Description
ODAL derivatives
Usage
ODAL.derive(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
list(site=config$site_id, site_size = nrow(ipdata), logL_D1=logL_D1, logL_D2=logL_D2)
PDA surrogate estimation
Description
PDA surrogate estimation
Usage
ODAL.estimate(ipdata,control,config)
Arguments
| ipdata | local data in data frame | 
| control | PDA control | 
| config | cloud configuration | 
Details
step-3: construct and solve surrogate logL at the master/lead site
Value
list(btilde = sol$par, Htilde = sol$hessian, site=control$mysite, site_size=nrow(ipdata))
ODAL initialize
Description
ODAL initialize
Usage
ODAL.initialize(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
init
References
Rui Duan, et al. "Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm". Journal of the American Medical Informatics Association, 2020, https://doi.org/10.1093/jamia/ocz199
PDA synthesize surrogate estimates from all sites, optional
Description
PDA synthesize surrogate estimates from all sites, optional
Usage
ODAL.synthesize(ipdata,control,config)
Arguments
| ipdata | local data in data frame | 
| control | pda control | 
| config | pda cloud configuration | 
Details
Optional step-4: synthesize all the surrogate est btilde_i from each site, if step-3 from all sites is broadcasted
Value
list(btilde=btilde, Vtilde=Vtilde)
ODAP derivatives
Description
ODAP derivatives
Usage
ODAP.derive(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
derivatives list(site = config$site_id, site_size = nrow(ipdata), logL_D1 = logL_D1, logL_D2 = logL_D2)
PDA surrogate estimation
Description
PDA surrogate estimation
Usage
ODAP.estimate(ipdata,control,config)
Arguments
| ipdata | local data in data frame (generated in  | 
| control | PDA control | 
| config | cloud configuration | 
Details
construct and solve surrogate logL at the master/lead site
Value
list(btilde = sol$par, Htilde = sol$hessian, site=control$mysite, site_size=nrow(ipdata))
ODAP initialize
Description
ODAP initialize
Usage
ODAP.initialize(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
init
References
TBD
ODAPB derivatives
Description
ODAPB derivatives
Usage
ODAPB.derive(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
derivatives list(site = config$site_id, site_size = nrow(ipdata), logL_D1 = logL_D1, logL_D2 = logL_D2)
PDA surrogate estimation
Description
PDA surrogate estimation
Usage
ODAPB.estimate(ipdata,control,config)
Arguments
| ipdata | local data in data frame (generated in  | 
| control | PDA control | 
| config | cloud configuration | 
Details
construct and solve surrogate logL at the master/lead site
Value
list(btilde = sol$par, Htilde = sol$hessian, site=control$mysite, site_size=nrow(ipdata))
ODAPB initialize
Description
ODAPB initialize
Usage
ODAPB.initialize(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
init
References
TBD
COVID-19 LOS and mortality data
Description
A simulated data set of hospitalization Length of Stay (LOS) and mortality from 6 sites
Usage
covid
Format
A data frame with 2100 rows and 6 variables:
- site
- site id, 600 'site1', 500 'site2', 400 'site3', 300 'site4', 200 'site5', 100 'site6' 
- age
- continuous age in year, min 3 max 97 
- sex
- 2 categories, '1' for male and '0' for female 
- lab
- lab test results, continuous value ranging from 2.3 to 97.4 
- los
- LOS in days, ranging from 1 to 29 
- death
- mortality status, '1' for death and '0' for alive. 
CrabSatellites data
Description
A data set modified from the CrabSatellites data in countreg package (see demo(ODAH)).
Usage
cs
Format
A data frame containing 173 observations on 4 variables.
- site
- Simulated site id, 85 'site1' and 88 'site2'. 
- satellites
- Number of satellites. Treated as (zero-inflated) count outcome in ODAH 
- width
- Carapace width (cm). 
- weight
- Weight (kg). 
Source
https://rdrr.io/rforge/countreg/man/CrabSatellites.html
dGEM hospital-specific effect derivation
Description
dGEM hospital-specific effect derivation
Usage
dGEM.derive(ipdata,control,config,hosdata)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
| hosdata | hospital-level data | 
Value
hospital_effect
dGEM standardized event rate estimation
Description
dGEM standardized event rate estimation
Usage
dGEM.estimate(ipdata,control,config)
Arguments
| ipdata | local data in data frame | 
| control | PDA control | 
| config | cloud configuration | 
Details
step-3:
Value
event rate
dGEM initialize
Description
dGEM initialize
Usage
dGEM.initialize(ipdata,control,config)
Arguments
| ipdata | individual participant data | 
| control | pda control data | 
| config | local site configuration | 
Value
init
References
NA
PDA dGEM synthesize
Description
PDA dGEM synthesize
Usage
dGEM.synthesize(control,config)
Arguments
| control | pda control | 
| config | pda cloud configuration | 
Details
Synthesis to get the standardized mortality rate
Value
list(final_event_rate=final_event_rate)
gather cloud settings into a list
Description
gather cloud settings into a list
Usage
getCloudConfig(site_id,dir,uri,secret)
Arguments
| site_id | site identifier | 
| dir | shared directory path if flat files | 
| uri | web uri if web service | 
| secret | web token if web service | 
Value
A list of cloud parameters: site_id, secret and uri
See Also
pda
Lung cancer survival time data
Description
A data set modified from the lung data in survival package (see demo(ODAC)).
Usage
lung2
Format
A data frame with 228 rows and 5 variables:
- site
- simulated site id, 86 'site1', 83 'site2' and 59 'site3' 
- time
- survival time in days 
- status
- censoring status 0=censored, 1=dead 
- age
- age in years 
- sex
- 1 for female and 0 for male 
Source
https://CRAN.R-project.org/package=survival
A flexible version of MASS::glmmPQL
Description
A flexible version of MASS::glmmPQL
Usage
myglmmPQL(formula.glm, formula, offset=NULL, family, data, fixef.init = NULL, 
                 weights=NULL, REML=T, niter=10, verbose=T)
Arguments
| formula.glm | formula used to fit  | 
| formula | formula used to fit iterative  | 
| offset | 
 | 
| family | 
 | 
| data | 
 | 
| fixef.init | initial fixed effects estimates, set to zeros if NULL | 
| weights | 
 | 
| REML | 
 | 
| niter | 
 | 
| verbose | 
 | 
Details
Use lme4::lmer instead of nlme::varFixed in PQL iteration to allow REML
Value
An object wiht the same format as lmer.
PDA: Privacy-preserving Distributed Algorithm
Description
Fit Privacy-preserving Distributed Algorithms for linear, logistic, Poisson and Cox PH regression with possible heterogeneous data across sites.
Usage
pda(ipdata,site_id,control,dir,uri,secret,hosdata)
Arguments
| ipdata | Local IPD data in data frame, should include at least one column for the outcome and one column for the covariates | 
| site_id | Character site name | 
| control | pda control data | 
| dir | directory for shared flat file cloud | 
| uri | Universal Resource Identifier for this run | 
| secret | password to authenticate as site_id on uri | 
| hosdata | hospital-level data, should include the same name as defined in the control file | 
Value
control
control
References
Michael I. Jordan, Jason D. Lee & Yun Yang (2019) Communication-Efficient Distributed Statistical Inference, 
Journal of the American Statistical Association, 114:526, 668-681 
 
doi:10.1080/01621459.2018.1429274.
 
(DLM) Yixin Chen, et al. (2006) Regression cubes with lossless compression and aggregation. 
IEEE Transactions on Knowledge and Data Engineering, 18(12), pp.1585-1599. 
(DLMM) Chongliang Luo, et al. (2020) Lossless Distributed Linear Mixed Model with Application to Integration of Heterogeneous Healthcare Data.  
medRxiv, doi:10.1101/2020.11.16.20230730. 
(DPQL) Chongliang Luo, et al. (2021) dPQL: a lossless distributed algorithm for generalized linear mixed model with application to privacy-preserving hospital profiling. 
medRxiv, doi:10.1101/2021.05.03.21256561. 
(ODAL) Rui Duan, et al. (2020) Learning from electronic health records across multiple sites: 
 
A communication-efficient and privacy-preserving distributed algorithm. 
 
Journal of the American Medical Informatics Association, 27.3:376–385,
 doi:10.1093/jamia/ocz199.
 
(ODAC) Rui Duan, et al. (2020) Learning from local to global: An efficient distributed algorithm for modeling time-to-event data. 
Journal of the American Medical Informatics Association, 27.7:1028–1036, 
 
doi:10.1093/jamia/ocaa044. 
(ODACH) Chongliang Luo, et al. (2021) ODACH: A One-shot Distributed Algorithm for Cox model with Heterogeneous Multi-center Data. 
medRxiv, doi:10.1101/2021.04.18.21255694. 
 
(ODAH) Mackenzie J. Edmondson, et al. (2021) An Efficient and Accurate Distributed Learning Algorithm for Modeling Multi-Site Zero-Inflated Count Outcomes. 
medRxiv, pp.2020-12. 
doi:10.1101/2020.12.17.20248194. 
(ADAP) Xiaokang Liu, et al. (2021) ADAP: multisite learning with high-dimensional heterogeneous data via A Distributed Algorithm for Penalized regression. 
(dGEM) Jiayi Tong, et al. (2022) dGEM: Decentralized Generalized Linear Mixed Effects Model 
See Also
pdaPut, pdaList, pdaGet, getCloudConfig and pdaSync.
Examples
require(survival)
require(data.table)
require(pda)
data(lung)
## In the toy example below we aim to analyze the association of lung status with 
## age and sex using logistic regression, data(lung) from 'survival', we randomly 
## assign to 3 sites: 'site1', 'site2', 'site3'. we demonstrate using PDA ODAL can 
## obtain a surrogate estimator that is close to the pooled estimate. We run the 
## example in local directory. In actual collaboration, account/password for pda server 
## will be assigned to the sites at the server https://pda.one.
## Each site can access via web browser to check the communication of the summary stats.
## for more examples, see demo(ODAC) and demo(ODAP)
# Create 3 sites, split the lung data amongst them
sites = c('site1', 'site2', 'site3')
set.seed(42)
lung2 <- lung[,c('status', 'age', 'sex')]
lung2$sex <- lung2$sex - 1
lung2$status <- ifelse(lung2$status == 2, 1, 0)
lung_split <- split(lung2, sample(1:length(sites), nrow(lung), replace=TRUE))
## fit logistic reg using pooled data
fit.pool <- glm(status ~ age + sex, family = 'binomial', data = lung2)
# ############################  STEP 1: initialize  ###############################
control <- list(project_name = 'Lung cancer study',
                step = 'initialize',
                sites = sites,
                heterogeneity = FALSE,
                model = 'ODAL',
                family = 'binomial',
                outcome = "status",
                variables = c('age', 'sex'),
                optim_maxit = 100,
                lead_site = 'site1',
                upload_date = as.character(Sys.time()) )
## run the example in local directory:
## specify your working directory, default is the tempdir
mydir <- tempdir()
## assume lead site1: enter "1" to allow transferring the control file  
pda(site_id = 'site1', control = control, dir = mydir)
## in actual collaboration, account/password for pda server will be assigned, thus:
## Not run: pda(site_id = 'site1', control = control, uri = 'https://pda.one', secret='abc123')
## you can also set your environment variables, and no need to specify them in pda:
## Not run: Sys.setenv(PDA_USER = 'site1', PDA_SECRET = 'abc123', PDA_URI = 'https://pda.one')
## Not run: pda(site_id = 'site1', control = control)
##' assume remote site3: enter "1" to allow tranferring your local estimate 
pda(site_id = 'site3', ipdata = lung_split[[3]], dir=mydir)
##' assume remote site2: enter "1" to allow tranferring your local estimate  
pda(site_id = 'site2', ipdata = lung_split[[2]], dir=mydir)
##' assume lead site1: enter "1" to allow tranferring your local estimate  
##' control.json is also automatically updated
pda(site_id = 'site1', ipdata = lung_split[[1]], dir=mydir)
##' if lead site1 initialized before other sites,
##' lead site1: uncomment to sync the control before STEP 2
## Not run: pda(site_id = 'site1', control = control)
## Not run: config <- getCloudConfig(site_id = 'site1')
## Not run: pdaSync(config)
#' ############################'  STEP 2: derivative  ############################ 
##' assume remote site3: enter "1" to allow tranferring your derivatives  
pda(site_id = 'site3', ipdata = lung_split[[3]], dir=mydir)
##' assume remote site2: enter "1" to allow tranferring your derivatives  
pda(site_id = 'site2', ipdata = lung_split[[2]], dir=mydir)
##' assume lead site1: enter "1" to allow tranferring your derivatives  
pda(site_id = 'site1', ipdata = lung_split[[1]], dir=mydir)
#' ############################'  STEP 3: estimate  ############################ 
##' assume lead site1: enter "1" to allow tranferring the surrogate estimate  
pda(site_id = 'site1', ipdata = lung_split[[1]], dir=mydir)
##' the PDA ODAL is now completed!
##' All the sites can still run their own surrogate estimates and broadcast them.
##' compare the surrogate estimate with the pooled estimate 
config <- getCloudConfig(site_id = 'site1', dir=mydir)
fit.odal <- pdaGet(name = 'site1_estimate', config = config)
cbind(b.pool=fit.pool$coef,
      b.odal=fit.odal$btilde,
      sd.pool=summary(fit.pool)$coef[,2],
      sd.odal=sqrt(diag(solve(fit.odal$Htilde)/nrow(lung2))))
      
## see demo(ODAL) for more optional steps
Function to download json and return as object
Description
Function to download json and return as object
Usage
pdaGet(name,config)
Arguments
| name | of file | 
| config | cloud configuration | 
Value
A list of data objects from the json file on the cloud
See Also
pda
Function to list available objects
Description
Function to list available objects
Usage
pdaList(config)
Arguments
| config | a list of variables for cloud configuration | 
Value
A list of (json) files on the cloud
See Also
pda
Function to upload object to cloud as json
Description
Function to upload object to cloud as json
Usage
pdaPut(obj,name,config)
Arguments
| obj | R object to encode as json and uploaded to cloud | 
| name | of file | 
| config | a list of variables for cloud configuration | 
Value
NONE
See Also
pda
pda control synchronize
Description
update pda control if ready (run by lead)
Usage
pdaSync(config)
Arguments
| config | cloud configuration | 
Value
control
See Also
pda