| Title: | Computations over Distributed Data without Aggregation |
| Maintainer: | Balasubramanian Narasimhan <naras@stat.Stanford.EDU> |
| Version: | 1.3-3 |
| VignetteBuilder: | knitr |
| URL: | http://dx.doi.org/10.18637/jss.v077.i13 |
| Depends: | survival, stats, R (≥ 3.5.0) |
| Imports: | utils, shiny, httr (≥ 1.0.0), digest, jsonlite, stringr, R6 (≥ 2.0), dplyr, rlang, magrittr, homomorpheR, gmp |
| Suggests: | opencpu, knitr, covr, rmarkdown |
| Description: | Implementing algorithms and fitting models when sites (possibly remote) share computation summaries rather than actual data over HTTP with a master R process (using 'opencpu', for example). A stratified Cox model and a singular value decomposition are provided. The former makes direct use of code from the R 'survival' package. (That is, the underlying Cox model code is derived from that in the R 'survival' package.) Sites may provide data via several means: CSV files, Redcap API, etc. An extensible design allows for new methods to be added in the future and includes facilities for local prototyping and testing. Web applications are provided (via 'shiny') for the implemented methods to help in designing and deploying the computations. |
| Copyright: | inst/COPYRIGHTS |
| Encoding: | UTF-8 |
| License: | LGPL-2 | LGPL-2.1 | LGPL-3 [expanded from: LGPL (≥ 2)] |
| RoxygenNote: | 7.2.1 |
| NeedsCompilation: | yes |
| Packaged: | 2022-09-01 18:28:23 UTC; naras |
| Author: | Balasubramanian Narasimhan [aut, cre], Marina Bendersky [aut], Sam Gross [aut], Terry M. Therneau [ctb], Thomas Lumley [ctb] |
| Repository: | CRAN |
| Date/Publication: | 2022-09-01 21:00:02 UTC |
Make an appropriate opencpu URL for a specified function and url prefix for the distcomp package
Description
.makeOpencpuURL returns an appropriate URL to call a function in the distcomp package given the name of the function and a URL prefix.
.defnOK returns TRUE or FALSE depending on whether the definition object meets minimal requirements.
.deSerialize converts the JSON result of an HTTP response as needed; otherwise the raw content is returned.
Usage
.makeOpencpuURL(fn, urlPrefix, package = "distcomp")
.defnOK(defn)
.deSerialize(q)
Arguments
fn |
is the name of the function in the distcomp package |
urlPrefix |
is the URL of the opencpu server with the distcomp package installed |
defn |
is the definition object passed |
q |
the result of an httr response |
Value
the formatted URL as a string
TRUE or FALSE depending on the result
the converted result, if JSON, or the raw content
Examples
distcomp:::.makeOpencpuURL("foo", "http://localhost:9999/ocpu")
distcomp:::.defnOK(data.frame()) ## FALSE
distcomp:::.defnOK(data.frame(id = "ABC", stringsAsFactors=FALSE)) ## TRUE
Create a master object to control CoxWorker worker objects
Description
CoxMaster objects instantiate and run a distributed Cox model fit
Methods
Public methods
Method new()
CoxMaster objects instantiate and run a distributed Cox model fit
Usage
CoxMaster$new(defn, debug = FALSE)
Arguments
defn: a computation definition
debug: a flag for debugging, default FALSE
Returns
R6 CoxMaster object
Method kosher()
Check if inputs and state of object are sane. For future use
Usage
CoxMaster$kosher()
Returns
TRUE or FALSE
Method logLik()
Return the partial log likelihood on all data for given beta parameter.
Usage
CoxMaster$logLik(beta)
Arguments
beta: the parameter vector
Returns
a named list with three components: value contains the value of the
log likelihood, gradient contains the score vector, and hessian contains
the estimated hessian matrix
Method addSite()
Add a url or worker object for a site for participating in the distributed computation. The worker object can be used to avoid complications in debugging remote calls during prototyping.
Usage
CoxMaster$addSite(name, url = NULL, worker = NULL)
Arguments
name: the name of the site
url: web URL of the site; exactly one of url or worker should be specified
worker: worker object for the site; exactly one of url or worker should be specified
Method run()
Run the distributed Cox model fit and return the estimates
Usage
CoxMaster$run(control = coxph.control())
Arguments
control: parameters, same as survival::coxph.control()
Returns
a named list of beta, var, gradient, iter, and returnCode
Method summary()
Return the summary of the fit as a data frame
Usage
CoxMaster$summary()
Returns
a summary data frame with columns for coef, exp(coef), standard error, z-score, and p-value for each parameter in the model, following the same format as the survival package
Method clone()
The objects of this class are cloneable with this method.
Usage
CoxMaster$clone(deep = FALSE)
Arguments
deep: whether to make a deep clone.
See Also
CoxWorker which generates objects matched to such a master object
R6 class for object to use as a worker with CoxMaster master objects
Description
CoxWorker objects are worker objects at each data site of
a distributed Cox model computation
Methods
Public methods
Method new()
Create a new CoxWorker object.
Usage
CoxWorker$new(defn, data, stateful = TRUE)
Arguments
defn: the computation definition
data: the local data
stateful: a boolean flag indicating if state needs to be preserved between REST calls
Returns
a new CoxWorker object
Method getP()
Return the dimension of the parameter vector.
Usage
CoxWorker$getP(...)
Arguments
...: other args ignored
Returns
the dimension of the parameter vector
Method getStateful()
Return the stateful status of the object.
Usage
CoxWorker$getStateful()
Returns
the stateful flag, TRUE or FALSE
Method logLik()
Return the partial log likelihood on local data for given beta parameter.
Usage
CoxWorker$logLik(beta, ...)
Arguments
beta: the parameter vector
...: further arguments, currently unused
Returns
a named list with three components: value contains the value of the
log likelihood, gradient contains the score vector, and hessian contains
the estimated hessian matrix
Method var()
Return the variance of estimate for given beta parameter on local data.
Usage
CoxWorker$var(beta, ...)
Arguments
beta: the parameter vector
...: further arguments, currently unused
Returns
variance vector
Method kosher()
Check if inputs and state of object are sane. For future use
Usage
CoxWorker$kosher()
Returns
TRUE or FALSE
Method clone()
The objects of this class are cloneable with this method.
Usage
CoxWorker$clone(deep = FALSE)
Arguments
deep: whether to make a deep clone.
See Also
CoxMaster which goes hand-in-hand with this object
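The sketch below illustrates how CoxMaster and CoxWorker can be paired for local prototyping, with in-session worker objects added via addSite() instead of opencpu URLs. The definition fields, the compType tag, and the two-way split of survival::lung are illustrative assumptions; the exact definition schema should be taken from the package vignette and names(availableComputations()).
## Not run:
library(survival)
## Illustrative computation definition; field names and the compType tag
## are assumptions, not a guaranteed schema.
defn <- data.frame(id = "CoxProto",
                   compType = "StratifiedCoxModel",
                   projectName = "CoxPrototype",
                   projectDesc = "Local prototyping sketch",
                   formula = "Surv(time, status) ~ age + sex",
                   stringsAsFactors = FALSE)
## Split one data set into two hypothetical "sites"
siteData <- split(survival::lung, rep(1:2, length.out = nrow(survival::lung)))
master <- CoxMaster$new(defn = defn)
for (i in seq_along(siteData)) {
  worker <- CoxWorker$new(defn = defn, data = siteData[[i]])
  master$addSite(name = paste0("site", i), worker = worker)  ## in-session worker, no URL
}
fit <- master$run()  ## named list with beta, var, gradient, iter, returnCode
master$summary()     ## coef, exp(coef), se, z and p per parameter
## End(Not run)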
Create a HEMaster process for use in a distributed homomorphic encrypted (HE) computation
Description
HEMaster objects run a distributed computation based
upon a definition file that encapsulates all information
necessary to perform a computation. A master makes use of two
non-cooperating parties which communicate with sites that
perform the actual computations using local data.
Public fields
den: denominator for rational arithmetic
den_bits: number of bits for the denominator for rational arithmetic
Methods
Public methods
Method new()
Create a HEMaster object to run homomorphic encrypted computation
Usage
HEMaster$new(defn)
Arguments
defn: the homomorphic computation definition
Returns
a HEMaster object
Method getNC_party()
Return a list of noncooperating parties (NCPs)
Usage
HEMaster$getNC_party()
Returns
a named list of length 2 of noncooperating party information
Method getPubkey()
Return the public key from the public private key pair
Usage
HEMaster$getPubkey()
Returns
an R6 Pubkey object
Method addNCP()
Add a noncooperating party to this master either using a url or an object in session for prototyping
Usage
HEMaster$addNCP(ncp_defn, url = NULL, ncpWorker = NULL)
Arguments
ncp_defn: the definition of the NCP
url: the url for the NCP; only one of url and ncpWorker should be non-null
ncpWorker: an instantiated worker object; only one of url and ncpWorker should be non-null
Method run()
Run a distributed homomorphic encrypted computation and return the result
Usage
HEMaster$run(debug = FALSE)
Arguments
debug: a flag for debugging, default FALSE
Returns
the result of the distributed homomorphic computation
Method clone()
The objects of this class are cloneable with this method.
Usage
HEMaster$clone(deep = FALSE)
Arguments
deep: whether to make a deep clone.
See Also
Create a homomorphic computation query count master object to employ worker objects generated by HEQueryCountWorker()
Description
HEQueryCountMaster objects instantiate and run a distributed homomorphic query count computation; they're instantiated by non-cooperating parties (NCPs)
Super class
distcomp::QueryCountMaster -> HEQueryCountMaster
Public fields
pubkey: the master's public key visible to everyone
pubkey_bits: the number of bits in the public key (used for reconstructing the public key remotely by serializing to character)
pubkey_n: the n for the public key, used for reconstructing the public key remotely
den: the denominator for rational arithmetic
den_bits: the number of bits in the denominator, used for reconstructing the denominator remotely
Methods
Public methods
Inherited methods
Method new()
Create a new HEQueryCountMaster object.
Usage
HEQueryCountMaster$new(defn, partyNumber, debug = FALSE)
Arguments
defn: the computation definition
partyNumber: the party number of the NCP that this object belongs to (1 or 2)
debug: a flag for debugging, default FALSE
Returns
a new HEQueryCountMaster object
Method setParams()
Set some parameters of the HEQueryCountMaster object for homomorphic computations
Usage
HEQueryCountMaster$setParams(pubkey_bits, pubkey_n, den_bits)
Arguments
pubkey_bits: the number of bits in the public key
pubkey_n: the n for the public key
den_bits: the number of bits in the denominator (power of 2) used in rational approximations
Method kosher()
Check if inputs and state of object are sane. For future use
Usage
HEQueryCountMaster$kosher()
Returns
TRUE or FALSE
Method queryCount()
Run the distributed query count, associate it with a token, and return the result
Usage
HEQueryCountMaster$queryCount(token)
Arguments
token: a token to use as key
Returns
the partial result as a list of encrypted items with components int and frac
Method cleanup()
Cleanup the instance objects
Usage
HEQueryCountMaster$cleanup()
Method run()
Run the homomorphic encrypted distributed query count computation
Usage
HEQueryCountMaster$run(token)
Arguments
token: a token to use as key
Returns
the partial result as a list of encrypted items with components int and frac
Method clone()
The objects of this class are cloneable with this method.
Usage
HEQueryCountMaster$clone(deep = FALSE)
Arguments
deep: whether to make a deep clone.
See Also
HEQueryCountWorker() which goes hand-in-hand with this object
Create a homomorphic computation query count worker object for use with master objects generated by HEQueryCountMaster()
Description
HEQueryCountWorker objects are worker objects at each site of
a distributed query count model computation using homomorphic encryption
Super class
distcomp::QueryCountWorker -> HEQueryCountWorker
Public fields
pubkey: the master's public key visible to everyone
den: the denominator for rational arithmetic
Methods
Public methods
Inherited methods
Method new()
Create a new HEQueryCountWorker object.
Usage
HEQueryCountWorker$new(defn, data, pubkey_bits = NULL, pubkey_n = NULL, den_bits = NULL)
Arguments
defn: the computation definition
data: the data, which is usually the list of sites
pubkey_bits: the number of bits in the public key
pubkey_n: the n for the public key
den_bits: the number of bits in the denominator (power of 2) used in rational approximations
Returns
a new HEQueryCountWorker object
Method setParams()
Set some parameters for homomorphic computations
Usage
HEQueryCountWorker$setParams(pubkey_bits, pubkey_n, den_bits)
Arguments
pubkey_bits: the number of bits in the public key
pubkey_n: the n for the public key
den_bits: the number of bits in the denominator (power of 2) used in rational approximations
Method queryCount()
Run the query count on local data and return the appropriate encrypted result to the party
Usage
HEQueryCountWorker$queryCount(partyNumber, token)
Arguments
partyNumber: the NCP party number (1 or 2)
token: a token to use for identifying parts of the same computation for NCP1 and NCP2
Returns
the count as a list of encrypted items with components int and frac
Method clone()
The objects of this class are cloneable with this method.
Usage
HEQueryCountWorker$clone(deep = FALSE)
Arguments
deep: whether to make a deep clone.
See Also
HEQueryCountMaster() which goes hand-in-hand with this object
R6 object to use as non-cooperating party in a distributed homomorphic computation
Description
NCP objects are worker objects that separate a
master process from communicating directly with the worker
processes. Typically two such are needed for a distributed
homomorphic computation. A master process can communicate with
NCP objects and the NCP objects can communicate
with worker processes. However, the two NCP objects,
designated by numbers 1 and 2, are non-cooperating in the sense
that they don't communicate with each other and are isolated
from each other.
Public fields
pubkey: the master's public key visible to everyone
pubkey_bits: the number of bits in the public key (used for reconstructing the public key remotely by serializing to character)
pubkey_n: the n for the public key, used for reconstructing the public key remotely
den: the denominator for rational arithmetic
den_bits: the number of bits in the denominator, used for reconstructing the denominator remotely
Methods
Public methods
Method new()
Create a new NCP object.
Usage
NCP$new(ncp_defn, comp_defn, sites = list(), pubkey_bits = NULL, pubkey_n = NULL, den_bits = NULL)
Arguments
ncp_defn: the NCP definition; see example
comp_defn: the computation definition
sites: list of sites
pubkey_bits: the number of bits in the public key
pubkey_n: the n for the public key
den_bits: the number of bits in the denominator (power of 2) used in rational approximations
Returns
a new NCP object
Method getStateful()
Retrieve the value of the stateful field
Usage
NCP$getStateful()
Method setParams()
Set some parameters of the NCP object for homomorphic computations
Usage
NCP$setParams(pubkey_bits, pubkey_n, den_bits)
Arguments
pubkey_bits: the number of bits in the public key
pubkey_n: the n for the public key
den_bits: the number of bits in the denominator (power of 2) used in rational approximations
Method getSites()
Retrieve the value of the private sites field
Usage
NCP$getSites()
Method setSites()
Set the value of the private sites field
Usage
NCP$setSites(sites)
Arguments
sites: the list of sites
Method addSite()
Add a url or worker object for a site for participating in the distributed computation. The worker object can be used to avoid complications in debugging remote calls during prototyping.
Usage
NCP$addSite(name, url = NULL, worker = NULL)
Arguments
name: the name of the site
url: web URL of the site; exactly one of url or worker should be specified
worker: worker object for the site; exactly one of url or worker should be specified
Method cleanupInstance()
Clean up by destroying instance objects created in workspace.
Usage
NCP$cleanupInstance(token)
Arguments
token: the token for the instance
Method run()
Run the distributed homomorphic computation
Usage
NCP$run(token)
Arguments
token: a unique token for the run, used to ensure that correct parts of cached results are returned appropriately
Returns
the result of the computation
Method clone()
The objects of this class are cloneable with this method.
Usage
NCP$clone(deep = FALSE)
Arguments
deep: whether to make a deep clone.
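As a rough sketch of how these pieces fit together for local prototyping, an HEMaster can be wired to two in-session NCP objects, each of which talks to the same set of HEQueryCountWorker objects. Everything below (the definition fields, the NCP definitions, and the toy site data) is an illustrative assumption built only from the method signatures documented here; consult the package vignette for the actual schemas.
## Not run:
## Hypothetical computation and NCP definitions (field names are assumptions)
comp_defn <- data.frame(id = "HEQueryProto", compType = "QueryCount",
                        projectName = "HEQueryProto",
                        projectDesc = "HE query count sketch",
                        stringsAsFactors = FALSE)
ncp_defn1 <- list(name = "NCP1", number = 1)
ncp_defn2 <- list(name = "NCP2", number = 2)

master <- HEMaster$new(defn = comp_defn)
ncp1 <- NCP$new(ncp_defn = ncp_defn1, comp_defn = comp_defn)
ncp2 <- NCP$new(ncp_defn = ncp_defn2, comp_defn = comp_defn)

## Both NCPs see the same worker sites; the data contents are placeholders
siteData <- list(site1 = data.frame(filtered = 1:25),
                 site2 = data.frame(filtered = 1:40))
for (nm in names(siteData)) {
  ncp1$addSite(name = nm,
               worker = HEQueryCountWorker$new(defn = comp_defn, data = siteData[[nm]]))
  ncp2$addSite(name = nm,
               worker = HEQueryCountWorker$new(defn = comp_defn, data = siteData[[nm]]))
}

master$addNCP(ncp_defn = ncp_defn1, ncpWorker = ncp1)
master$addNCP(ncp_defn = ncp_defn2, ncpWorker = ncp2)
master$run()  ## decrypted result of the homomorphic query count
## End(Not run)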
Create a master object to control worker objects generated by QueryCountWorker()
Description
QueryCountMaster objects instantiate and run a distributed query count computation
Methods
Public methods
Method new()
Create a new QueryCountMaster object.
Usage
QueryCountMaster$new(defn, debug = FALSE)
Arguments
defn: the computation definition
debug: a flag for debugging, default FALSE
Returns
a new QueryCountMaster object
Method kosher()
Check if inputs and state of object are sane. For future use
Usage
QueryCountMaster$kosher()
Returns
TRUE or FALSE
Method queryCount()
Run the distributed query count and return the result
Usage
QueryCountMaster$queryCount()
Returns
the count
Method getSites()
Retrieve the value of the private sites field
Usage
QueryCountMaster$getSites()
Method addSite()
Add a url or worker object for a site for participating in the distributed computation. The worker object can be used to avoid complications in debugging remote calls during prototyping.
Usage
QueryCountMaster$addSite(name, url = NULL, worker = NULL)
Arguments
name: the name of the site
url: web URL of the site; exactly one of url or worker should be specified
worker: worker object for the site; exactly one of url or worker should be specified
Method run()
Run the distributed query count
Usage
QueryCountMaster$run()
Returns
the count
Method clone()
The objects of this class are cloneable with this method.
Usage
QueryCountMaster$clone(deep = FALSE)
Arguments
deep: whether to make a deep clone.
See Also
QueryCountWorker() which goes hand-in-hand with this object
R6 worker object for use as a worker with master objects generated by QueryCountMaster()
Description
QueryCountWorker objects are worker objects at each site of
a distributed QueryCount model computation
Methods
Public methods
Method new()
Create a new QueryCountWorker object.
Usage
QueryCountWorker$new(defn, data, stateful = FALSE)
Arguments
defn: the computation definition
data: the local data
stateful: the statefulness flag, default FALSE
Returns
a new QueryCountWorker object
Method getStateful()
Retrieve the value of the stateful field
Usage
QueryCountWorker$getStateful()
Method kosher()
Check if inputs and state of object are sane. For future use
Usage
QueryCountWorker$kosher()
Returns
TRUE or FALSE
Method queryCount()
Return the query count on the local data
Usage
QueryCountWorker$queryCount()
Method clone()
The objects of this class are cloneable with this method.
Usage
QueryCountWorker$clone(deep = FALSE)
Arguments
deep: whether to make a deep clone.
See Also
QueryCountMaster() which goes hand-in-hand with this object
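For local prototyping, a QueryCountMaster can be given in-session QueryCountWorker objects directly, bypassing opencpu. The definition fields, the compType tag, and the toy site data below are illustrative assumptions; check names(availableComputations()) for the canonical tag.
## Not run:
## Illustrative definition; field names and tag are assumptions
defn <- data.frame(id = "QCProto", compType = "QueryCount",
                   projectName = "QueryProto",
                   projectDesc = "Query count prototyping sketch",
                   stringsAsFactors = FALSE)
## Hypothetical local data held at each site
site1Data <- data.frame(filtered = 1:30)
site2Data <- data.frame(filtered = 1:45)

master <- QueryCountMaster$new(defn = defn)
master$addSite(name = "site1", worker = QueryCountWorker$new(defn = defn, data = site1Data))
master$addSite(name = "site2", worker = QueryCountWorker$new(defn = defn, data = site2Data))
master$run()  ## the aggregate count over both sites
## End(Not run)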
R6 class for SVD master object to control worker objects generated by SVDWorker()
Description
SVDMaster objects instantiate and run a distributed SVD computation
Methods
Public methods
Method new()
SVDMaster objects instantiate and run a distributed SVD computation
Usage
SVDMaster$new(defn, debug = FALSE)
Arguments
defn: a computation definition
debug: a flag for debugging, default FALSE
Returns
R6 SVDMaster object
Method kosher()
Check if inputs and state of object are sane. For future use
Usage
SVDMaster$kosher()
Returns
TRUE or FALSE
Method updateV()
Return an updated value for the V vector, normalized by arg
Usage
SVDMaster$updateV(arg)
Arguments
arg: the normalizing value
...: other args ignored
Returns
updated V
Method updateU()
Update U and return the updated norm of U
Usage
SVDMaster$updateU(arg)
Arguments
arg: the normalizing value
...: other args ignored
Returns
updated norm of U
Method fixFit()
Construct the residual matrix given the V vector and d so far
Usage
SVDMaster$fixFit(v, d)
Arguments
v: the value for v
d: the value for d
Returns
result
Method reset()
Reset the computation state by initializing the work matrix and setting up starting values for iterating
Usage
SVDMaster$reset()
Method addSite()
Add a url or worker object for a site for participating in the distributed computation. The worker object can be used to avoid complications in debugging remote calls during prototyping.
Usage
SVDMaster$addSite(name, url = NULL, worker = NULL)
Arguments
name: the name of the site
url: web URL of the site; exactly one of url or worker should be specified
worker: worker object for the site; exactly one of url or worker should be specified
Method run()
Run the distributed SVD computation and return the estimates
Usage
SVDMaster$run(thr = 1e-08, max.iter = 100)
Arguments
thr: the threshold for convergence, default 1e-8
max.iter: the maximum number of iterations, default 100
Returns
a named list of V, d
Method summary()
Return the summary result
Usage
SVDMaster$summary()
Returns
a named list of V, d
Method clone()
The objects of this class are cloneable with this method.
Usage
SVDMaster$clone(deep = FALSE)
Arguments
deep: whether to make a deep clone.
See Also
SVDWorker() which goes hand-in-hand with this object
R6 class for a SVD worker object to use with master objects generated by SVDMaster()
Description
SVDWorker objects are worker objects at each site of a distributed SVD model computation
Methods
Public methods
Method new()
Create a new SVDWorker object.
Usage
SVDWorker$new(defn, data, stateful = TRUE)
Arguments
defn: the computation definition
data: the local x matrix
stateful: a boolean flag indicating if state needs to be preserved between REST calls, TRUE by default
Returns
a new SVDWorker object
Method reset()
Reset the computation state by initializing the work matrix and setting up starting values for iterating
Usage
SVDWorker$reset()
Method dimX()
Return the dimensions of the matrix
Usage
SVDWorker$dimX(...)
Arguments
...: other args ignored
Returns
the dimension of the matrix
Method updateV()
Return an updated value for the V vector, normalized by arg
Usage
SVDWorker$updateV(arg, ...)
Arguments
arg: the normalizing value
...: other args ignored
Returns
updated V
Method updateU()
Update U and return the updated norm of U
Usage
SVDWorker$updateU(arg, ...)
Arguments
arg: the initial value
...: other args ignored
Returns
updated norm of U
Method normU()
Normalize U vector
Usage
SVDWorker$normU(arg, ...)
Arguments
arg: the normalizing value
...: other args ignored
Returns
TRUE invisibly
Method fixU()
Construct residual matrix using arg
Usage
SVDWorker$fixU(arg, ...)
Arguments
arg: the value to use for residualizing
...: other args ignored
Method getN()
Get the number of rows of the x matrix
Usage
SVDWorker$getN()
Returns
the number of rows of x matrix
Method getP()
Get the number of columns of the x matrix
Usage
SVDWorker$getP()
Returns
the number of columns of x matrix
Method getStateful()
Return the stateful status of the object.
Usage
SVDWorker$getStateful()
Returns
the stateful flag, TRUE or FALSE
Method kosher()
Check if inputs and state of object are sane. For future use
Usage
SVDWorker$kosher()
Returns
TRUE or FALSE
Method clone()
The objects of this class are cloneable with this method.
Usage
SVDWorker$clone(deep = FALSE)
Arguments
deep: whether to make a deep clone.
See Also
SVDMaster() which goes hand-in-hand with this object
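A local-prototyping sketch for the distributed SVD, again using in-session workers; the definition fields (rank, ncol, and the compType tag) are assumptions patterned after the package vignette and should be verified against your installed version.
## Not run:
## Illustrative definition; field names and tag are assumptions
defn <- data.frame(id = "SVDProto", compType = "RankKSVD",
                   rank = 2L, ncol = 5L, stringsAsFactors = FALSE)
set.seed(12345)
x1 <- matrix(rnorm(50), nrow = 10, ncol = 5)   ## local matrix at site 1
x2 <- matrix(rnorm(100), nrow = 20, ncol = 5)  ## local matrix at site 2

master <- SVDMaster$new(defn = defn)
master$addSite(name = "site1", worker = SVDWorker$new(defn = defn, data = x1))
master$addSite(name = "site2", worker = SVDWorker$new(defn = defn, data = x2))
result <- master$run()  ## named list with V and d
result$d                ## leading singular values of the stacked matrix
## End(Not run)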
Return the currently available (implemented) computations
Description
The function availableComputations returns a list
of available computations with various components. The names of this list
(with no spaces) are unique canonical tags that are used throughout the
package to unambiguously refer to the type of computation; web applications
particularly rely on this list to instantiate objects. As more computations
are implemented, this list is augmented.
Usage
availableComputations()
Value
a list with the components corresponding to a computation
desc |
a textual description (25 chars at most) |
definitionApp |
the name of a function that will fire up a shiny webapp for defining the particular computation |
workerApp |
the name of a function that will fire up a shiny webapp for setting up a worker site for the particular computation |
masterApp |
the name of a function that will fire up a shiny webapp for setting up a master for the particular computation |
makeDefinition |
the name of a function that will return a data frame
with appropriate fields needed to define the particular computation assuming
that they are populated in a global variable. This function is used by web
applications to construct a definition object based on inputs specified
by the users. Since the full information is often gathered incrementally by
several web applications, the inputs are set in a global variable and
therefore retrieved here using the function |
makeMaster |
a function that will construct a master object for the computation given the definition and a logical flag indicating if debugging is desired |
makeWorker |
a function that will construct a worker object for that computation given the definition and data |
See Also
Examples
availableComputations()
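The canonical tags and the components registered for each computation can be inspected directly; for instance:
## Canonical computation tags used throughout the package
names(availableComputations())
## Components registered for the first available computation
str(availableComputations()[[1]], max.level = 1)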
Return currently implemented data sources
Description
The function availableDataSources returns the
currently implemented data sources such as CSV files, Redcap etc.
Usage
availableDataSources()
Value
a list of named arguments, each of which is another list, with required fields named desc (a textual description) and requiredPackages
Examples
availableDataSources()
Given the definition identifier of an object, instantiate and store object in workspace
Description
The function createHEWorkerInstance uses a
definition identified by defnId to create the appropriate
object instance for HE computations. The instantiated object is
searched for in the instance path and loaded if already present; otherwise it is created, assigned the instanceId, and saved under the dataFileName if the latter is specified. This instantiated object may change state between iterations when a computation executes.
Usage
createHEWorkerInstance(
defnId,
instanceId,
pubkey_bits = NULL,
pubkey_n = NULL,
den_bits = NULL,
dataFileName = NULL
)
Arguments
defnId |
the identifier of an already defined computation |
instanceId |
an identifier to use for the created instance |
pubkey_bits |
number of bits for public key |
pubkey_n |
the n for the public key |
den_bits |
the number of bits for the denominator |
dataFileName |
a file name to use for saving the
data. Typically |
Value
TRUE if everything goes well
See Also
Given the definition identifier of an object, instantiate and store object in workspace
Description
This function uses an identifier (defnId) to locate
a stored definition in the workspace to create the appropriate
object instance. The instantiated object is assigned the
instanceId and saved under the dataFileName if the latter is
not NULL. This instantiated object may change state between
iterations when a computation executes
Usage
createNCPInstance(
name,
ncpId,
instanceId,
pubkey_bits,
pubkey_n,
den_bits,
dataFileName = NULL
)
Arguments
name |
identifying the NC party |
ncpId |
the id indicating the NCP definition |
instanceId |
an identifier to use for the created instance |
pubkey_bits |
the public key number of bits |
pubkey_n |
the pubkey n |
den_bits |
the denominator number of bits for rational approximations |
dataFileName |
a file name to use for saving the
data. Typically |
Value
TRUE if everything goes well
Given the definition identifier of an object, instantiate and store object in workspace
Description
The function createWorkerInstance uses a definition identified by
defnId to create the appropriate object instance. The instantiated object is assigned
the instanceId and saved under the dataFileName if the latter is specified.
This instantiated object may change state between iterations when a computation executes
Usage
createWorkerInstance(
defnId,
instanceId,
pubkey_bits = NULL,
pubkey_n = NULL,
den_bits = NULL,
dataFileName = NULL
)
Arguments
defnId |
the identifier of an already defined computation |
instanceId |
an identifier to use for the created instance |
pubkey_bits |
number of bits for public key |
pubkey_n |
the n for the public key |
den_bits |
the number of bits for the denominator |
dataFileName |
a file name to use for saving the data. Typically |
Value
TRUE if everything goes well
See Also
Functions copied and modified from survival package
Description
The distcomp package makes use of code from the survival package, with the permission of the original authors. This includes R code as well as C code. That is, the underlying Cox model code is derived from that in the R survival package. The original copyrights are retained for these files and the notices preserved. However, these are for internal use and future implementations may change how we use them. In order to avoid confusion and any name collision, the names of these functions have been modified to include a prefix "dc".
Usage
dccoxph(
formula,
data,
weights,
subset,
na.action,
init,
control,
ties = c("efron", "breslow", "exact"),
singular.ok = TRUE,
robust = FALSE,
model = FALSE,
x = FALSE,
y = TRUE,
tt,
method = ties,
...
)
dccoxph.fit(x, y, strata, offset, init, control, weights, method, rownames)
Define a new computation
Description
This function just calls runDistcompApp() with the
parameter "definition"
Usage
defineNewComputation()
Value
the results of running the web application
See Also
Destroy an instance object given its identifier
Description
The function destroyInstanceObject deletes an object associated
with the instanceId. This is typically done after a computation completes and results
have been obtained.
Usage
destroyInstanceObject(instanceId)
Arguments
instanceId |
the id of the object to destroy |
Value
TRUE if everything goes well
See Also
Distributed Computing with R
Description
distcomp is a collection of methods to fit models to data that may be
distributed at various sites. The package arose as a way of addressing the
issues regarding data aggregation; by allowing sites to have control over
local data and transmitting only summaries, some privacy controls can be
maintained. Even when participants have no objections in principle to data
aggregation, it may still be useful to keep data local and expose just the
computations. For further details, please see the reference cited below.
Details
The initial implementation consists of a stratified Cox model fit with distributed survival data and a Singular Value Decomposition of a distributed matrix. General Linear Models will soon be added. Although some sanity checks and balances are present, many more are needed to make this truly robust. We also hope that other methods will be added by users.
We make the following assumptions in the implementation:
(a) the aggregate data is logically a stacking of the data at each site, i.e.,
the full data is row-partitioned across sites where the rows are observations;
(b) each site has the package distcomp installed and a workspace set up
for (writeable) use by the opencpu server
(see distcompSetup()); and (c) each site exposes distcomp
via an opencpu server.
The main computation happens via a master process, a script of R code,
that makes calls to distcomp functions at worker sites via opencpu.
The use of opencpu allows developers to prototype their distributed implementations
on a local machine using the opencpu package that runs such a server locally
using localhost ports.
Note that distcomp computations are not intended for speed/efficiency;
indeed, they are orders of magnitude slower. However, the models that are fit are
not meant to be recomputed often. These and other details are discussed in the
reference cited below.
The current implementation, particularly the Stratified Cox Model, makes direct use of
code from survival::coxph(). That is, the underlying Cox model code is
derived from that in the R survival package.
For an understanding of how this package is meant to be used, please see the documented examples and the reference.
References
Software for Distributed Computation on Medical Databases: A Demonstration Project. Journal of Statistical Software, 77(13), 1-22. doi:10.18637/jss.v077.i13
Appendix E of Modeling Survival Data: Extending the Cox Model by Terry M. Therneau and Patricia Grambsch. Springer Verlag, 2000.
See Also
The examples in system.file("doc", "examples.html", package="distcomp")
The source for the examples: system.file("doc_src", "examples.Rmd", package="distcomp").
Setup a workspace and configuration for a distributed computation
Description
The function distcompSetup sets up a distributed computation
and configures some global parameters such as definition file names,
data file names, instance object file names, and ssl configuration parameters. The
function creates some of the necessary subdirectories if not already present and throws an error if the workspace areas are not writeable.
Usage
distcompSetup(
workspacePath = "",
defnPath = paste(workspacePath, "defn", sep = .Platform$file.sep),
instancePath = paste(workspacePath, "instances", sep = .Platform$file.sep),
defnFileName = "defn.rds",
dataFileName = "data.rds",
instanceFileName = "instance.rds",
resultsCacheFileName = "results_cache.rds",
ssl_verifyhost = 1L,
ssl_verifypeer = 1L
)
Arguments
workspacePath |
a folder specifying the workspace path. This has to be writable by the opencpu process. On a cloud opencpu server on Ubuntu, for example, this requires a one-time modification of apparmor profiles to enable write permissions to this path |
defnPath |
the path where definition files will reside, organized by computation identifiers |
instancePath |
the path where instance objects will reside |
defnFileName |
the name for the compdef definition files |
dataFileName |
the name for the data files |
instanceFileName |
the name for the instance files |
resultsCacheFileName |
the name for the instance results cache files for HE computations |
ssl_verifyhost |
integer value, usually |
ssl_verifypeer |
integer value, usually |
Value
TRUE if all is well
See Also
Examples
## Not run:
distcompSetup(workspacePath="./workspace")
## End(Not run)
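For purely local prototyping, one might point the workspace at a throwaway directory and relax the SSL checks; the settings below are an illustrative sketch, not recommended values for a production deployment.
## Not run:
ws <- file.path(tempdir(), "distcomp_workspace")
dir.create(ws, showWarnings = FALSE)
distcompSetup(workspacePath = ws, ssl_verifyhost = 0L, ssl_verifypeer = 0L)
getConfig()  ## inspect the configured paths and file names
## End(Not run)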
Given the id of a serialized object, invoke a method on the object with arguments using homomorphic encryption
Description
The function executeHEMethod is a homomorphic
encryption wrapper around executeMethod. It ensures any
returned result is encrypted using the homomorphic encryption
function.
Usage
executeHEMethod(objectId, method, ...)
Arguments
objectId |
the (instance) identifier of the object on which to invoke a method |
method |
the name of the method to invoke |
... |
further arguments as appropriate for the method |
Value
a list containing an integer and a fractional result converted to characters
Given the id of a serialized object, invoke a method on the object with arguments
Description
The function executeMethod is really the heart of
distcomp. It executes an arbitrary method on an object that
has been serialized to the distcomp workspace with any
specified arguments. The result, which is dependent on the
computation that is executed, is returned. If the object needs to save state between iterations, it is automatically serialized back for the ensuing iterations.
Usage
executeMethod(objectId, method, ...)
Arguments
objectId |
the (instance) identifier of the object on which to invoke a method |
method |
the name of the method to invoke |
... |
further arguments as appropriate for the method |
Value
a result that depends on the computation being executed
Generate an identifier for an object
Description
A hash is generated based on the contents of the object
Usage
generateId(object, algo = "xxhash64")
Arguments
object |
the object for which a hash is desired |
algo |
the algorithm to use, default is "xxhash64" from the digest package |
Value
the hash as a string
See Also
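A short illustration; the list being hashed is arbitrary, and the alternative algo value is assumed to be one accepted by the underlying hashing routine.
## A reproducible identifier for a site description
generateId(list(name = "Site 1", url = "http://localhost:9999/ocpu"))
## The same object hashed with a different (assumed available) algorithm
generateId(list(name = "Site 1", url = "http://localhost:9999/ocpu"), algo = "md5")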
Get the value of a variable from the global store
Description
In distcomp, several web applications need to communicate between themselves. Since only one application is expected to be active at any time, they do so via a global store, essentially a hash table. This function retrieves the value associated with a name.
Usage
getComputationInfo(name)
Arguments
name |
the name for the object |
Value
the value for the variable, NULL if not set
See Also
Return the workspace and configuration setup values
Description
The function getConfig returns the values of the
configuration parameters set up by distcompSetup
Usage
getConfig(...)
Arguments
... |
any further arguments |
Value
a list consisting of
workspacePath |
a folder specifying the workspace path. This has to be writable by the opencpu process. On a cloud opencpu server on Ubuntu, for example, this requires a one-time modification of apparmor profiles to enable write permissions to this path |
defnPath |
the path where definition files will reside, organized by computation identifiers |
instancePath |
the path where instance objects will reside |
defnFileName |
the name for the compdef definition files |
dataFileName |
the name for the data files |
instanceFileName |
the name for the instance files |
ssl_verifyhost |
integer value, usually |
ssl_verifypeer |
integer value, usually |
See Also
Examples
## Not run:
getConfig()
## End(Not run)
Make a computation definition given the computation type
Description
The function makeDefinition returns a computational
definition based on current inputs (from the global store) given a
canonical computation type tag. This is a utility function for web
applications to use as input is being gathered
Usage
makeDefinition(compType)
Arguments
compType |
the canonical computation type tag |
Value
a data frame corresponding to the computation type
See Also
Examples
## Not run:
makeDefinition(names(availableComputations())[1])
## End(Not run)
Instantiate a master process for HE operations
Description
Instantiate a master process for HE operations
Usage
makeHEMaster(defn)
Arguments
defn |
the computation definition |
Value
a master object for HE operations
Make a master object given a definition
Description
The function makeMaster returns a master object
corresponding to the definition. The types of master objects
that can be created depend upon the available computations
Usage
makeMaster(defn, partyNumber = NULL, debug = FALSE)
Arguments
defn |
the computation definition |
partyNumber |
the number of the noncooperating party, which can be optionally set if HE is desired |
debug |
a debug flag |
Value
a master object of the appropriate class based on the definition
See Also
Instantiate a noncooperating party
Description
Instantiate a noncooperating party
Usage
makeNCP(
ncp_defn,
comp_defn,
sites = list(),
pubkey_bits = NULL,
pubkey_n = NULL,
den_bits = NULL
)
Arguments
ncp_defn |
the NCP definition |
comp_defn |
the computation definition |
sites |
a list of sites, each entry a named list with components name, url, and worker |
pubkey_bits |
number of bits for public key |
pubkey_n |
the n for the public key |
den_bits |
the log to base 2 of the denominator |
Value
an NCP object
Make a worker object given a definition and data
Description
The function makeWorker returns an object of the
appropriate type based on a computation definition and sets the
data for the object. The types of objects that can be created
depend upon the available computations
Usage
makeWorker(defn, data, pubkey_bits = NULL, pubkey_n = NULL, den_bits = NULL)
Arguments
defn |
the computation definition |
data |
the data for the computation |
pubkey_bits |
the number of bits for the public key (used only
if |
pubkey_n |
the n for the public key |
den_bits |
the number of bits for the denominator (used only
if |
Value
a worker object of the appropriate class based on the definition
See Also
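Because makeMaster() and makeWorker() dispatch on the computation type in the definition, the same prototyping pattern works for any implemented computation. A sketch, reusing the illustrative query count definition from earlier (field names and tag remain assumptions):
## Not run:
defn <- data.frame(id = "QCProto", compType = "QueryCount",
                   projectName = "QueryProto",
                   projectDesc = "Query count prototyping sketch",
                   stringsAsFactors = FALSE)
## The constructors pick the appropriate R6 class from the definition
w1 <- makeWorker(defn = defn, data = data.frame(filtered = 1:30))
w2 <- makeWorker(defn = defn, data = data.frame(filtered = 1:45))
m <- makeMaster(defn = defn)
m$addSite(name = "site1", worker = w1)
m$addSite(name = "site2", worker = w2)
m$run()
## End(Not run)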
Clear the contents of the global store
Description
In distcomp, several web applications need to communicate between themselves. Since only one application is expected to be active at any time, they do so via a global store, essentially a hash table. This function clears the store, except for the working directory.
Usage
resetComputationInfo()
Value
an empty list
See Also
setComputationInfo(), getComputationInfo()
Run a specified distcomp web application
Description
Web applications can define a computation, set up worker sites, or set up masters. This function invokes the appropriate web application depending on the task.
Usage
runDistcompApp(appType = c("definition", "setupWorker", "setupMaster"))
Arguments
appType |
one of three values: "definition", "setupWorker", "setupMaster" |
Value
the results of running the web application
See Also
defineNewComputation(), setupWorker(), setupMaster()
Save a computation instance, given the computation definition, associated data and possibly a data file name to use
Description
The function saveNewComputation uses the computation definition to save
a new computation instance. This is typically done for every site that wants to participate
in a computation with its own local data. The function examines the computation definition
and uses the identifier therein to uniquely refer to the computation instance at the site.
This function is invoked (maybe remotely) on the opencpu server by
uploadNewComputation() when a worker site is being set up
Usage
saveNewComputation(defn, data, dataFileName = NULL)
Arguments
defn |
an already defined computation |
data |
the (local) data to use |
dataFileName |
a file name to use for saving the data. Typically |
Value
TRUE if everything goes well
See Also
Save an NCP instance, given the sites as associated data and possibly a data file name to use
Description
The function saveNewNCP uses the NCP definition and the list of sites to save a new NCP instance. This is
typically done for every pair of NCPs used in a computation. The function examines the
computation definition and uses the identifier therein to
uniquely refer to the computation instance at the site. This
function is invoked (maybe remotely) on the opencpu server by
uploadNewComputation() when a worker site is being set up
Usage
saveNewNCP(defn, comp_defn, data, dataFileName = NULL)
Arguments
defn |
a definition of the ncp |
comp_defn |
the computation definition |
data |
the list of sites with name and url to use |
dataFileName |
a file name to use for saving the
data. Typically |
Value
TRUE if everything goes well
See Also
Set a name to a value in a global variable
Description
In distcomp, several web applications need to communicate between themselves. Since only one application is expected to be active at any time, they do so via a global store, essentially a hash table. This function sets a name to a value.
Usage
setComputationInfo(name, value)
Arguments
name |
the name for the object |
value |
the value for the object |
Value
invisibly returns all the name-value pairs
See Also
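A minimal illustration of the global-store pattern shared by the web applications:
## Stash a value from one shiny app and retrieve it in another
setComputationInfo("projectName", "CoxPrototype")
getComputationInfo("projectName")    ## "CoxPrototype"
getComputationInfo("noSuchName")     ## NULL when nothing was set
## Clear everything (except the working directory) between computations
resetComputationInfo()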
Setup a computation master
Description
This function just calls runDistcompApp() with the
parameter "setupMaster"
Usage
setupMaster()
Value
the results of running the web application
See Also
Setup a worker site
Description
This function just calls runDistcompApp() with the
parameter "setupWorker"
Usage
setupWorker()
Value
the results of running the web application
See Also
Upload a new computation and data to an opencpu server
Description
The function uploadNewComputation is really a remote version
of saveNewComputation(), invoking that function on an opencpu server.
This is typically done for every site that wants to participate in a computation
with its own local data. Note that a site is always a list of at least a unique
name element (distinguishing the site from others) and a url element.
Usage
uploadNewComputation(site, defn, data)
Arguments
site |
a list of two items, a unique |
defn |
the identifier of an already defined computation |
data |
the (local) data to use |
Value
TRUE if everything goes well
See Also
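A sketch of a typical call, assuming defn and site1Data were built as in the query count prototyping sketch above and that the site's opencpu server is reachable; the URL shown is a placeholder.
## Not run:
site <- list(name = "Site 1", url = "http://site1.example.org/ocpu")
uploadNewComputation(site = site, defn = defn, data = site1Data)
## End(Not run)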
Upload a new Non-Cooperating Party (NCP) information and sites to an opencpu server
Description
The function uploadNewNCP is really a remote version
of saveNewNCP(), invoking that function on an opencpu server.
This is typically done for the two NCPs participating in a
computation with the list of sites. Note that sites are always
a list of at least a unique name element (distinguishing the
site from others) and a url element.
Usage
uploadNewNCP(defn, comp_defn, url = NULL, worker = NULL, sites)
Arguments
defn |
a definition for the NCP |
comp_defn |
the computation definition |
url |
the url for the NCP. Only one of url and worker can be non-null |
worker |
the worker for the NCP if local. Only one of url and worker can be non-null |
sites |
a list of lists, each containing two items, a unique
|
Value
TRUE if everything goes well
See Also
Write the code necessary to run a master process
Description
Once a computation is defined and worker sites are set up, the master process code is written by this function. The current implementation does not allow one to mix localhost URLs with non-localhost URLs.
Usage
writeCode(defn, sites, outputFilenamePrefix)
Arguments
defn |
the computation definition |
sites |
a named list of site URLs participating in the computation |
outputFilenamePrefix |
the output file name prefix under which code and data will be written |
Value
the value TRUE if all goes well
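A closing sketch of the call, assuming defn is a completed computation definition and that sites is a named list of participating site URLs as described above; the URLs are placeholders and the exact sites structure should match what was used when the worker sites were set up.
## Not run:
sites <- list("Site 1" = "http://site1.example.org/ocpu",
              "Site 2" = "http://site2.example.org/ocpu")
writeCode(defn = defn, sites = sites, outputFilenamePrefix = "run_CoxProto")
## End(Not run)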