Running bhmbasket on HPC

Stephan Wojciekowski

February 14, 2022

This vignette provides a short example on how to use the R package bhmbasket in a high performance computing (HPC) environment using the R packages doFuture and future.batchtools.

library(bhmbasket)
library(doFuture)
library(future.batchtools)

rng_seed <- 5440
set.seed(rng_seed)

Setup of the parallel backend

The code below provides an example for specifying a parallel backend using the future framework with the SLURM job scheduler. Kindly see the documentation of the R packages doFuture and future.batchtools for further options. This code is to be run on the master node.

## Adapt the SLURM template to requirements
job_time  <- 1   # time for job in hours
n_workers <- 24  # number of worker nodes
n_cpus    <- 16  # number of cpus per worker node
gb_memory <- 2   # memory [GB] per cpu

slurm <- tweak(batchtools_slurm,
           template  = system.file('templates/slurm-simple.tmpl',
                                   package = 'batchtools'),
           workers   = n_workers,
           resources = list(
             walltime  = 60 * 60 * job_time,
             ncpus     = n_cpus,
             memory    = 1000 * gb_memory))

## Register the parallel backend 
registerDoFuture()

## Specify how the futures should be resolved
plan(list(slurm, multisession))

Running some bhmbasket code on HPC

The R package bhmbasket makes use of the foreach framework and runs with every applicable parallel backend. With a parallel backend registered as shown above and running the code on the master node, the job scheduler will automatically distribute the jobs to the worker nodes via plan(slurm), and with the nested parallelization built into performAnalyses(), each worker node makes use of its CPUs via plan(multisession).

Below is some example code, which was taken from the examples section of ?bhmbasket::getEstimates. Due to the foreach framework, no adjustments to the code are necessary. Kindly note that running this small example on a HPC environment will most likely not result in a performance improvement.

scenarios_list <- simulateScenarios(
    n_subjects_list     = list(c(10, 20, 30)),
    response_rates_list = list(c(0.1, 0.2, 3)),
    n_trials            = 10)

  analyses_list <- performAnalyses(
    scenario_list       = scenarios_list,
    target_rates        = c(0.1, 0.1, 0.1),
    calc_differences    = matrix(c(3, 2, 2, 1), ncol = 2),
    n_mcmc_iterations   = 100)

  getEstimates(analyses_list)