Simulation study from empirical data with penetrance

BayesMendel Lab

Goal

Here we apply the penetrance package to simulated families where the data-generating penetrance function is known and based on existing penetrance estimates.

Simulated Data

The data-generating distribution of the age-specific penetrances is based on existing penetrance estimates for colorectal cancer in carriers of any pathogenic variant in MLH1, taken from the PanelPRO database.

The families were simulated using the PedUtils R package.

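# Load the penetrance package, which provides penetrance()
library(penetrance)

# Simulated families used in this study
dat <- test_fam2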
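
To make the idea of a known data-generating penetrance concrete, the following base R sketch draws ages of onset for carriers from an age-specific penetrance table. It is not the PedUtils implementation and does not use the PanelPRO estimates. The vector `pen`, its bell-shaped form, the 0.5 lifetime risk, and the helper `simulate_onset()` are all hypothetical and chosen only for illustration.

```r
# Hypothetical age-specific penetrance for carriers: probability of onset at
# each age 1..94. These numbers are invented, not PanelPRO estimates.
ages <- 1:94
pen <- dnorm(ages, mean = 55, sd = 12)
pen <- 0.5 * pen / sum(pen)  # rescale so the lifetime risk is 0.5

# Draw n onset ages; NA means the carrier is never affected.
simulate_onset <- function(n, ages, pen) {
  # Add a "never affected" outcome that carries the remaining probability
  outcomes <- c(ages, NA)
  probs <- c(pen, 1 - sum(pen))
  sample(outcomes, size = n, replace = TRUE, prob = probs)
}

set.seed(1)
onset <- simulate_onset(20000, ages, pen)

# Share of carriers ever affected; should be close to the lifetime risk
mean(!is.na(onset))
```

Because the curve used to generate the ages is known, estimates obtained from such data can be compared directly against it, which is the purpose of this simulation study.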

Simple simulation

We then run the estimation using the default settings.

# Set the random seed
set.seed(2024)

# Set the prior
prior_params <- list(
    asymptote = list(g1 = 1, g2 = 1),
    threshold = list(min = 5, max = 30),
    median = list(m1 = 2, m2 = 2),
    first_quartile = list(q1 = 6, q2 = 3)
)

# Set the allele frequency for MLH1 based on the PanelPRO database
prevMLH1 <- 0.0004453125

# We use the default baseline (non-carrier) penetrance
print(baseline_data_default)

# We run the estimation procedure with one chain and 20k iterations
out_sim <- penetrance(
    pedigree = dat, twins = NULL, n_chains = 1, n_iter_per_chain = 20000,
    ncores = 2, baseline_data = baseline_data_default, prev = prevMLH1,
    prior_params = prior_params, burn_in = 0.1, median_max = TRUE,
    ageImputation = FALSE, removeProband = FALSE
)
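
As a quick sanity check on the inputs passed to `penetrance()` above, the following base R sketch works out what they imply. It rests on three assumptions that the text does not state. First, that `prev` is an allele frequency under Hardy-Weinberg equilibrium. Second, that `burn_in` is the fraction of each chain that is discarded. Third, that the prior parameter pairs are Beta shape parameters (for `median` and `first_quartile`, on a unit scale before any rescaling to ages) and that `threshold` is uniform between `min` and `max`. The helper `beta_mean()` is hypothetical.

```r
# Allele frequency as set for the run above
prevMLH1 <- 0.0004453125

# Probability of carrying at least one pathogenic allele, assuming
# Hardy-Weinberg equilibrium (an assumption made for this sketch)
carrier_prob <- 1 - (1 - prevMLH1)^2
carrier_prob

# Draws kept per chain, assuming burn_in is the fraction discarded
n_iter_per_chain <- 20000
burn_in <- 0.1
n_kept <- n_iter_per_chain * (1 - burn_in)
n_kept

# Prior means, assuming Beta(shape1, shape2) and Uniform(min, max) priors
beta_mean <- function(a, b) a / (a + b)
prior_means <- c(
  asymptote = beta_mean(1, 1),
  median = beta_mean(2, 2),
  first_quartile = beta_mean(6, 3),
  threshold = (5 + 30) / 2
)
prior_means
```

Under these assumptions the carrier probability is roughly twice the allele frequency, since the squared term is negligible at this frequency.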

References

Lee G, Liang JW, Zhang Q, Huang T, Choirat C, Parmigiani G, Braun D. Multi-syndrome, multi-gene risk modeling for individuals with a family history of cancer with the novel R package PanelPRO. eLife. 2021;10:e68699. doi:10.7554/eLife.68699