ACEsimFit

Overview

This package is designed for 1) Simulating kin pairs data based on the assumption that every trait is affected by genetic effects (A), common environmental effects (C) and unique environmental effects (E). 2) Using kin pairs data to fit an ACE model and get model fit output. In our vignette, we will use height as a trait to demonstrate how people can simulate kin pairs of height data and how to use those simulated data to fit a univariate ACE model. We will also discuss the possible ways to analyse these model fitting results. 3) Calculate power of heritability estimate given a specific condition. (Note: The ideas and data in this vignette are only for demonstration purpose and can be inaccurate in terms of the true facts in science.)

Simulate kin pair data

Suppose we have a situation like this: I wants to investigate what factors impact the height of the adults in a non-European country. Researchers in developed countries like US and UK have collected a sea of twin data to explore the question and reached a conclusion: 60% of variance of height comes from genes, 20% of variance comes from family environment and 20% comes from personal environment and error. I wanted to see if this conclusion holds true in an Asian country. One approach is to use a public dataset with height measure and family structure records.

However, the dataset has two problems: 1) Only have 180 pairs of twins 2) can only distinguish MZ twins and same-sex DZ twins. Given the problems this dataset had, it will be useful to do a priori power analysis to check the statistical power of my study.

So first I need to simulate data which they have same variance and covariance structure as the data in the public dataset. We can use the kinsim_double function to do that. Here’s some conditions I want to replicate:

Sample sizes: 60 pairs of opposite-sex DZ twins (genetic relatedness is .5) and 120 pairs of same-sex MZ and DZ twins (genetic relatedness is .75)
The relatedness of .75 is achieved by combining MZ and DZ twins instead of simulating kin pairs with relatedness of .75 directly.
ACE variance structure: A = .6, C = .2, E = .2
Group mean is both 0, because they should be Z scores.

Therefore, we set the parameters of the kinsim_double function like below:

kindata <- kinsim_double(
    GroupNames = c("SStwins", "OStwins"),
    GroupSizes = c(120, 60),
    GroupRel = c(.75, 0.5),
    GroupR_c = c(1, 1),
    mu = c(0, 0),
    ace1 = c(.6, .2, .2),
    ace2 = c(.6, .2, .2),
    ifComb = TRUE
)
head(kindata)
#>     GroupName    R r_c id         A1         A2          C1          C2
#> 86    SStwins 0.75   1  1 -0.3704049  0.3895268  0.40500808  0.40500808
#> 33    SStwins 0.75   1  2 -0.7555032 -0.7555032 -0.08405098 -0.08405098
#> 37    SStwins 0.75   1  3 -2.0094436 -2.0094436  0.31722837  0.31722837
#> 14    SStwins 0.75   1  4  0.6867593  0.6867593 -0.06087827 -0.06087827
#> 114   SStwins 0.75   1  5  0.2667831  0.8442427 -0.18901303 -0.18901303
#> 119   SStwins 0.75   1  6  0.6671442  0.9857763 -0.24945574 -0.24945574
#>               E1         E2         y1          y2
#> 86   0.357409697  0.2437914  0.3920129  1.03832630
#> 33  -0.006418753 -0.3592010 -0.8459729 -1.19875513
#> 37   0.231995747 -0.4021750 -1.4602195 -2.09439026
#> 14   0.516862459 -0.2809559  1.1427435  0.34492513
#> 114 -0.518060418 -0.4069572 -0.4402904  0.24827242
#> 119  0.593802689 -0.6371508  1.0114912  0.09916975

Now you can see we have a data.frame with 180 pairs of simulated twins. And we also have their variance components and other information.

Calculate the power of the heritability estimate given the variance structure

Generally, calculating power for the A estimate had two approaches: likelihood theory and the least squares theory.

Calculate power based on likelihood theory

To use the likelihood theory, we need to simulate a number of datasets with the same variance structure and average the suggested power from each set of data. See a more detailed explanation at link. So we need to have a set of model fitting results with the -2ll values for the ACE model and the CE model.

Luckily, in our package they have a function Sim_Fit to simulate kin pairs data, fit them automatically into a ACE model and return the model summary results. The function fit the ACE model with the help of OpenMx package.

Here, we again assigned the same parameters to the function. There are a few new parameters here:

We want 50 simulated datasets so my power calculation can be relatively accurate. So we set the nIter = 50. Set more iterations when you want to get a even more accurate power.
We hope my simulated results can be replicated so we set SSeed = 62. Pick a lucky number here!
We don’t want to contrain the estimation so we set lbound = FALSE.
We don’t need the raw data since they will eat up my poor-RAM laptop. So we set saveRaw = FALSE.

time1 <- Sys.time()
results_fit <- Sim_Fit(
  GroupNames = c("SStwins", "OStwins"),
  GroupSizes = c(120, 60),
  nIter = 50,
  SSeed = 62,
  GroupRel = c(.75, 0.5),
  GroupR_c = c(1, 1),
  mu = c(0, 0),
  ace1 = c(.6, .2, .2),
  ace2 = c(.6, .2, .2),
  ifComb = TRUE,
  lbound = FALSE,
  saveRaw = FALSE
)
time2 <- Sys.time()
## FYI, the time used for the results above is here. So design your simulation wisely!!!
time2 - time1
#> Time difference of 29.73282 secs

Here’s one example of the nested comparison table from the results

results_fit[["Iteration1"]][["Results"]][["nest"]]
#>            base comparison ep minus2LL  df      AIC     diffLL diffdf
#> 1 oneACEvc_1cov       <NA>  4 896.1042 356 904.1042         NA     NA
#> 2 oneACEvc_1cov    oneAEvc  3 897.0780 357 903.0780  0.9737667      1
#> 3 oneACEvc_1cov    oneCEvc  3 898.0232 357 904.0232  1.9190034      1
#> 4 oneACEvc_1cov     oneEvc  2 980.2935 358 984.2935 84.1892959      2
#>              p
#> 1           NA
#> 2 3.237426e-01
#> 3 1.659666e-01
#> 4 5.230301e-19

We can then calculated the weighted ncp with the average difference of log-likelihood for ACE and CE models.In turn, I calculated the power for the given variance structure and relatedness:

N <- 180 ##the total number of kin pairs you used in your previous simulation
## Calculate the average diffLL between ACE and CE model.
DiffLL <- numeric()
for(i in 1:50){
  DiffLL[i] <- results_fit[[1]][["Results"]][["nest"]]$diffLL[3] 
}
meanDiffLL <- mean(DiffLL)
## Calculate the power based on an alpha level of .05
Power <- 1- pchisq(qchisq(1-.05, 1), 1, meanDiffLL)
Power
#> [1] 0.2831639

So you can see the power for my model would be insufficient to have a confident estimate of height heritability. :(

Calculate power based on least squares theory

Unlike the likelihood theory, we don’t need to run simulations to calculate power because we can have a formula in Xuanyu and Mason’s paper (submitted; you will see a link here soon). The function Power_LS is designed for power calculation based on LS theory.

Here I typed the parameters my research question involved into the functions.

Power_LS(N1=120, N2=60, h2=.6, c2=.2, R1 = .75, R2 = 0.5, alpha = 0.05)
#> [1] 0.3881047

Your can see the power calculated with LS theory is different from the one with likelihood theory. This is due to 1) with the LS formula, we did’t consider the combination of MZ and DZ twins to reach a relatedness of .75 2) they will be different especially when the sample size is small. In our case, if we have 1800 pairs in stead of 180, the power from likelihood theory will be 0.9890003 and the power from least squares theory will be 0.9960664. (Much more similar at this time!)