Mixtures

Mikkel Meyer Andersen

29 juni, 2022

Introduction

This vignette shows how to use the R package disclapmix that implements the method described in (Andersen, Eriksen, and Morling 2013b) and (Andersen et al. 2015). For a more gentle introduction to the method, refer to the introduction vignette and (Andersen, Eriksen, and Morling 2013a).

Mixture analysis

We again use the Danish reference database (Hallenberg et al. 2005) with \(n = 185\) observations (male Y-STR haplotypes) at \(r=10\) loci is available in the danes dataset. Let us load the package as well as the data:

library(disclapmix)
data(danes)

The database is in compact format, i.e. one unique haplotype per row. To fit the model, we need one observation per row. This is done for example like this:

db <- as.matrix(danes[rep(seq_len(nrow(danes)), danes$n), seq_len(ncol(danes)-1)])
str(db)
##  int [1:185, 1:10] 13 13 13 13 13 13 14 14 14 14 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:185] "1" "2" "3" "4" ...
##   ..$ : chr [1:10] "DYS19" "DYS389I" "DYS389II" "DYS390" ...

Also, note that the database is now an integer matrix.

Assume now that we have a mixture and that the reference database are without these two contributors:

donor1 <- db[1, ]
donor2 <- db[20, ]
refdb <- db[-c(1, 20), ]

We now construct the mixture:

mix <- generate_mixture(list(donor1, donor2))

We can then see some properties of possible pairs:

pairs <- contributor_pairs(mix)
pairs
## Mixture:
##   DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## 1 13,14   13,14    29,30     22     10  13,15     13  14,15  11,12     12
## 
## Number of possible contributor pairs = 32

To do much more, we need a model assigning hpalotype probabilies. In the introduction vignette, we found that 4 clusters seemed fine, so let us fit this model:

fit <- disclapmix(x = refdb, clusters = 4L)

We can now use this model to e.g. rank the contributor pairs:

ranked_pairs <- rank_contributor_pairs(pairs, fit)
ranked_pairs
## Mixture:
##   DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## 1 13,14   13,14    29,30     22     10  13,15     13  14,15  11,12     12
## 
## Contributor pairs = 32
## 
## Sum of all (product of contributor pair haplotypes) = 2.263911e-14
## 
## Showing rank 1-5:
## 
## Rank 1 [ P(H1)*P(H2) = 2.929608e-15 ]:
##    DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## H1    13      14       30      .      .     15      .     14     11      .
## H2    14      13       29      .      .     13      .     15     12      .
##            Prob
## H1 1.945729e-11
## H2 1.505661e-04
## 
## Rank 2 [ P(H1)*P(H2) = 2.224515e-15 ]:
##    DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## H1    13      13       29      .      .     13      .     15     12      .
## H2    14      14       30      .      .     15      .     14     11      .
##            Prob
## H1 2.216703e-05
## H2 1.003524e-10
## 
## Rank 3 [ P(H1)*P(H2) = 1.722751e-15 ]:
##    DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## H1    13      14       30      .      .     13      .     14     11      .
## H2    14      13       29      .      .     15      .     15     12      .
##            Prob
## H1 3.356602e-09
## H2 5.132425e-07
## 
## Rank 4 [ P(H1)*P(H2) = 1.364569e-15 ]:
##    DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## H1    13      13       29      .      .     15      .     15     12      .
## H2    14      14       30      .      .     13      .     14     11      .
##            Prob
## H1 7.556192e-08
## H2 1.805895e-08
## 
## Rank 5 [ P(H1)*P(H2) = 1.060766e-15 ]:
##    DYS19 DYS389I DYS389II DYS390 DYS391 DYS392 DYS393 DYS437 DYS438 DYS439
## H1    13      13       30      .      .     15      .     14     11      .
## H2    14      14       29      .      .     13      .     15     12      .
##            Prob
## H1 6.777397e-11
## H2 1.565152e-05
## 
##  (27 contributor pairs hidden.)

We can get the ranks for the donors:

get_rank(ranked_pairs, donor1)
## [1] 13
get_rank(ranked_pairs, donor2)
## [1] 13

References

Andersen, Mikkel Meyer, Poul Svante Eriksen, Helle Smidt Mogensen, and Niels Morling. 2015. Identifying the most likely contributors to a Y-STR mixture using the discrete Laplace method.” Forensic Science International: Genetics 15: 76–83. https://doi.org/10.1016/j.fsigen.2014.09.011.
Andersen, Mikkel Meyer, Poul Svante Eriksen, and Niels Morling. 2013a. A gentle introduction to the discrete Laplace method for estimating Y-STR haplotype frequencies.” Preprint, arXiv:1304.2129.
———. 2013b. The discrete Laplace exponential family and estimation of Y-STR haplotype frequencies.” Journal of Theoretical Biology 329: 39–51. https://doi.org/10.1016/j.jtbi.2013.03.009.
Hallenberg, Charlotte, Karsten Nielsen, Bo Simonsen, Juan Sanchez, and Niels Morling. 2005. “Y-Chromosome STR Haplotypes in Danes.” Forensic Science International 155 2-3: 205–10.