Analyzing Trees With Offsets

2022-11-03

One of the major reasons for the utilization of point age estimates instead of accounting for fossil age uncertainty in tip-dating analyses of entirely extinct lineages has been the limitation by most Bayesian inference software on handling paleontological datasets in clock analyses. All tree models available for clock analyses assume a diversification process that start at some point in the past (either the time of origin or the time of the most recent common ancestor of all sampled taxa) and stop at the present time (t = 0 yr). In such cases, the birth-death process stops at t = 0 before the present, and the height of the youngest nodes in the tree are also = 0. In the special case where only fossils are included, the birth-death process will stop at some time t > 0yr before the present, at the age of the youngest fossil (e.g., age = 3 Ma). However, the height of the youngest node will still be treated as = 0, which is the default tree format by most phylogenetic inference software. This difference creates the necessity of rescaling the resulting posterior fossil trees with an offset equal to the difference between the age youngest node and the present time (in the example above, offset = 3 myr).

This latter process is relatively simple if we attribute a fixed point age value to all fossils in the tree (so the age of the youngest fossil tip will be known a priori). However, using fixed point age value to all fossils creates important estimate biases, and age uncertainty at the tips should be accounted by allowing them to be estimated within a time range, as discuss in Barido‐Sottani et al. (2020). This presents issues when analyzing fully extinct clades, because all tip ages are variable and are allowed to be estimated within a time range, and each posterior tree will likely have a distinct age for the youngest taxon and thus no common fixed point in the past to establish the rescaling among all posterior trees. To address this issue, a few strategies are available:

Analyzing Trees With Offset ages (BEAST2)

The SA package for BEAST2 includes an alternate tree with offset format, which can be set up as shown in the FBD tutorial. This uses a a “treeWoffset” operator that will sample the offset of every tree from the posterior as an additional parameter during clock analysis. However, posterior trees produced by BEAST2 with offset ages cannot be directly used by TreeAnnotator to calculate the maximum clade credibility tree (MCC), as the latter still assumes that the youngest tips in the dataset represent extant tips. To correct for this, a “dummy” extant taxon must be added to the posterior trees to be interpreted by TreeAnnotator, and the “dummy” extant must be pruned from the resulting MCC tree to produce the final MCC tree. These can be done by performing by following steps and functions from EvoPhylo:

Load the EvoPhylo package

library(EvoPhylo)

1. Import the posterior tree distribution.

First we import the posterior tree distribution and log file produced by BEAST2. The log file is needed because it contains the offset for each sample. In order to store the correct age in the tree, we are going to add a “dummy” tip to each phylogeny. This dummy tip will have age 0 for each sample, and so the length of the age between this tip and the root of the tree will encode the age of the tree including the offset.

trees_file <- system.file("extdata", "ex_offset.trees", package = "EvoPhylo")
log_file <- system.file("extdata", "ex_offset.log", package = "EvoPhylo")

## Transform the trees to add a dummy tip
converted_trees <- offset.to.dummy.metadata(trees_file, log_file)
## [1] 364.0323

Here we output the transformed trees to a variable, but in practice it is often more useful to write them directly to a new trees file.

## Transform the trees to add a dummy tip and save to file
converted_trees <- offset.to.dummy.metadata(trees_file, log_file, 
                                            output.file = "transformed_offset.trees")

2. Summarize the modified tree distribution

Using the transformed trees, we can then perform our post-processing as usual, for instance using TreeAnnotator to produce an MCC tree.

3. Restore the final summary tree

Our summary tree will include the dummy tip that we added earlier, so the last step of the process is to remove it, and get back the proper extinct tree with an offset.

## Find the example MCC tree from the package
tree_file <- system.file("extdata", "ex_offset.MCC.tre", package = "EvoPhylo")
## Then remove the dummy tip
final_tree <- drop.dummy.beast(tree_file)
## The output contains the final tree and the offset
final_tree$offset
## [1] 366.4744
final_tree$tree
## 'treedata' S4 object that stored information
## of
##  'C:/Users/tiago/AppData/Local/Temp/RtmpMVYSgW/Rbuild74502b2b1bb8/EvoPhylo/inst/extdata/ex_offset.MCC.tre'.
## 
## ...@ phylo:
## 
## Phylogenetic tree with 19 tips and 18 internal nodes.
## 
## Tip labels:
##   Floweria_anomala, Floweria_arctostriata, Floweria_becraftensis,
## Floweria_chemungensis1, Floweria_chemungensis2, Floweria_chemungensis3, ...
## 
## Rooted; includes branch lengths.
## 
## with the following features available:
##   'height', 'height_0.95_HPD', 'height_median', 'height_range', 'length',
## 'length_0.95_HPD', 'length_median', 'length_range', 'posterior', 'rate',
## 'rate_0.95_HPD', 'rate_median', 'rate_range'.
## 
## # The associated data tibble abstraction: 38 x 16
## # The 'node', 'label' and 'isTip' are from the phylo tree.
##     node label    isTip height height_0.95_HPD height_median height_range length
##    <int> <chr>    <lgl> <name> <list>          <named list>  <list>        <dbl>
##  1     1 Floweri~ TRUE  <dbl>  <dbl [2]>       <dbl [1]>     <dbl [2]>     2.44 
##  2     2 Floweri~ TRUE  <dbl>  <dbl [2]>       <dbl [1]>     <dbl [2]>     3.50 
##  3     3 Floweri~ TRUE  <dbl>  <dbl [2]>       <dbl [1]>     <dbl [2]>     2.65 
##  4     4 Floweri~ TRUE  <dbl>  <dbl [2]>       <dbl [1]>     <dbl [2]>     0.804
##  5     5 Floweri~ TRUE  <dbl>  <dbl [2]>       <dbl [1]>     <dbl [2]>     2.46 
##  6     6 Floweri~ TRUE  <dbl>  <dbl [2]>       <dbl [1]>     <dbl [2]>    11.4  
##  7     7 Floweri~ TRUE  <dbl>  <dbl [2]>       <dbl [1]>     <dbl [2]>     0.360
##  8     8 Floweri~ TRUE  <dbl>  <dbl [2]>       <dbl [1]>     <dbl [2]>    11.6  
##  9     9 Floweri~ TRUE  <dbl>  <dbl [2]>       <dbl [1]>     <dbl [2]>     0.597
## 10    10 Floweri~ TRUE  <dbl>  <dbl [2]>       <dbl [1]>     <dbl [2]>     1.70 
## # ... with 28 more rows, and 8 more variables: length_0.95_HPD <list>,
## #   length_median <dbl>, length_range <list>, posterior <dbl>, rate <dbl>,
## #   rate_0.95_HPD <list>, rate_median <dbl>, rate_range <list>

As before, we can also save the tree directly to a file.

## Remove the dummy tip and save to file
final_tree <- drop.dummy.beast(tree_file, output.file = "ex_final_mcc.tre")

Analyzing Trees With Offset ages (Mr.Bayes)

Mr.Bayes does not have an equivalent procedure to treeWoffset from BEAST2 for handling tip-dating analyses of entirely extinct lineages. In this case, it is necessary to manually add a “dummy” extant taxon to the dataset from the beginning of the analyses (as done in Simões and Pierce (2021)). This “dummy” tip must be scored only with missing data (“?”) and constrained as the sister taxon to one other taxon in the data, so it does not fluctuate around the tree as a wildcard. As with the MCC trees from BEAST2, the final summary trees from Mr.Bayes [50% majority rule consensus tree (MRC) and the maximum compatible tree (MCT)] will have the extant “dummy’ taxon in then. It can be pruned using drop.dummy.mb:

## Find the example MCC tree from the package
tree_file_mb <- system.file("extdata", "tree_mb_dummy.tre", package = "EvoPhylo")

## Remove the dummy tip and save to file
final_tree_mb <- drop.dummy.mb(tree_file_mb, output.file = "tree_mb_final.tre")

References

Barido‐Sottani, Joëlle, Nina M. A. van Tiel, Melanie J. Hopkins, David F. Wright, Tanja Stadler, and Rachel C. M. Warnock. 2020. Ignoring Fossil Age Uncertainty Leads to Inaccurate Topology and Divergence Time Estimates in Time Calibrated Tree Inference.” Frontiers in Ecology and Evolution 8 (June): 183. https://doi.org/10.3389/fevo.2020.00183.
Simões, Tiago R., and Stephanie E. Pierce. 2021. “Sustained High Rates of Morphological Evolution During the Rise of Tetrapods.” Nature Ecology & Evolution 5 (10): 1403–14. https://doi.org/10.1038/s41559-021-01532-x.