Tracing the evolutionary histories of ultra-rare variants using variational dating of large ancestral recombination graphs
🎓 Authors: Nathaniel S. Pope et al.
🔗 Link: preprint
🧐 What did the authors do?
The authors introduced a new scalable method for dating nodes and mutations in Ancestral Recombination Graphs (ARGs). By leveraging the entire genealogical history, they can estimate the age of ultra-rare variants with higher precision than previous approaches. This method is released as a new version of tsdate (0.2).
⚙️ How did they do it?
The work relied on two major advances:
1. Inferring ARGs over large genomic regions.
Recombination events are the most important source of information for dating rare variants. However, using this information requires building ARGs over large contiguous genomic regions. This is even more challenging with sequencing data, which are denser than the array data used by previous methods. To address this, the authors released tsinfer 0.4, which allows inference to be distributed across multiple compute nodes while guaranteeing the same result as a single-core run.
2. Developing a new algorithm for dating the variants.
Calculating the exact posterior distribution of a variant's age is computationally intractable, because it requires integrating over all possible ages of every node in the ARG (assuming the topology was correctly inferred in the previous step). The authors tackled this with a variational approach: the posterior is assumed to take a specific parametric form, whose parameters are then optimized to bring it as close as possible to the true posterior. The variational posterior was optimized using the "belief propagation" algorithm.
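To make the variational idea concrete, here is a minimal sketch for a single node. It is not the tsdate algorithm. The toy model (one parent above a child of known age, Poisson mutations on the branch, an exponential prior), the choice of a gamma as the approximating family, and every function and parameter name are assumptions made for illustration. The sketch computes the mean and variance of the unnormalized posterior numerically and picks the gamma distribution with the same two moments.

```python
import math

def log_target(t, child_age=1.0, mutations=3, rate=2.0, prior_rate=0.5):
    """Unnormalized log posterior of a parent node's age t (hypothetical toy model)."""
    if t <= child_age:
        return -math.inf  # a parent cannot be younger than its child
    branch = t - child_age
    # Poisson log likelihood (up to a constant) of the mutations on the branch
    log_lik = mutations * math.log(rate * branch) - rate * branch
    # Exponential prior on the node age (up to a constant)
    log_prior = -prior_rate * t
    return log_lik + log_prior

def fit_gamma_by_moments(log_density, lower, upper, steps=20000):
    """Fit a gamma to the target's mean and variance using a midpoint-rule grid."""
    width = (upper - lower) / steps
    grid = [lower + (i + 0.5) * width for i in range(steps)]
    weights = [math.exp(log_density(t)) for t in grid]
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, grid)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, grid)) / total
    # A gamma with shape a and rate b has mean a / b and variance a / b**2
    shape = mean ** 2 / var
    rate = mean / var
    return shape, rate, mean, var

shape, rate, mean, var = fit_gamma_by_moments(log_target, lower=1.0, upper=30.0)
print(f"gamma approximation: shape={shape:.3f}, rate={rate:.3f}")
```

In this toy model the branch length has an exact Gamma(4, 2.5) posterior, so the target mean is 1 + 4/2.5 = 2.6, which gives a quick sanity check on the numerical integration. A real ARG couples many nodes together, which is why an iterative message-passing scheme is needed instead of a single fit like this one.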
🚀 Why does it matter?
This new method makes it possible to use allele age as an alternative to allele frequency when studying rare variants. Allele frequency is sensitive to sampling imbalance, and its estimates are noisy for rare variants. Allele age therefore offers a potentially more robust statistic for downstream analyses.
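The noise in rare-variant frequency estimates can be illustrated with a short calculation. The sketch below assumes simple binomial sampling of haplotypes, which covers only the sampling-noise part of the argument and not sampling imbalance between populations. The function name and the sample size are hypothetical.

```python
import math

def relative_standard_error(frequency, n_haplotypes):
    """Relative standard error of a sample allele frequency under binomial sampling."""
    # SE of the estimate is sqrt(p * (1 - p) / n); dividing by p gives the relative error
    return math.sqrt((1.0 - frequency) / (n_haplotypes * frequency))

# Compare a common, a rare, and an ultra-rare variant in the same sample
for freq in (1e-1, 1e-3, 1e-5):
    rse = relative_standard_error(freq, n_haplotypes=100_000)
    print(f"frequency={freq:g}  relative standard error={rse:.3f}")
```

Under these assumptions, a variant expected to appear about once in the sample has a relative error close to 100%, whereas a common variant is estimated to within about 1%.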