ARTree: A Deep Autoregressive Model for Phylogenetic Inference (2310.09553v1)
Abstract: Designing flexible probabilistic models over tree topologies is important for developing efficient phylogenetic inference methods. Previous works often exploit the similarity of tree topologies through hand-engineered heuristic features, which require pre-sampled tree topologies and may suffer from limited approximation capability. In this paper, we propose ARTree, a deep autoregressive model for phylogenetic inference based on graph neural networks (GNNs). By decomposing a tree topology into a sequence of leaf node addition operations and modeling the involved conditional distributions with learnable topological features computed by GNNs, ARTree can provide a rich family of distributions over the entire tree topology space that have simple sampling algorithms and density estimation procedures, without using heuristic features. We demonstrate the effectiveness and efficiency of our method on a benchmark of challenging real-data tree topology density estimation and variational Bayesian phylogenetic inference problems.
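To make the leaf-addition decomposition concrete, the following is a minimal sketch, not the authors' implementation. It grows an unrooted binary tree by attaching one leaf at a time to a chosen edge and accumulates the log-probability of the choices. The function name `sample_topology` and the optional `edge_probs` callback are hypothetical; the callback stands in for the GNN-based conditional distribution, and a uniform distribution over edges is assumed when it is omitted.

```python
import math
import random


def sample_topology(n_leaves, rng, edge_probs=None):
    """Sample an unrooted binary tree topology by sequential leaf addition.

    Leaves are numbered 0..n_leaves-1 and internal nodes get ids >= n_leaves.
    `edge_probs(edges, leaf)` is a hypothetical hook returning one probability
    per current edge; if None, a uniform choice over edges is assumed.
    Returns the edge list and the log-probability of the sampled decisions.
    """
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")

    # Start from the unique topology on the first three leaves (a star).
    center = n_leaves
    edges = [(0, center), (1, center), (2, center)]
    next_internal = n_leaves + 1
    log_prob = 0.0

    for leaf in range(3, n_leaves):
        # Conditional distribution over where to attach the next leaf.
        if edge_probs is None:
            probs = [1.0 / len(edges)] * len(edges)
        else:
            probs = edge_probs(edges, leaf)

        idx = rng.choices(range(len(edges)), weights=probs)[0]
        log_prob += math.log(probs[idx])

        # Subdivide the chosen edge with a new internal node, then attach the leaf.
        u, v = edges.pop(idx)
        w = next_internal
        next_internal += 1
        edges.extend([(u, w), (w, v), (w, leaf)])

    return edges, log_prob


if __name__ == "__main__":
    tree_edges, lp = sample_topology(5, random.Random(0))
    print(tree_edges)
    print(lp)
```

With the uniform placeholder, a tree with n leaves has 2n - 3 edges, so every 5-leaf topology receives probability 1/3 · 1/5 = 1/15, matching the (2N - 5)!! = 15 unrooted binary topologies on five leaves. Replacing the uniform choice with a learned conditional changes the probabilities but leaves the sampling and density evaluation steps the same.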