A Vector Representation for Phylogenetic Trees (2405.07110v1)
Abstract: Good representations for phylogenetic trees and networks are important for optimizing storage efficiency and for implementing scalable methods for the inference and analysis of evolutionary trees for genes, genomes and species. We introduce a new representation for rooted phylogenetic trees that encodes a binary tree on n taxa as a vector of length 2n in which each taxon appears exactly twice. Using this representation, we introduce a novel tree rearrangement operator, called a HOP, that results in a tree space of diameter n and a quadratic neighbourhood size. We also introduce a novel metric, the HOP distance, which is the minimum number of HOPs needed to transform one tree into another. The HOP distance can be computed in near-linear time, a rare instance of a tractable tree rearrangement distance. Our experiments show that the HOP distance is better correlated with the Subtree-Prune-and-Regraft distance than the widely used Robinson-Foulds distance is. We also describe how the representation can be further generalized to tree-child networks.
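To make the shape of the representation concrete, the following minimal sketch checks only the property stated in the abstract: a candidate vector for n taxa has length 2n and contains each taxon exactly twice. The function name `has_two_copies_of_each_taxon` and the use of integers as taxon labels are illustrative assumptions, not names from the paper. The check is a necessary condition only; the paper's actual encoding rules, which determine which such vectors correspond to trees, are not reproduced here.

```python
from collections import Counter
from typing import Hashable, Sequence


def has_two_copies_of_each_taxon(
    vector: Sequence[Hashable], taxa: Sequence[Hashable]
) -> bool:
    """Return True if `vector` has length 2n and holds each taxon exactly twice.

    Hypothetical helper: it tests only the property quoted in the abstract,
    not whether the vector is a valid tree encoding in the paper's sense.
    """
    taxa_set = set(taxa)
    if len(taxa_set) != len(taxa):
        raise ValueError("taxa must be distinct")
    # Length must be exactly twice the number of taxa.
    if len(vector) != 2 * len(taxa_set):
        return False
    counts = Counter(vector)
    # Every taxon must occur, no unknown labels, and each count must be 2.
    return set(counts) == taxa_set and all(c == 2 for c in counts.values())


if __name__ == "__main__":
    print(has_two_copies_of_each_taxon([1, 1, 2, 3, 2, 3], [1, 2, 3]))
    print(has_two_copies_of_each_taxon([1, 1, 1, 2, 3, 3], [1, 2, 3]))
```

A vector that passes this check has the right length and multiplicities, but additional structural conditions from the paper would still need to be verified before treating it as an encoded tree.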