
Phylogenetic tree distance computation over succinct representations (2312.14029v1)

Published 21 Dec 2023 in cs.DS

Abstract: Several tools are available to infer phylogenetic trees, which depict the evolutionary relationships among biological entities such as viral and bacterial strains in infectious outbreaks, or cancerous cells in tumor progression trees. These tools rely on different inference methods, and the resulting trees are not unique. Methods for comparing phylogenies that can reveal where two phylogenetic trees agree or differ are therefore required. One approach is to compute a similarity or dissimilarity measure between trees; the Robinson-Foulds distance is one of the most widely used and can be computed in linear time and space. Nevertheless, given the large and increasing volume of phylogenetic data, phylogenetic trees are becoming very large, with hundreds of thousands of leaves. In this context, space requirements become an issue both while computing tree distances and while storing trees. We therefore propose an efficient implementation of the Robinson-Foulds distance over succinct representations of trees. Our implementation also generalizes the Robinson-Foulds distance to labelled phylogenetic trees, i.e., trees containing labels on all nodes instead of only on leaves. Experimental results show that we are still able to achieve linear time while requiring less space. Our implementation is available as an open-source tool at https://github.com/pedroparedesbranco/TreeDiff.
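
To make the distance mentioned in the abstract concrete, here is a minimal sketch of the Robinson-Foulds distance for two rooted trees over the same leaf labels. It is not the paper's implementation: it uses plain hash sets rather than succinct representations, so it illustrates the definition but not the space savings. The nested-tuple tree encoding and the names `clusters` and `rf_distance` are assumptions made for this example, and the distance is taken here as the number of non-trivial clusters present in exactly one tree (some authors halve this count).

```python
def clusters(tree):
    """Return (all leaf labels, non-root clusters) of a rooted tree.

    Assumed encoding: a leaf is a string label, an internal node is a
    tuple of child subtrees. Recursion keeps the sketch short; very deep
    trees would need an explicit stack instead.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):  # a leaf contributes only its own label
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)  # record the cluster under this internal node
        return leaves

    all_leaves = visit(tree)
    found.discard(all_leaves)  # the root cluster is shared by every tree
    return all_leaves, found


def rf_distance(t1, t2):
    """Count the clusters that appear in exactly one of the two trees."""
    leaves1, c1 = clusters(t1)
    leaves2, c2 = clusters(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same leaf labels")
    return len(c1 ^ c2)  # size of the symmetric difference


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))  # 2: {A, B} is only in t1, {A, C} is only in t2
```

Storing every cluster as an explicit set of labels is what makes this sketch memory-hungry on large trees, which is the cost the abstract says a succinct representation is meant to reduce.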

