Ranking top-k trees in tree-based phylogenetic networks

Published 29 Apr 2019 in math.CO, cs.DM, and q-bio.PE | (1904.12432v1)

Abstract: 'Tree-based' phylogenetic networks proposed by Francis and Steel have attracted much attention of theoretical biologists in the last few years. At the heart of the definitions of tree-based phylogenetic networks is the notion of 'support trees', about which there are numerous algorithmic problems that are important for evolutionary data analysis. Recently, Hayamizu (arXiv:1811.05849 [math.CO]) proved a structure theorem for tree-based phylogenetic networks and obtained linear-time and linear-delay algorithms for many basic problems on support trees, such as counting, optimisation, and enumeration. In the present paper, we consider the following fundamental problem in statistical data analysis: given a tree-based phylogenetic network $N$ whose arcs are associated with probability, create the top-$k$ support tree ranking for $N$ by their likelihood values. We provide a linear-delay (and hence optimal) algorithm for the problem and thus reveal the interesting property of tree-based phylogenetic networks that ranking top-$k$ support trees is as computationally easy as picking $k$ arbitrary support trees.