Measuring Tree Balance with Normalized Tree Area (2008.12867v1)
Abstract: Phylogenetic trees are the standard tool for studying how species are organized and how they cluster by genetic or phenotypic similarity. An important structural property of a phylogenetic tree is its balance, which measures how taxa are distributed among clades. Tree balance can be measured using indices such as the Sackin ($S$) and Total Cophenetic ($\Phi$) indices, which are based on the distances between the nodes of the tree and its root. Here, we propose a new metric for tree balance, $\bar{d}$, the Area per Pair (APP) of the tree, which is a rescaled version of the so-called tree area. We compute $\bar{d}$ for the rooted caterpillar and maximally balanced trees, and we obtain exact formulas for its expected value and variance under the Yule model. The variance of APP for Yule trees has the remarkable property of converging to a constant value for large trees. We compare the Sackin, Total Cophenetic and APP indices for hundreds of empirical phylogenies and show that APP represents the observed distribution of tree balances better than the other two metrics.
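As an illustration of the two root-distance indices named in the abstract, the following minimal sketch computes them for small rooted binary trees. It assumes the standard definitions: $S$ is the sum of the depths of all leaves, and $\Phi$ is the sum, over all pairs of leaves, of the depth of their lowest common ancestor. The nested-tuple tree representation and the function names are hypothetical choices made for this example and are not taken from the paper. APP is not computed here because the abstract does not give its formula.

```python
# A tree is either a leaf (any non-tuple value) or a pair (left, right).
# This representation is an assumption made for illustration only.

def sackin(tree, depth=0):
    """Sackin index S: sum of the depths of all leaves (root has depth 0)."""
    if not isinstance(tree, tuple):
        return depth
    return sum(sackin(child, depth + 1) for child in tree)


def _leaves_and_phi(tree, depth):
    """Return (number of leaves, cophenetic sum) for the subtree at `depth`."""
    if not isinstance(tree, tuple):
        return 1, 0
    n_left, phi_left = _leaves_and_phi(tree[0], depth + 1)
    n_right, phi_right = _leaves_and_phi(tree[1], depth + 1)
    # Every pair with one leaf on each side has this node as its
    # lowest common ancestor, so it contributes `depth` per such pair.
    here = depth * n_left * n_right
    return n_left + n_right, phi_left + phi_right + here


def total_cophenetic(tree):
    """Total Cophenetic index Phi: sum over leaf pairs of the LCA depth."""
    return _leaves_and_phi(tree, 0)[1]


def caterpillar(n):
    """Rooted caterpillar with n leaves: each split adds a single leaf."""
    tree = "x"
    for _ in range(n - 1):
        tree = (tree, "x")
    return tree


def balanced(n):
    """Maximally balanced tree: split n leaves as evenly as possible."""
    if n == 1:
        return "x"
    return (balanced((n + 1) // 2), balanced(n // 2))


if __name__ == "__main__":
    for name, build in (("caterpillar", caterpillar), ("balanced", balanced)):
        t = build(4)
        print(name, "S =", sackin(t), "Phi =", total_cophenetic(t))
```

For four leaves, the caterpillar gives $S = 9$ and $\Phi = 4$, while the maximally balanced tree gives $S = 8$ and $\Phi = 2$, so both indices take smaller values on the more balanced tree.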