The total path length of split trees

Published 12 Feb 2011 in math.PR, cs.DS, and math.CO | (1102.2541v3)

Abstract: We consider the model of random trees introduced by Devroye [SIAM J. Comput. 28 (1999) 409-432]. The model encompasses many important randomized algorithms and data structures. The pieces of data (items) are stored in a randomized fashion in the nodes of a tree. The total path length (sum of depths of the items) is a natural measure of the efficiency of the algorithm/data structure. Using renewal theory, we prove convergence in distribution of the total path length toward a distribution characterized uniquely by a fixed point equation. Our result covers, using a unified approach, many data structures such as binary search trees, m-ary search trees, quad trees, median-of-(2k+1) trees, and simplex trees.