Tree-Valued Markov Processes
- Tree-valued Markov processes are stochastic models defined on tree representations that capture genealogical, phylogenetic, and population structures.
- They utilize mechanisms such as pruning, fragmentation, coalescence, and growth to evolve both discrete and continuum tree models with explicit transition kernels.
- Applications include modeling evolutionary histories and speeding up computation via lumpability, with convergence results stated in topologies such as Gromov–Prohorov.
A tree-valued Markov process is a Markov process whose state space consists of combinatorial, metric, or measure-theoretic representations of trees. Such processes model the stochastic evolution of genealogies, phylogenies, or hierarchically structured populations, and encompass discrete trees (e.g., Galton-Watson or fragmentation trees), continuum random trees (CRTs), and ultrametric measure spaces enriched with additional structure. Recent work unifies diverse constructions involving pruning, fragmentation, coalescence, and metric measure limits, culminating in a rigorous stochastic process theory for evolving random trees.
1. Discrete Tree-Valued Markov Processes
At the discrete level, tree-valued Markov processes include models generated by growth or pruning operations, with trees typically represented as rooted, ordered, or plane trees with finite or countable vertex sets. Notable constructions include:
- Pruning of Galton-Watson trees: A stochastic pruning scheme based on independent marks at nodes, indexed by a pruning parameter, where the resulting pruned tree is itself a (modified) Galton-Watson tree. For a fixed Galton-Watson tree, the process of pruned trees indexed by the pruning parameter is a non-decreasing, time-inhomogeneous Markov process under a suitable filtration, with explicit transition kernels and a martingale structure for the number of leaves. An analogous process arises by pruning a Kesten tree (the GW tree conditioned to be infinite). The law of the pruned Galton-Watson process at its "ascension time" can be expressed in terms of the pruned Kesten-tree process, facilitating time-reversal and pathwise representation results (Abraham et al., 2010).
- Tree-valued fragmentation and ancestral branching algorithms: Here, the process is built via recursive partitioning (ancestral branching, AB), generating transition kernels on the set of fragmentation trees of a finite set. These kernels are specified as products of partition-valued Markov kernels, yielding explicit, tractable finite-dimensional distributions. Infinite exchangeability and projective consistency are characterized via the exchangeability and consistency of the underlying partition kernels. Extensions include the cut-and-paste (CP) partition algorithm and corresponding stationary measures, as well as associated mass-fragmentation and weighted-tree processes (Crane, 2011).
- Growth models and trickle-down processes: Markov chains governed by incrementally adding leaves or branches, such as the binary search tree, random recursive tree, and nested Chinese restaurant process models. Trickle-down processes formalize a framework where tree growth is viewed as sequential occupation of vertices in a directed acyclic graph, with the state described by the growing down-set. Explicit transition rules (e.g., uniform addition at external nodes) and connections to limit objects (endpoints, probability flows) are provided via the Doob-Martin compactification and Poisson boundary analysis (Evans et al., 2010).
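The node-mark pruning described in the first item can be illustrated with a minimal Python sketch. It is not the construction of Abraham et al. (2010). It assumes a subcritical Binomial(2, p) offspring law, and it assumes that a node with k children is marked with probability 1 - u^(k-1), coupled across values of u through one uniform threshold per node. The function names `sample_gw_tree` and `prune` are hypothetical.

```python
import random

def sample_gw_tree(rng, p=0.4, max_nodes=10_000):
    """Sample a Galton-Watson tree with Binomial(2, p) offspring (assumed law).

    Returns a list `children` where children[v] holds the child ids of node v;
    node 0 is the root. Growth stops early if max_nodes is reached.
    """
    children = [[]]
    stack = [0]
    while stack:
        v = stack.pop()
        k = sum(rng.random() < p for _ in range(2))  # number of children of v
        for _ in range(k):
            if len(children) >= max_nodes:
                return children
            children.append([])
            children[v].append(len(children) - 1)
            stack.append(len(children) - 1)
    return children

def prune(children, thresholds, u):
    """Return the set of nodes kept at pruning parameter u in [0, 1].

    Assumed rule: a node with k >= 1 children is marked when its threshold
    exceeds u**(k-1). A marked node is kept, but all its descendants are cut.
    """
    kept = {0}
    stack = [0]
    while stack:
        v = stack.pop()
        k = len(children[v])
        if k >= 1 and thresholds[v] > u ** (k - 1):
            continue  # v is marked: do not descend below it
        for c in children[v]:
            kept.add(c)
            stack.append(c)
    return kept

if __name__ == "__main__":
    rng = random.Random(1)
    tree = sample_gw_tree(rng)
    thresholds = [rng.random() for _ in tree]
    # Sizes of the pruned tree along an increasing grid of u values.
    print([len(prune(tree, thresholds, i / 10)) for i in range(11)])
```

Because the thresholds are shared across all values of u, the kept sets are nested as u increases, which mirrors the non-decreasing behaviour described above. At u = 1 no node is marked and the full tree is recovered.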
2. Continuum and Metric-Measure Tree-Valued Processes
The scaling limit of discrete tree processes leads to continuum random trees (CRTs) and tree-valued processes in spaces of (ultra)metric measure spaces:
- Tree-valued Markov processes from branching mechanisms: Consider a family of branching mechanisms indexed by a parameter, each with the structure given by the Lévy-Khinchine formula. Lévy CRTs can be pruned by parameter-dependent Poisson random measures (marks on the skeleton and on nodes), yielding a decreasing Markov process with values in the space of measured trees equipped with the Gromov-Hausdorff-Prohorov metric. The process admits a well-defined ascension time at which the total mass jumps from infinite to finite, and under regularity conditions the law at this time coincides (after appropriate conditioning) with that of a Markov process on the infinite CRT (constructed by similar pruning of a critical Lévy tree). This generalizes the Abraham–Delmas CRT-valued process and encompasses non-shift families of branching mechanisms (Bi et al., 2014).
- Measure-valued and path-valued branching processes: Solution flows of stochastic differential equations (CBI-SDEs) driven by space-time noises produce inhomogeneous, increasing path-valued branching processes with immigration. These flows yield, by marginalization or interpretation, genuine tree-valued or measure-valued Markov processes with explicit local and nonlocal branching mechanisms. In quadratic cases, such flows recover time-reversal duals of Aldous-Pitman fragmentation-coalescent processes, while general structures unify and extend continuum tree-valued processes (Li, 2012).
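For reference, a branching mechanism in Lévy-Khinchine form is commonly written as in the first line below. The symbols ($\alpha$, $\beta$, $\pi$, $\theta$) are standard notation assumed here rather than taken from the cited papers. The second line shows the shift family associated with the Abraham–Delmas setting, which the non-shift families mentioned above generalize.

```latex
\psi(\lambda) = \alpha\lambda + \beta\lambda^{2}
  + \int_{(0,\infty)} \bigl(e^{-\lambda r} - 1 + \lambda r\bigr)\,\pi(dr),
  \qquad \lambda \ge 0,\ \beta \ge 0,\ \int_{(0,\infty)} (r \wedge r^{2})\,\pi(dr) < \infty,

\psi_{\theta}(\lambda) = \psi(\lambda + \theta) - \psi(\theta).
```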
3. Tree-Valued Processes from Population Models
Tree-valued Markov processes emerge as limits or functional enrichments of classical and generalized stochastic population models:
- Tree-valued Fleming-Viot and Cannings models: Enriching the Fleming-Viot measure-valued diffusion with genealogical relations yields a tree-valued Markov process (the tree-valued Fleming-Viot process with mutation and selection, TFVMS) whose states are marked ultrametric measure spaces, with the state space equipped with the marked Gromov-weak topology. The generator incorporates growth (branch-length), resampling, mutation, and selection operators, with path properties established via well-posed martingale problems and duality. The process arises as a scaling limit of finite-population Moran or Cannings models, is ergodic, and has explicit Laplace transforms for genealogical distances in equilibrium under selection and mutation (Depperschmidt et al., 2011, Gufler, 2016).
- Invariance principles and Gromov-Prohorov topology: Sequences of tree-valued Markov chains (e.g., genealogies in Cannings models) converge in distribution to tree-valued Fleming–Viot processes under Möhle–Sagitov conditions. The topology is typically the Gromov–Prohorov or Gromov–weak topology on isometry classes of ultrametric metric measure spaces. Generators and polynomial test functions are described in detail, with explicit formulations for growth and resampling, and extensions to marked and dust-carrying cases (Gufler, 2016).
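The growth and resampling mechanisms named above can be sketched for a finite Moran-type population. The Python example below is a discrete-time simplification, not the generator of the cited works: it assumes one resampling event per step and a growth of 2 distance units per step for every pair of distinct individuals. The names `moran_genealogy` and `is_ultrametric` are hypothetical.

```python
import random

def moran_genealogy(n, steps, rng):
    """Evolve the pairwise genealogical distance matrix of n individuals.

    Returns a symmetric integer matrix r with zero diagonal.
    """
    r = [[0] * n for _ in range(n)]
    for _ in range(steps):
        # Growth: all pairs of distinct individuals move apart by 2 units.
        for i in range(n):
            for j in range(n):
                if i != j:
                    r[i][j] += 2
        # Resampling: individual j is replaced by an offspring of individual i,
        # so j inherits all of i's distances and is at distance 0 from i.
        i, j = rng.sample(range(n), 2)
        for k in range(n):
            if k != j:
                r[j][k] = r[k][j] = r[i][k]
    return r

def is_ultrametric(r):
    """Check r[i][k] <= max(r[i][j], r[j][k]) for all triples."""
    n = len(r)
    return all(
        r[i][k] <= max(r[i][j], r[j][k])
        for i in range(n) for j in range(n) for k in range(n)
    )

if __name__ == "__main__":
    r = moran_genealogy(5, 200, random.Random(0))
    for row in r:
        print(row)
```

Both moves preserve the ultrametric property, so each state can be read as the distance matrix of a genealogical tree.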
4. Lumpability, Projections, and Accelerated Computation
In practical and statistical settings, it is essential to consider whether projections of tree state spaces preserve, approximately or exactly, the Markov property:
- Exact and $\varepsilon$-lumpability: For tree-valued Markov chains on spaces such as the subtree-prune-regraft (SPR) graph over binary phylogenetic trees, the projected chain on (unlabeled) tree shapes is exactly lumpable: transition probabilities aggregate consistently within each class, yielding a Markov chain over the much smaller space of shapes. For clade-indicator processes (presence of specific clades), the chain is only $\varepsilon$-lumpable, with an error that is controlled and vanishes for fixed clade size as the number of leaves grows. This allows construction of auxiliary low-dimensional Markov chains for efficient Monte Carlo estimation of tree-related functionals, drastically reducing computational complexity (Alves et al., 2024).
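The lumpability condition can be checked directly on a small chain. The sketch below uses the classical strong-lumpability criterion: for every pair of blocks A and B, the probability of jumping from a state x in A into B must not depend on x. The deviation measure returned by the hypothetical helper `lumping_error` is one plausible way to quantify approximate lumpability and is an assumption here; it may differ from the definition used by Alves et al. (2024).

```python
from fractions import Fraction

def lumping_error(P, blocks):
    """Largest spread, over block pairs (A, B), of the mass sent from x in A into B.

    A value of 0 means the partition `blocks` is exactly (strongly) lumpable.
    """
    worst = 0
    for A in blocks:
        for B in blocks:
            mass = [sum(P[x][y] for y in B) for x in A]
            worst = max(worst, max(mass) - min(mass))
    return worst

def lumped_matrix(P, blocks):
    """Transition matrix of the projected chain, read off one representative
    state per block. Only meaningful when lumping_error(P, blocks) == 0."""
    return [[sum(P[A[0]][y] for y in B) for B in blocks] for A in blocks]

half = Fraction(1, 2)

# Simple random walk on the 4-cycle 0-1-2-3-0, lumped by parity.
cycle = [
    [0, half, 0, half],
    [half, 0, half, 0],
    [0, half, 0, half],
    [half, 0, half, 0],
]
parity = [[0, 2], [1, 3]]

# Reflecting walk on the path 0-1-2, lumped into {0, 1} and {2}.
path = [
    [0, 1, 0],
    [half, 0, half],
    [0, 1, 0],
]
uneven = [[0, 1], [2]]

if __name__ == "__main__":
    print(lumping_error(cycle, parity), lumped_matrix(cycle, parity))
    print(lumping_error(path, uneven))
```

The parity partition of the cycle is exactly lumpable, while the partition of the path is not, since states 0 and 1 send different mass to state 2.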
5. Scaling Limits and Classification of Growth Models
Recent advances rigorously characterize the continuum limits and classification of tree-valued Markov chains:
- Decorated planar real trees and Gromov–Prokhorov limits: Any Markov chain of plane trees with uniform backward dynamics (e.g., Rémy’s algorithm) admits a universal representation as sampling from a "decorated planar real tree," with explicit classification via extremal decompositions and measure-valued limit theorems. Theorem 4.1 (Geldbach) asserts almost sure convergence in the Gromov–Prokhorov topology of appropriately rescaled and trimmed discrete trees to the limiting real tree, unifying combinatorial, probabilistic, and metric approaches to continuum tree-valued processes (Geldbach, 2023).
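Rémy's algorithm, cited above as an example, grows a plane binary tree by repeatedly picking a uniform existing node and grafting a new leaf next to it. The Python sketch below is a minimal array-based version. The representation (parallel `parent`, `left`, `right` lists with -1 for "none") and the function name `remy_tree` are choices made here for illustration.

```python
import random

def remy_tree(n, rng):
    """Grow a plane binary tree with n internal nodes by Remy's algorithm.

    Returns (root, parent, left, right); -1 encodes a missing parent or child.
    """
    parent, left, right = [-1], [-1], [-1]  # start from a single leaf
    root = 0
    for _ in range(n):
        v = rng.randrange(len(parent))      # uniform existing node
        internal = len(parent)              # new internal node replaces v
        leaf = internal + 1                 # new leaf becomes v's sibling
        p = parent[v]
        parent.extend([p, internal])
        left.extend([-1, -1])
        right.extend([-1, -1])
        if p == -1:
            root = internal
        elif left[p] == v:
            left[p] = internal
        else:
            right[p] = internal
        parent[v] = internal
        # The old subtree goes left or right with equal probability.
        if rng.random() < 0.5:
            left[internal], right[internal] = v, leaf
        else:
            left[internal], right[internal] = leaf, v
    return root, parent, left, right

if __name__ == "__main__":
    root, parent, left, right = remy_tree(5, random.Random(0))
    print(root, parent, left, right)
```

Each step adds one internal node and one leaf, so after n steps the tree has 2n + 1 nodes. Removing a uniformly chosen leaf reverses a step, which is the "uniform backward dynamics" referred to above.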
6. Regenerative Tree Growth and Fragmentation Embeddings
Markovian embedding of fragmenters (processes $M_t = \exp(-\xi_t)$ for subordinators $\xi$) links discrete bead-splitting processes to binary fragmentation chains and continuum random trees:
- Bead-splitting processes and $\alpha$-self-similar CRTs: By recursively selecting and splitting atoms (beads) according to embedded fragmenters, one constructs Markov chains on weighted $\mathbb{R}$-trees that almost surely converge to $\alpha$-self-similar CRTs in the Gromov-Hausdorff-Prohorov metric. Exchangeable binary fragmentation processes correspond via unique dislocation measures, linking bead-splitting dynamics to binary fragmentation and size-biased symmetrisation (Pitman et al., 2013).
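A single "string of beads" of the kind that gets split can be sketched as follows. Write the fragmenter as M_t = exp(-xi_t) for a subordinator xi and let alpha > 0 be the self-similarity index. The example assumes a compound Poisson subordinator with Exp(1) jumps, and it assumes that each jump of M becomes an atom placed at the position given by the integral of M^alpha up to the jump time. The function name `sample_string_of_beads` and all parameters are hypothetical, and the string is truncated once the residual mass falls below `tol`.

```python
import math
import random

def sample_string_of_beads(alpha, rate, rng, tol=1e-9):
    """Sample atoms (position, mass) generated by M_t = exp(-xi_t).

    xi is a compound Poisson subordinator with jump rate `rate` and Exp(1)
    jump sizes (an assumed choice). Returns (beads, length, residual_mass).
    """
    mass = 1.0       # current value of M
    position = 0.0   # integral of M**alpha up to the current time
    beads = []
    while mass > tol:
        wait = rng.expovariate(rate)           # holding time until next jump
        position += (mass ** alpha) * wait     # M is constant between jumps
        new_mass = mass * math.exp(-rng.expovariate(1.0))
        beads.append((position, mass - new_mass))  # atom = mass lost at the jump
        mass = new_mass
    return beads, position, mass

if __name__ == "__main__":
    beads, length, residual = sample_string_of_beads(0.5, 2.0, random.Random(3))
    print(len(beads), length, residual)
```

The bead masses and the residual mass sum to 1, so the atoms form an (almost) probability measure on an interval of the returned length. A bead-splitting step would pick one atom and attach a rescaled independent copy of such a string at its position.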
7. Connections, Extensions, and Future Directions
Tree-valued Markov processes constitute a broad class encompassing fragmentation, coalescence, pruning, and growth models with combinatorial, measure-theoretic, and metric structures. Recent theory unifies these via explicit generator characterizations, martingale problems, invariance principles, and classification theorems. Extensions include the incorporation of spatial motion, selection and mutation mechanisms, spatially interacting genealogies, and pathwise representations with applications to phylogenetics, Bayesian nonparametrics, and statistical inference in high-dimensional stochastic systems. Open problems involve sharp mixing time analysis, optimal summary partitioning, and generalizations to non-exchangeable or temporally inhomogeneous settings (Alves et al., 2024, Li, 2012, Gufler, 2016, Geldbach, 2023).