MurTree: Optimal Trees & Topological Analysis

Updated 27 February 2026

MurTree is a collection of dynamic programming techniques for optimal decision tree learning and robust merge tree edit distance computation.
It integrates branch-and-bound search with caching and tight lower bounds to efficiently minimize misclassification costs under depth and node constraints.
In computational topology, MurTree's edit distance framework enhances applications such as shape matching, symmetry detection, and flow summarization.

MurTree refers to two distinct contributions in computational topology and machine learning: (1) a high-performance dynamic programming algorithm for optimal classification tree induction, and (2) an algorithmic framework for computing tree edit distances between merge trees, a key data structure in topological analysis of scalar fields. Both lines of work share a unifying theme of dynamic programming on trees, but are situated in separate research domains. This entry addresses both algorithmic frameworks, with an emphasis on their definitions, core methodologies, mathematical principles, scalability, and empirical performance.

1. MurTree for Optimal Classification Tree Learning

1.1 Problem Definition

The MurTree classification approach addresses the globally optimal learning of decision trees under explicit size and depth constraints. Given a training set $\mathcal{D}$ of $N$ labeled instances with binary labels $\{+1, -1\}$ over a fixed set of binary features $\mathcal{F}$, the goal is to induce a full binary decision tree $T$. Each internal node tests a predicate $f \in \mathcal{F}$, and each leaf predicts $+1$ or $-1$. The optimization criterion is the minimization of misclassification cost, defined for any leaf containing subset $\mathcal{D}' \subseteq \mathcal{D}$ as:

$\min \left\{ \left| \{ x \in \mathcal{D}': label(x) = +1 \} \right|, \left| \{ x \in \mathcal{D}': label(x) = -1 \} \right| \right\},$

with the cost for the tree given by the sum over all leaves. Optionally, a linear penalty $\alpha \geq 0$ per internal node penalizes tree size. Hard constraints are placed on the maximum tree depth $D$ and the number of internal nodes $n$ (Demirović et al., 2020).
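As a minimal sketch of this objective, the following Python computes the cost of a single leaf (the size of its minority class) and the cost of a whole tree as the sum over leaves plus the optional penalty. The function names `leaf_cost` and `tree_cost`, and the representation of a leaf as a plain list of labels, are assumptions made for illustration and are not taken from the MurTree implementation.

```python
from collections import Counter

def leaf_cost(labels):
    """Misclassification cost of one leaf: the size of its minority class."""
    counts = Counter(labels)
    return min(counts.get(+1, 0), counts.get(-1, 0))

def tree_cost(leaves, num_internal_nodes=0, alpha=0.0):
    """Sum of leaf costs plus a linear penalty alpha per internal node."""
    return sum(leaf_cost(labels) for labels in leaves) + alpha * num_internal_nodes

# Hypothetical example: a depth-1 tree whose two leaves hold these labels.
example_leaves = [[+1, +1, -1], [-1, -1, -1, +1]]
unpenalized = tree_cost(example_leaves)
penalized = tree_cost(example_leaves, num_internal_nodes=1, alpha=0.5)
```

Each leaf predicts its majority label, so the instances it misclassifies are exactly the minority ones.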

1.2 Dynamic Programming Recurrence

MurTree's core is a dynamic programming (DP) decomposition. Let $T(\mathcal{D}, d, n)$ denote the minimum misclassification cost achievable on data subset $\mathcal{D}$, using a binary tree of depth at most $d$ and at most $n$ internal nodes. The recurrence is:

$T(\mathcal{D}, d, n) = \begin{cases} T(\mathcal{D}, d, 2^d-1) & \text{if } n > 2^d-1, \\ T(\mathcal{D}, n, n) & \text{if } d > n, \\ \min\{|\#^+(\mathcal{D})|, |\#^-(\mathcal{D})|\} & \text{if } d=0 \vee n=0, \\ \displaystyle \min_{f \in \mathcal{F},\ 0 \leq \ell \leq n-1} \Big[ T(\mathcal{D}_{f=0}, d-1, \ell) + T(\mathcal{D}_{f=1}, d-1, n-1-\ell) \Big] & \text{otherwise,} \end{cases}$

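with $\mathcal{D}_{f=0}$ and $\mathcal{D}_{f=1}$ denoting instance subsets for feature outcome $0$ and $1$, respectively, and $\#^+(\mathcal{D})$ and $\#^-(\mathcal{D})$ denoting the numbers of positive and negative instances in $\mathcal{D}$.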
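The recurrence can be sketched directly as a memoized recursion. The Python below is an illustrative, unoptimized transcription with none of the bounds or specialized routines of the actual MurTree solver. The function name `optimal_cost`, the row/label data layout, and the use of `functools.lru_cache` as the cache are assumptions. Because the budgets are upper limits ("at most"), the sketch also lets a node remain a leaf, which keeps the minimum well defined when no features are available.

```python
from functools import lru_cache

def optimal_cost(rows, labels, depth, nodes):
    """Minimum misclassifications with depth <= `depth` and <= `nodes` internal nodes.

    rows:   list of tuples of binary feature values (0/1)
    labels: list of +1/-1 labels, aligned with rows
    """
    num_features = len(rows[0]) if rows else 0

    @lru_cache(maxsize=None)
    def T(idx, d, n):
        # First two cases of the recurrence: clamp redundant budgets.
        n = min(n, 2 ** d - 1)
        d = min(d, n)
        pos = sum(1 for i in idx if labels[i] == +1)
        leaf = min(pos, len(idx) - pos)
        if d == 0 or n == 0:
            return leaf
        best = leaf  # staying a leaf is allowed under "at most n nodes"
        for f in range(num_features):
            left = tuple(i for i in idx if rows[i][f] == 0)
            right = tuple(i for i in idx if rows[i][f] == 1)
            # Distribute the remaining n - 1 nodes between the two children.
            for l in range(n):
                best = min(best, T(left, d - 1, l) + T(right, d - 1, n - 1 - l))
        return best

    return T(tuple(range(len(rows))), depth, nodes)

# Hypothetical XOR data set: no single split helps, three nodes solve it exactly.
xor_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [-1, +1, +1, -1]
```

On the XOR example, one internal node leaves two errors, two nodes leave one, and three nodes reach zero, which shows how the node budget $n$ interacts with the depth budget $d$.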

1.3 Branch-and-Bound and Search Pruning

MurTree combines the DP with branch-and-bound search strategies. Key optimization techniques include:

Caching: Every subproblem $T(\mathcal{D}, d, n)$ is memoized using a hash table.
Upper Bound (UB) Pruning: Early discovery of a solution with cost $C^*$ sets $UB \leftarrow C^*-1$, pruning subtrees with lower bound $>UB$.
Lower Bounds:
1. Stored-bound: If a call is proven infeasible within $UB$, $lb(\mathcal{D}, d, n) \gets UB + 1$ (DL8.5-style).
2. Similarity-based lower bound:
$\mathrm{simLB}(\mathcal{D}_\mathrm{new}, \mathcal{D}_\mathrm{old}, d, n) = T(\mathcal{D}_\mathrm{old}, d, n) - |\mathcal{D}_\mathrm{old} \setminus \mathcal{D}_\mathrm{new}|.$

Local refinement:

$\mathrm{locLB}(\mathcal{D}, d, n) = \min_{f, \ell} \bigl( lb(\mathcal{D}_{f=0}, d-1, \ell) + lb(\mathcal{D}_{f=1}, d-1, n-1-\ell) \bigr).$

Degeneracy pruning: Splits that do not partition the data are skipped.
Dynamic node-order: The branch with larger single-leaf cost is searched first, to increase the likelihood of early UB exceedance.
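The similarity-based bound and the upper-bound pruning test can be illustrated with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, instance sets are represented as Python sets of ids, and the clamp to zero is added here for safety and does not appear in the formula above.

```python
def similarity_lower_bound(new_ids, old_ids, old_optimal_cost):
    """simLB: optimal cost on the old data minus the number of removed instances.

    Removing one instance can lower the optimal cost by at most one, and adding
    instances cannot lower it, so the result is a valid lower bound for new_ids.
    The max(0, ...) clamp is an assumption added for this sketch.
    """
    removed = len(set(old_ids) - set(new_ids))
    return max(0, old_optimal_cost - removed)

def can_prune(lower_bound, upper_bound):
    """A subproblem is skipped when its lower bound already exceeds UB."""
    return lower_bound > upper_bound

# Hypothetical numbers: the old subproblem had optimal cost 4 on ids 1..5,
# and the new subproblem dropped ids 4 and 5.
lb = similarity_lower_bound({1, 2, 3}, {1, 2, 3, 4, 5}, old_optimal_cost=4)
```

With `lb` equal to 2, any search under an upper bound of 1 can discard this subproblem without expanding it.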

1.4 Constraint Handling

Both the depth $D$ and node count $n$ are integrated into the DP state and recurrences, ensuring that child nodes always obey their respective budget splits: $\ell \leq 2^{d-1}-1$ and $n-1-\ell \leq 2^{d-1}-1$ (Demirović et al., 2020).
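To make the budget split concrete, the sketch below enumerates the values of $\ell$ that satisfy both inequalities for a given $(d, n)$. The helper name `feasible_left_budgets` is an assumption, and the function presumes $d \geq 1$ and $n \geq 1$ (the non-base case of the recurrence).

```python
def feasible_left_budgets(d, n):
    """Left-child node budgets l such that both children fit in depth d - 1.

    One node is spent on the root; the left child gets l nodes and the
    right child gets n - 1 - l, each capped by 2**(d-1) - 1.
    """
    cap = 2 ** (d - 1) - 1
    return [l for l in range(n) if l <= cap and n - 1 - l <= cap]
```

For example, with $d = 2$ and $n = 3$ only $\ell = 1$ is feasible, since each depth-1 child can hold at most one internal node.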

1.5 Complexity and Empirical Scalability

The general DP without acceleration is $O(N \cdot |\mathcal{F}| \cdot n \cdot d)$. An optimized depth-2 specialization (using precomputed feature and pairwise-feature frequency tables) reduces major subcalls by $10\times$–$100\times$ and handles the majority of subproblems in constant time. Aggressive caching and tight lower bounds empirically prune $99\%$ of the search space. On standard UCI/C4.5 benchmarks (depth-4 trees), MurTree solves all tasks in under $60$ s (often under $1$ s), outperforming DL8.5 by up to two orders of magnitude (Demirović et al., 2020).

2. MurTree Tree Edit Distance for Merge Trees

2.1 Merge Trees: Background and Definition

Merge trees encode the evolution of connected components in the sublevel (join tree) or superlevel (split tree) sets of a scalar field $f: X \to \mathbb{R}$. Each node corresponds to a critical point, and edges to merging events as the level set parameter increases. Nodes are labeled with scalar values (birth/death times in persistence). Merge trees are rooted, with an explicit binary structure reflecting topological evolution (Sridharamurthy et al., 2022).

2.2 Edit Distance: Cost Model

The MurTree edit distance is a minimum-cost sequence of node-wise edit operations (insertions, deletions, relabelings) aligning trees $T_1$ and $T_2$. Node costs use a metric $\gamma$:

Deletion: $\gamma(p \to \varnothing) = \frac{1}{2}(d_p - b_p)$
Insertion: $\gamma(\varnothing \to q) = \frac{1}{2}(d_q - b_q)$
Relabel: $\gamma(p \to q) = \min\{ \max(|b_p - b_q|, |d_p - d_q|), \frac{1}{2}(d_p - b_p) + \frac{1}{2}(d_q - b_q) \}$

where $b_p, d_p$ are birth and death for node $p$. The overall tree edit distance is computed as the minimal total operation cost under valid node matchings respecting tree structure (Sridharamurthy et al., 2022).
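The three node costs translate directly into code. The following Python sketch assumes each node is represented by a `(birth, death)` tuple with `death >= birth`; the function names are hypothetical.

```python
def delete_cost(p):
    """Cost of deleting node p = (birth, death): half its persistence."""
    b, d = p
    return 0.5 * (d - b)

def insert_cost(q):
    """Cost of inserting node q: symmetric to deletion."""
    return delete_cost(q)

def relabel_cost(p, q):
    """Cheaper of relabeling p to q directly, or deleting p and inserting q."""
    (bp, dp), (bq, dq) = p, q
    direct = max(abs(bp - bq), abs(dp - dq))
    via_empty = delete_cost(p) + insert_cost(q)
    return min(direct, via_empty)

# Hypothetical nodes: nearby intervals are relabeled directly (cost 1.0);
# two short, far-apart intervals are cheaper to delete and re-insert (cost 1.0
# instead of 10.0).
near = relabel_cost((0.0, 4.0), (1.0, 3.0))
far = relabel_cost((0.0, 1.0), (10.0, 11.0))
```

The `min` in the relabel cost caps the price of matching two low-persistence nodes that sit far apart, so small features cannot dominate the distance.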

2.3 Dynamic Programming Algorithm

The MurTree DP algorithm extends Zhang’s unordered-tree edit distance. Trees $T_1$ and $T_2$ are traversed in postorder. DP tables $D[i, j]$ store the optimal edit cost from the subtree rooted at node $i$ in $T_1$ to the subtree rooted at node $j$ in $T_2$. The core recurrences comprise:

Base cases for deletion/insertion of whole subtrees.
For nontrivial subtrees $(i, j)$, three strategies:
1. Delete $i$ and optimally match its children's forests.
2. Insert $j$ and optimally match its children's forests.
3. Match roots, then solve a bipartite matching problem on children.

Forest matching uses the Hungarian method (or variants) with time $O((\Delta_1 + \Delta_2)^3)$ per match, where $\Delta_1$ and $\Delta_2$ are the maximum node degrees of $T_1$ and $T_2$. The total time is $O(n_1 n_2 (\Delta_1 + \Delta_2)^3)$ for trees with $n_1$ and $n_2$ nodes (Sridharamurthy et al., 2022).
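The children-matching step (strategy 3) can be sketched as a padded assignment problem. The Python below is illustrative only: it swaps the Hungarian method for a brute-force search over permutations, which is exponential and suitable only for very small degrees. The function name `match_children` and its three inputs (already-computed subtree edit costs, and the costs of deleting or inserting whole child subtrees) are assumptions.

```python
from itertools import permutations

def match_children(sub_cost, delete_costs, insert_costs):
    """Minimum-cost matching between the children of i and the children of j.

    sub_cost[a][b]:  edit cost between child subtree a of i and child subtree b of j
    delete_costs[a]: cost of deleting child subtree a of i entirely
    insert_costs[b]: cost of inserting child subtree b of j entirely
    """
    m, n = len(delete_costs), len(insert_costs)
    size = m + n
    # Pad to a square matrix: extra columns mean "delete", extra rows mean "insert".
    cost = [[0.0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            if r < m and c < n:
                cost[r][c] = sub_cost[r][c]
            elif r < m:
                cost[r][c] = delete_costs[r]
            elif c < n:
                cost[r][c] = insert_costs[c]
            # padding matched with padding costs nothing (already 0.0)
    # Brute force stands in for the Hungarian method in this sketch.
    return min(
        sum(cost[r][perm[r]] for r in range(size))
        for perm in permutations(range(size))
    )

# Hypothetical costs for two children on each side.
best = match_children([[1.0, 5.0], [4.0, 0.5]], [2.0, 2.0], [3.0, 3.0])
```

Here the cheapest option pairs child 0 with child 0 and child 1 with child 1, for a total of 1.5. A production implementation would replace the permutation search with a polynomial-time assignment solver.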

2.4 Implementation and Optimization

To enhance stability and performance, several optimizations are applied:

Small persistence intervals $(<\epsilon)$ are merged for robustness.
Implementation caches DP table entries and recycles solutions for isomorphic subtrees.
Parallelization across independent DP blocks enables significant multicore speedups.
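The persistence threshold in the first optimization can be illustrated as a simple filter. This sketch only separates intervals below $\epsilon$ from those that are kept; how a low-persistence branch is folded into the rest of the tree is not specified above and is not modeled here. The function name and the `(birth, death)` tuple representation are assumptions.

```python
def split_by_persistence(pairs, eps):
    """Separate (birth, death) intervals into kept and below-threshold groups."""
    kept = [(b, d) for b, d in pairs if d - b >= eps]
    small = [(b, d) for b, d in pairs if d - b < eps]
    return kept, small

# Hypothetical intervals: the middle one has persistence of about 0.05.
kept, small = split_by_persistence([(0.0, 5.0), (1.0, 1.05), (2.0, 3.5)], eps=0.1)
```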

2.5 Applications and Experimental Highlights

The MurTree edit distance demonstrates utility in a range of topological data analysis (TDA) tasks:

Periodicity detection: Outperforms bottleneck and Wasserstein distances in temporal periodicity discovery in vortex street simulations.
Stability with respect to smoothing/subsampling: Maintains monotonicity except in degenerate barcode scenarios.
Symmetry detection: Detects group equivalence in synthetic and cryo-EM datasets, with block-diagonal distance matrices.
Shape matching: Clusters pose-varying meshes by class in TOSCA datasets, insensitive to pose changes.
Flow summarization: Segments temporal regimes in 3D flow simulations.

Pairwise distance computation for $\sim 60$-node trees over $10^6$ pairs completes within practical timeframes (e.g., $25$ minutes on 8-core hardware; $4\times$–$8\times$ acceleration via optimized code) (Sridharamurthy et al., 2022).

3. Realization of Merge Trees and Discrete Morse Functions

Merge trees can be realized via discrete Morse functions on trees and, notably, on paths. Each abstract merge tree corresponds to a discrete Morse function (critical-only, possibly index-ordered or sublevel-connected) on a path, and vice versa, modulo natural equivalence relations (symmetry, shuffle, or component–merge equivalence). These constructions enable explicit and bijective correspondence between merge trees and discrete Morse function classes, underpinning combinatorial and algorithmic analysis (Brüggemann, 2021).

4. Comparative Analysis with Existing Methods

MurTree for optimal classification trees substantially outperforms prior state-of-the-art solvers (notably DL8.5) in both runtime and scalability. For $d=4$ optimal trees on over $80$ UCI/C4.5 datasets, MurTree achieves solution times $10\times$–$100\times$ lower, solves all datasets where others time out, and scales linearly with dataset size up to $N = 40{,}000$ (Demirović et al., 2020).

In topological analysis, the MurTree edit metric offers richer discrimination of merge tree structure than bottleneck or $W_1$ distances. It is robust to function perturbations and supports nuanced applications such as symmetry detection and fine-grained shape clustering (Sridharamurthy et al., 2022).

5. Summary of Key Properties and Implications

MurTree constitutes a class of dynamic-programming based algorithms for tree-structured problems:

Domain	Objective	Key Feature
Classification	Exact decision tree optimization	Primal DP with branch-and-bound, tight lower bounds
TDA/Topology	Merge-tree edit distance	Metric cost aligning trees and persistence intervals

Both algorithms leverage structural decomposability of trees, advanced memoization, problem-specific lower bounds, and efficient matching strategies. MurTree's solution paradigms enable handling of large-scale, high-dimensional datasets and precise topological summaries. These developments comprise critical advances towards tractable, exact combinatorial learning models and robust metrics for geometric and topological data analysis (Demirović et al., 2020, Sridharamurthy et al., 2022, Brüggemann, 2021).

References (3)

MurTree: Optimal Classification Trees via Dynamic Programming and Search (2020)

Edit Distance between Merge Trees (2022)

On Merge Trees and Discrete Morse Functions on Paths and Trees (2021)
