HODT: Hypersurface Optimal Decision Trees
- HODT is a class of decision tree algorithms that use polynomial hypersurface splits, enabling rich, non-axis-aligned decision boundaries for enhanced model expressivity.
- The methodology employs recursive dynamic programming and ancestry filtering to efficiently enumerate and select optimal splitting rules while pruning infeasible candidate trees.
- Empirical results show HODT achieves up to 30% higher test accuracy and robust performance under noise, outperforming traditional axis-parallel and hyperplane decision trees.
The Hypersurface Optimal Decision Tree (HODT) refers to a class of decision tree algorithms in which branching decisions are made according to general hypersurface splitting rules—typically defined by polynomial equations of arbitrary degree—instead of the traditional axis-parallel splits or hyperplanes. HODT algorithms rigorously extend optimal decision tree (ODT) methods to richer decision boundaries, enabling sparser, more accurate, and robust models for classification and regression. Recent theoretical and algorithmic advances have yielded the first correct-by-construction, scalable algorithms for learning optimal hypersurface decision trees, with proven empirical advantages over axis-parallel and hyperplane-based ODTs (He, 15 Sep 2025).
1. Formal Definition and Theoretical Foundations
The foundational model for HODT is based on four axioms that ensure tree correctness and uniqueness (He, 14 Sep 2025, He et al., 3 Mar 2025). These require that:
- Each internal node splits the feature space into exactly two disjoint, connected regions via a single splitting rule (hypersurface).
- Leaf nodes are characterized by the intersection of all ancestor splitting rules.
- An ancestry relation between splitting rules is transitive, maintaining feasibility for the recursive tree structure.
- The ancestry relation matrix (with entries in $\{1, -1, 0\}$) is unique under level-order traversal, guaranteeing a single “proper” decision tree for any valid configuration.
A decision tree that satisfies these axioms is termed a “proper decision tree” (Editor's term). In HODT, splitting rules are multivariate polynomials of degree $d$; axis-parallel decision trees are recovered for $d = 1$ with splitting functions of the form $x_j - c$ (i.e., threshold tests on a single feature).
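For concreteness, the splitting rule and the ancestry matrix can be written as follows; the notation here (the symbols $h$, $w_\alpha$, $A$) is illustrative rather than fixed by the source:

```latex
% Degree-d polynomial splitting rule over x in R^p:
% route a sample left if h(x) <= 0, right otherwise.
h(x) \;=\; \sum_{|\alpha| \le d} w_\alpha \, x^{\alpha},
\qquad \text{split: } h(x) \le 0.

% Ancestry relation matrix over splitting rules h_1, ..., h_m in
% level-order: A_{ij} = 1 (resp. -1) if h_j lies in the left (resp.
% right) region induced by h_i, and A_{ij} = 0 if no ancestry holds.
A_{ij} \in \{1,\, -1,\, 0\}.
```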
2. Algorithmic Development: Recursive and Dynamic Programming Methods
Recent advances have resulted in several formal, executable algorithms for HODT, all rooted in rigorous program derivation (He et al., 3 Mar 2025, He, 14 Sep 2025, He, 15 Sep 2025). The core algorithm proceeds as follows:
- Candidate splitting rules (hypersurfaces defined by sampled data points via polynomial interpolation) are enumerated.
- Feasible nested combinations of splitting rules—subject to the ancestry axioms—are generated incrementally, with nonviable configurations (such as those containing crossed hypersurfaces) excluded via prefix-closed filtering.
- For each feasible configuration, optimal subtree solutions are constructed via a recursive dynamic programming procedure (“sodt_rec”), minimizing a user-specified objective (often 0-1 classification error) at each node.
- In practice, a vectorized (“sodt_vec”) or permutation-based (“sodt_kperms”) generator can be used for parallel and hardware-accelerated implementation, especially when the number of candidate combinations is large.
The critical innovation for HODT is the systematic updating and filtering of ancestry relations as candidate rules are sampled and appended, enabling efficient pruning of infeasible search space branches and guaranteeing correctness-by-construction.
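A minimal Python sketch of the recursion follows. It is not the authors' sodt_rec: the candidate representation, the per-node feasibility test (a miniature stand-in for ancestry filtering), and the absence of memoization are simplifying assumptions made here for illustration:

```python
from typing import Callable, List, Tuple
import numpy as np

Split = Callable[[np.ndarray], np.ndarray]  # X of shape (n, p) -> boolean mask of shape (n,)

def majority_error(y: np.ndarray) -> Tuple[int, int]:
    """0-1 error and label of the best constant (leaf) prediction."""
    labels, counts = np.unique(y, return_counts=True)
    best = int(np.argmax(counts))
    return int(len(y) - counts[best]), int(labels[best])

def sodt_like(X: np.ndarray, y: np.ndarray,
              candidates: List[Split], depth: int):
    """Exhaustive recursion over candidate splits minimizing 0-1 error.

    Splits that fail the feasibility test at the current node are
    pruned before recursing, mimicking (in miniature) the ancestry-
    based filtering of infeasible configurations."""
    leaf_err, leaf_label = majority_error(y)
    best_err, best_tree = leaf_err, ("leaf", leaf_label)
    if depth == 0 or leaf_err == 0:
        return best_err, best_tree
    for h in candidates:
        mask = h(X)
        if mask.all() or not mask.any():   # infeasible here: prune
            continue
        le, lt = sodt_like(X[mask], y[mask], candidates, depth - 1)
        re, rt = sodt_like(X[~mask], y[~mask], candidates, depth - 1)
        if le + re < best_err:
            best_err, best_tree = le + re, ("node", h, lt, rt)
    return best_err, best_tree

# Example candidate pool: axis-parallel thresholds (any callable works,
# including polynomial rules).
# candidates = [lambda X, j=j, c=c: X[:, j] <= c
#               for j in range(p) for c in thresholds]
```

In the full algorithm, the filter operates on whole nested configurations via the ancestry relation matrix rather than one node at a time, and the vectorized variant evaluates batches of candidate configurations in parallel.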
3. Geometric Characterization of Hypersurface Splitting Rules
HODT exploits combinatorial geometric properties of polynomial hypersurfaces to significantly reduce computational complexity (He, 14 Sep 2025, He, 15 Sep 2025). Key properties include:
- Each hypersurface is uniquely determined, within a finite sample, by a fixed number of data points: $p$ points for a hyperplane in $p$ dimensions, and $\binom{p+d}{d} - 1$ points for a degree-$d$ polynomial hypersurface in $p$ dimensions.
- The Veronese embedding maps each degree-$d$ hypersurface to a hyperplane in a higher-dimensional monomial-feature space, allowing standard linear classification and ancestry analysis.
- Two hypersurfaces either admit an ancestry relation in one direction (one can serve as an ancestor of the other), admit it in both directions (mutually ancestral), or cross (no valid ancestry); which case holds is determined by the signs each polynomial takes on the data points that define the other.
- Configurations with crossed hypersurfaces are provably infeasible (cannot yield a valid tree) and pruned at generation.
This geometry enables a significant reduction in the number of candidate trees, as empirical search spaces are orders of magnitude smaller than theoretical upper bounds.
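The sketch below illustrates both ingredients under assumed notation: fitting a degree-$d$ hypersurface through exactly $\binom{p+d}{d}-1$ points via the Veronese (monomial) embedding, and a sufficient sign-based test for crossing. The function names and the exact form of the crossing test are assumptions for illustration, not the paper's interface:

```python
import numpy as np
from itertools import combinations_with_replacement
from math import comb

def veronese(X: np.ndarray, d: int) -> np.ndarray:
    """Embed points in R^p as all monomials of degree <= d."""
    n, p = X.shape
    feats = [np.ones(n)]                       # degree-0 (constant) term
    for deg in range(1, d + 1):
        for idx in combinations_with_replacement(range(p), deg):
            feats.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(feats)              # shape (n, comb(p + d, d))

def hypersurface_through(points: np.ndarray, d: int) -> np.ndarray:
    """Coefficients (up to scale) of the degree-d hypersurface through
    the given points: the null space of the embedded point matrix."""
    p = points.shape[1]
    assert points.shape[0] == comb(p + d, d) - 1, "need comb(p+d, d) - 1 points"
    _, _, vt = np.linalg.svd(veronese(points, d))
    return vt[-1]          # last right-singular vector spans the null space

def crosses(w1, pts1, w2, pts2, d: int) -> bool:
    """Sufficient test: if either polynomial changes sign on the points
    defining the other, the two hypersurfaces intersect (no ancestry)."""
    mixed = lambda s: (s > 0).any() and (s < 0).any()
    return (mixed(veronese(pts2, d) @ w1) or
            mixed(veronese(pts1, d) @ w2))

# Classic check: five points determine a conic (p = 2, d = 2, comb(4,2)-1 = 5).
pts = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.], [0.7, 0.7]])
w = hypersurface_through(pts, d=2)
print(np.round(veronese(pts, 2) @ w, 10))      # ~0 at all five points
```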
4. Computational and Empirical Results
HODT algorithms have been tested on both synthetic and real-world datasets under rigorous experimental conditions (He, 15 Sep 2025). Key findings:
- On synthetic ground-truth hyperplane data (varying tree size, dimension, and noise level), HODT not only recovers the true partitioning rules with higher accuracy than axis-parallel ODTs, but in noisy settings its test accuracy can even exceed its training accuracy, evidence of fitting structure rather than noise.
- Parallel implementations based on sodt_vec and GPU acceleration demonstrate sub-quadratic runtimes on large candidate pools due to effective pruning and hardware-friendly recursion.
- Five-fold cross-validation on 30 real-world datasets (UCI repository) shows that HODT achieves up to 30% higher test accuracy than state-of-the-art optimal axis-parallel tree methods (e.g., ConTree) when tree size and complexity are controlled.
- Comparative analysis reveals that HODT trees are consistently sparser, with more accurate modeling of complex decision boundaries (e.g., convex polygons), and more robust to feature and label noise.
5. Comparison with Prior Decision Tree Models
HODT generalizes and subsumes previous families of optimal tree learning algorithms:
- Axis-parallel trees are a special case ($d = 1$, with splits $x_j \le c$ on a single feature).
- Hyperplane (oblique) trees (linear splits $w^\top x \le c$) are strictly less expressive than degree-$d$ ($d > 1$) hypersurface HODTs.
- SAT-based optimal tree learning methods (Avellaneda, 2019) can, in principle, be adapted to hyperplane or hypersurface splits, but their Boolean representations of richer splitting rules impose scalability and expressivity challenges; pseudo-Boolean or SMT formulations may be required.
- HODT distinguishes itself by not relying on general-purpose linear or mixed-integer solvers, instead using custom DP recursions and ancestry relation filtering directly integrated into tree generation.
6. Constraints, Acceleration, and Practical Implementations
HODT algorithms are amenable to the integration of common tree-based constraints:
- Tree depth, leaf sample size, and model size can be imposed via prefix-closed filtering predicates embedded in the recursive generation process.
- Dominance-based thinning techniques allow pruning of configurations that cannot lead to an optimal solution according to monotonicity or pessimistic error bounds, with justified correctness via the underlying algebraic programming theory (He, 14 Sep 2025, He et al., 3 Mar 2025).
- Vectorized implementation allows significant parallelization, especially for size-constrained trees, leveraging hardware capabilities for the combinatorially intensive candidate generation phase.
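As a hedged illustration of how such constraints enter the generator, the following sketch shows a prefix-closed predicate: once a partial configuration violates it, every extension does too, so the whole branch can be discarded. The names and thresholds below are assumptions, not the paper's interface:

```python
from dataclasses import dataclass

@dataclass
class PartialTree:
    """Minimal stand-in for a partial configuration of splitting rules."""
    depth: int            # depth of the deepest node added so far
    num_nodes: int        # splitting rules used so far
    min_leaf_count: int   # smallest sample count in any current region

def within_budget(t: PartialTree, max_depth: int = 4,
                  max_nodes: int = 15, min_leaf: int = 5) -> bool:
    """Prefix-closed feasibility filter: depth and node counts only grow
    and leaf sample counts only shrink as rules are appended, so once a
    prefix violates the predicate, every extension does too."""
    return (t.depth <= max_depth
            and t.num_nodes <= max_nodes
            and t.min_leaf_count >= min_leaf)

# During incremental generation, a candidate extension is kept only if
# within_budget(extended) holds; otherwise the entire family of
# configurations extending it is skipped.
```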
7. Generalization, Robustness, and Future Prospects
Empirical studies establish that HODT maintains excellent generalization performance even under noise and limited data, contrary to typical concerns about model overfitting with more expressive splitting rules. The correct-by-construction DP-based algorithms facilitate reproducible optimal learning and extend naturally to mixed splitting rules (combining axis-parallel, hyperplane, and hypersurface splits).
A plausible implication is that future extensions may unify axis-parallel, oblique, and hypersurface-based splits under a single algorithmic and geometric framework, enabling flexible control over model expressivity and interpretability.
Summary Table: Key Features of HODT Algorithms
| Feature | Axis-Parallel ODTs | Hyperplane ODTs | Hypersurface ODTs (HODT) |
|---|---|---|---|
| Splitting Rule | $x_j \le c$ | $w^\top x \le c$ | $h(x) \le 0$ (polynomial) |
| Search Space Pruning | Depth/dimension limits | Linear geometry | Geometry + ancestry matrix |
| Optimization Method | SAT, MIP, DP | SAT, Branch & Bound, DP | DP + ancestry filtering |
| Expressivity | Limited | Moderate | High |
| Empirical Accuracy | Baseline | Improved | Highest (in controlled size) |
| Generalization Under Noise | Moderate | Moderate | Robust |
HODT algorithms represent a significant advance in optimal decision tree learning by enabling interpretable models with flexible, expressive decision boundaries, rigorous theoretical guarantees, competitive scaling, and empirical performance that surpasses previous state-of-the-art approaches for both axis-parallel and hyperplane-based ODTs (He, 15 Sep 2025, He, 14 Sep 2025, He et al., 3 Mar 2025).