Foundational theory for optimal decision tree problems. II. Optimal hypersurface decision tree algorithm (2509.12057v1)

Published 15 Sep 2025 in cs.LG, cs.DM, and cs.DS

Abstract: Decision trees are a ubiquitous model for classification and regression tasks due to their interpretability and efficiency. However, solving the optimal decision tree (ODT) problem remains a challenging combinatorial optimization task. Even for the simplest splitting rules--axis-parallel hyperplanes--it is NP-hard to optimize. In Part I of this series, we rigorously defined the proper decision tree model through four axioms and, based on these, introduced four formal definitions of the ODT problem. From these definitions, we derived four generic algorithms capable of solving ODT problems for arbitrary decision trees satisfying the axioms. We also analyzed the combinatorial geometric properties of hypersurfaces, showing that decision trees defined by polynomial hypersurface splitting rules satisfy the proper axioms that we proposed. In this second paper (Part II) of this two-part series, building on the algorithmic and geometric foundations established in Part I, we introduce the first hypersurface decision tree (HODT) algorithm. To the best of our knowledge, existing optimal decision tree methods are limited to hyperplane splitting rules--a special case of hypersurfaces--and rely on general-purpose solvers. In contrast, our HODT algorithm addresses the general hypersurface decision tree model without requiring external solvers. Using synthetic datasets generated from ground-truth hyperplane decision trees, we vary tree size, data size, dimensionality, and label and feature noise. Results show that our algorithm recovers the ground truth more accurately than axis-parallel trees and exhibits greater robustness to noise. We also analyze generalization performance across 30 real-world datasets, showing that HODT can achieve up to 30% higher accuracy than the state-of-the-art optimal axis-parallel decision tree algorithm when tree complexity is properly controlled.

Summary

  • The paper introduces the Hypersurface Optimal Decision Tree (HODT) algorithm to enable complex hypersurface splits without external solvers.
  • It reduces computational complexity using crossed-hyperplane filtering and dynamic ancestry relation matrix updates.
  • Experimental results demonstrate up to a 30% accuracy improvement on both synthetic and real-world datasets compared to traditional methods.

Optimal Hypersurface Decision Tree Algorithm

Introduction

The second paper in the series introduces the Hypersurface Optimal Decision Tree (HODT) algorithm, building on the foundational theory and algorithmic insights from the first paper. It addresses a key limitation of existing ODT methods, which predominantly rely on hyperplane splitting rules, by introducing an approach that efficiently handles hypersurface splits without external solvers.

Algorithm Design

The HODT algorithm is designed to solve the optimal decision tree problem by allowing more complex decision boundaries defined by hypersurfaces. This is achieved through several key components:

  • Crossed-Hyperplane Filtering: A key step is identifying and discarding configurations that contain crossed hyperplanes, which cannot form proper decision trees. This pruning reduces the combinatorial complexity of the search.
  • Ancestry Relation Matrix: The algorithm maintains an ancestry relation matrix to keep track of the hierarchical relationships between hyperplanes during tree construction. Incremental updates to this matrix are performed as new hyperplanes are considered.
  • Incremental Generation: The use of incremental generation techniques allows the ancestry relation matrix to be updated dynamically, enabling efficient exploration of the decision tree search space without redundant computations.
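The crossed-hyperplane condition in the first bullet can be illustrated concretely. On a finite dataset, two hyperplanes "cross" when every one of the four sign-region combinations they induce is occupied by some point, so neither split can nest inside a single child of the other in a proper tree. The sketch below is illustrative only, under assumed conventions (points as rows of a NumPy array, a hyperplane given by weights `w` and offset `b`); the function names are not from the paper.

```python
import numpy as np

def signs(w, b, X):
    """Side of the hyperplane w.x + b = 0 for each row of X (+1 or -1)."""
    return np.where(X @ w + b >= 0, 1, -1)

def are_crossed(w1, b1, w2, b2, X):
    """Two hyperplanes are 'crossed' on X when all four sign-region
    combinations (+,+), (+,-), (-,+), (-,-) contain at least one point,
    so neither split can be an ancestor of the other in a proper tree."""
    s1 = signs(w1, b1, X)
    s2 = signs(w2, b2, X)
    occupied = {(a, c) for a, c in zip(s1, s2)}
    return len(occupied) == 4
```

For example, the vertical and horizontal axes cross on four points placed one per quadrant, whereas a hyperplane that leaves all points on one side never crosses anything.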

Implementation Details

def hodt(xs, K, M):
    """Sketch of the HODT search loop.
    xs: training data; K: tree size; M: degree of the polynomial
    hypersurface splitting rules."""
    embed_xs = embed(xs, M)  # lift data to the polynomial feature space
    ncss = initialize_nested_combs()  # nested combinations of candidate splits

    for n in range(len(xs)):
        # Extend the candidate split combinations with point n, recording
        # which side (+/-) each candidate assigns it to
        css, asgn_plus, asgn_minus = update_combinations(embed_xs, n)

        # Discard crossed configurations and incrementally update the
        # ancestry relation matrices of the surviving combinations
        ncss = update_ancestry_matrices(css, ncss, asgn_plus, asgn_minus)

    # Score each admissible tree of size K and return the best one
    best_tree = evaluate_and_select(ncss, K, xs)
    return best_tree
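The `embed` step in the sketch above is the standard polynomial lift: each point is mapped to all monomials of its coordinates up to degree M, so that a degree-M hypersurface split becomes a linear (hyperplane) split in the lifted space. The version below is a minimal illustration of that idea, not the paper's exact implementation, which may use a different monomial ordering or normalization.

```python
import numpy as np
from itertools import combinations_with_replacement

def embed(X, M):
    """Map each row of X to all monomials of degree <= M, so a degree-M
    polynomial hypersurface w . phi(x) = 0 in the original space becomes
    a hyperplane in the embedded space. Illustrative sketch only."""
    n, d = X.shape
    feats = [np.ones(n)]  # degree-0 (constant) term
    for deg in range(1, M + 1):
        for idx in combinations_with_replacement(range(d), deg):
            # product of the selected coordinates, e.g. idx=(0,1) -> x0*x1
            feats.append(np.prod(X[:, idx], axis=1))
    return np.stack(feats, axis=1)
```

For a 2-D point (2, 3) with M = 2, this yields the six features (1, x0, x1, x0^2, x0*x1, x1^2) = (1, 2, 3, 4, 6, 9).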

Experimental Evaluation

The HODT algorithm demonstrates superior performance compared to axis-parallel tree models. On synthetic data it recovers ground-truth decision trees more accurately, even under label and feature noise, and across 30 real-world datasets it generalizes better, with accuracy improvements of up to 30% over the state-of-the-art optimal axis-parallel method when tree complexity is properly controlled.

Implications and Future Directions

The introduction of HODT opens new possibilities for using decision trees in more complex and high-dimensional settings. Future research could explore:

  • Extensions to Random Forests: Incorporating hypersurface decision trees in ensemble methods like random forests.
  • Handling Categorical Data: Developing methods to efficiently handle categorical data in the context of hypersurface decision trees.
  • Mixed Splitting Rules: Expanding the algorithm to support trees with a mix of axis-parallel, hyperplane, and hypersurface splits.

In conclusion, HODT represents a significant advancement in decision tree algorithms, providing a framework that is both more flexible and powerful for tackling real-world machine learning problems.
