- The paper introduces an integer programming formulation to construct decision trees that directly optimize classification accuracy.
- It models feature selection and branching as a combinatorial problem, improving both interpretability and performance for categorical datasets.
- Experimental results show that these optimal decision trees achieve comparable or superior accuracy and greater transparency compared to traditional heuristic methods.
Optimal Generalized Decision Trees via Integer Programming
The paper "Optimal Generalized Decision Trees via Integer Programming" (1612.03225) addresses the challenge of constructing decision trees that are optimal in classification accuracy for categorical data by leveraging a mixed integer programming (MIP) approach. Unlike traditional sequential heuristics such as CART or C4.5, which grow and prune trees using local criteria, the paper formulates decision tree training as a single integer program whose globally optimal solution directly maximizes accuracy on the training data. Moreover, the method accommodates combinatorial branching decisions at each node, improving interpretability and accuracy, especially when small trees suffice for effective classification.
Integer Programming Approach
The core of the paper's contribution lies in modeling decision tree construction as an integer programming problem, where each decision node in the tree corresponds to an integer variable dictating its branching rule. This approach considers both categorical and numerical features, handling numerical data through thresholding. Key facets of this integer programming formulation include:
- Group and Feature Selection: Each categorical feature is represented by a group of binary features, and a group is chosen for branching based on a set of binary decision variables. This allows decision nodes to make complex branching decisions by considering subsets of feature values.
- Tree Topology: The tree topology is fixed in advance, allowing experiments with various topologies; the optimization then assigns a branching rule, i.e., a subset of feature values, to each decision node.
- Symmetry and Strengthening: Several techniques are presented to enhance the computational efficiency of the MIP, including symmetry breaking by anchoring selected features and relaxing certain binary variables to continuous ones without compromising solution integrity.
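To build intuition for the combinatorial branching the formulation allows, the sketch below exhaustively searches, for a single decision node, over all subsets of a categorical feature's values and picks the one maximizing depth-1 training accuracy. This brute force stands in for the MIP search and is not the paper's actual formulation; the function name and interface are illustrative.

```python
from itertools import combinations

def best_subset_split(values, labels):
    """Find the subset S of categorical values that, used as a branching rule
    (x in S -> left leaf, else right leaf), maximizes training accuracy of a
    depth-1 tree. Brute-force enumeration stands in for the MIP's search."""
    domain = sorted(set(values))
    best_acc, best_subset = -1.0, None
    # Enumerate non-empty proper subsets of the value domain.
    for r in range(1, len(domain)):
        for subset in combinations(domain, r):
            s = set(subset)
            left = [y for x, y in zip(values, labels) if x in s]
            right = [y for x, y in zip(values, labels) if x not in s]
            # Each leaf predicts its majority class.
            correct = sum(max(side.count(c) for c in set(side))
                          for side in (left, right) if side)
            acc = correct / len(labels)
            if acc > best_acc:
                best_acc, best_subset = acc, s
    return best_subset, best_acc
```

Note that a heuristic like CART, splitting greedily one value at a time, may never consider a subset such as {a, c} versus {b, d}, whereas exhaustive (or MIP-based) search over subsets does.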
Implementation and Scaling
The implementation leverages modern MIP solvers such as IBM ILOG CPLEX to handle the optimization problem, showing that optimal small-depth decision trees (up to depth 4) can be trained in reasonable time. While larger trees pose challenges due to increased computational demands, careful preprocessing and feature selection can ameliorate these issues.
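One such preprocessing step is the thresholding of numerical features mentioned above. A minimal sketch of one standard encoding, turning a numeric column into binary "x <= t" indicators (the paper's exact encoding may differ in detail; the function name is an assumption):

```python
def threshold_binarize(column):
    """Turn a numeric column into binary 'x <= t' indicator features, one per
    candidate threshold t midway between consecutive distinct values, so a
    formulation that branches on binary features can consume numerical data."""
    vals = sorted(set(column))
    thresholds = [(a + b) / 2 for a, b in zip(vals, vals[1:])]
    return {f"x<={t}": [x <= t for x in column] for t in thresholds}
```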
To mitigate overfitting associated with combinatorial branching when categorical features have numerous possible values, constraints limiting the subset cardinality are introduced. These constraints directly influence the generalization capacity of the trees.
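In the MIP, such a cardinality limit is a linear constraint on the subset-indicator variables (roughly, the sum of the indicators for a node's chosen values is bounded by a constant). In a brute-force analogue it simply prunes the candidate set, and with it the model's capacity to overfit; the sketch below is illustrative, with a hypothetical max_card parameter:

```python
from itertools import combinations

def candidate_subsets(domain, max_card):
    """Enumerate branching subsets of a categorical value domain with
    cardinality at most max_card. Restricting cardinality shrinks the
    hypothesis space, trading expressiveness for generalization."""
    domain = sorted(domain)
    # Proper subsets only: branching on the full domain sends everything left.
    for r in range(1, min(max_card, len(domain) - 1) + 1):
        for subset in combinations(domain, r):
            yield set(subset)
```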
Experimental Results
Extensive experiments across various publicly available datasets demonstrate that optimal decision trees (ODTs) consistently achieve comparable or superior accuracy compared to traditional heuristics such as CART, with better interpretability owing to smaller tree sizes. Moreover, the ability to enforce conditions on sensitivity or specificity makes ODTs more adaptable to real-world classification tasks demanding such guarantees, which heuristics typically struggle to provide reliably.
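Enforcing a sensitivity or specificity floor fits naturally into an exact formulation: it is one additional linear constraint, after which the objective still maximizes accuracy. A toy sketch of that selection logic over a pool of candidate classifiers, with names that are illustrative rather than the paper's notation:

```python
def pick_split_with_sensitivity(candidates, labels, min_sensitivity):
    """Among candidate classifiers (each given as its predicted labels on the
    training set), discard those whose sensitivity (true-positive rate on
    class 1) falls below min_sensitivity, then return the most accurate
    survivor together with its accuracy."""
    positives = sum(1 for y in labels if y == 1)
    best_acc, best = -1.0, None
    for preds in candidates:
        tp = sum(1 for p, y in zip(preds, labels) if p == y == 1)
        if positives and tp / positives < min_sensitivity:
            continue  # violates the sensitivity guarantee
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_acc, best = acc, preds
    return best, best_acc
```

A greedy heuristic has no analogous hook: its splitting criterion optimizes a local impurity measure, so such guarantees can at best be approximated by post-hoc pruning or threshold tuning.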
The paper also discusses the practical implications of deploying ODTs, particularly in domains demanding high transparency and human interpretability, such as healthcare diagnostics.
Conclusion
This integer programming method for decision tree construction represents a significant advancement in predictive modeling over heuristic-based approaches, facilitating the creation of highly interpretable and accurate models suited to categorical datasets. The implementation considerations and performance benefits highlighted in this paper provide a robust framework for future developments in interpretable machine learning models. As computational resources evolve and integer programming solvers continue to improve, the scalability and applicability of this approach are likely to broaden, potentially influencing various industry applications where transparency and decision justification are critical.