Concise Decision Trees: Methods & Applications
- Concise decision trees are models that minimize tree depth, node count, or explanation path length while maintaining predictive performance.
- They employ methods such as SAT/SMT optimization, heuristic algorithms, and neural techniques to generate compact, interpretable structures.
- These trees are applied in controller synthesis, model compression, and interpretable machine learning to reduce computational and communication costs.
A concise decision tree is a decision tree constructed to minimize a formal measure of size—typically tree depth, node count, number of relevant variables, or explanation path length—while preserving predictive accuracy or policy fidelity. Conciseness enables interpretability, efficient inference, low communication cost, and, in control settings, compact policy representations. Recent research has produced a variety of algorithmic, representational, and theoretical techniques for constructing and analyzing concise decision trees across standard classification, surrogate modeling, synthesis for control, and property testing.
1. Formal Definitions and Measures of Conciseness
A binary decision tree over a feature set $F$ and a class set $C$ labels each internal node with a feature from $F$ and each leaf with a class from $C$. The canonical conciseness targets include:
- Minimum Depth: Find the smallest $k$ such that a depth-$k$ perfect tree (all root-to-leaf paths of length $k$) is consistent with the sample set $E$.
- Minimum Node Count: Among trees of fixed or minimal depth, find a (possibly imperfect) tree with the fewest nodes.
- Explanation Path Length: The number of distinct features tested along a root-to-leaf path for any instance; especially relevant for interpretability.
- Relevant Variable Count: For Boolean trees, the number of variables that can affect output (each path contains at most one new relevant variable).
The class of size-$s$ decision trees consists of those with at most $s$ leaves and, consequently, at most $s$ relevant variables (Bshouty, 2019).
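As a concrete reading of these measures, here is a minimal sketch that computes depth, node count, explanation path length, and the relevant-variable set; the tuple encoding of binary trees is illustrative and not drawn from any of the cited papers:

```python
# Internal nodes are ("node", feature_index, left, right); leaves are ("leaf", class_label).
# Features are binary: 0 routes left, 1 routes right.

def depth(tree):
    """Length of the longest root-to-leaf path (a lone leaf has depth 0)."""
    if tree[0] == "leaf":
        return 0
    _, _, left, right = tree
    return 1 + max(depth(left), depth(right))

def node_count(tree):
    """Total number of nodes, internal nodes and leaves alike."""
    if tree[0] == "leaf":
        return 1
    _, _, left, right = tree
    return 1 + node_count(left) + node_count(right)

def explanation_path(tree, x):
    """Features tested on the root-to-leaf path taken by instance x."""
    path = []
    while tree[0] != "leaf":
        _, feat, left, right = tree
        path.append(feat)
        tree = right if x[feat] else left
    return path

def relevant_variables(tree):
    """Variables that appear in some test and can therefore affect the output."""
    if tree[0] == "leaf":
        return set()
    _, feat, left, right = tree
    return {feat} | relevant_variables(left) | relevant_variables(right)

# Example: tests x2, then x0 on the right branch only.
t = ("node", 2, ("leaf", 0), ("node", 0, ("leaf", 0), ("leaf", 1)))
print(depth(t), node_count(t), explanation_path(t, [1, 0, 1]), sorted(relevant_variables(t)))
```

Note that the explanation path for a given instance can be much shorter than the total node count, which is precisely why path length is the preferred interpretability measure.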
2. Algorithmic Construction of Concise Trees
SAT-Based and SMT-Based Optimization
Inferring globally optimal decision trees is NP-complete for both depth and node count definitions. Exact construction is dominated by Boolean or SMT encodings that characterize trees of a given shape, as in (Avellaneda, 2019) and (Andriushchenko et al., 17 Jan 2025):
- SAT encoding for fixed depth: Each internal node is represented by a unique feature-selection variable; each example traverses a bitwise path to a leaf that must match its class; constraints enforce consistency and optionally bound the number of nodes.
- SMT-based policy synthesis: In the context of MDPs, a finite tree template is parameterized with splitting variables and thresholds. SMT constraints ensure that all states are mapped via unique paths to prescribed actions. Minimal depth is found by incrementing depth until a solution exists (Andriushchenko et al., 17 Jan 2025).
Key optimizations include incremental constraint generation (adding only those needed for current counterexamples), unary encoding for node counting, and abstraction-refinement loops to target only realistic policies.
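The depth-incrementing loop can be illustrated without a solver: the sketch below is a brute-force stand-in for the SAT/SMT call, assuming Boolean features and a small consistent sample, and searches for a tree of depth at most $d$, raising $d$ until one exists:

```python
def consistent_tree(sample, features, d):
    """Return a depth-<=d tree consistent with the labeled sample, or None.
    sample: list of (x, y) pairs with x a tuple of bits. This exhaustive
    search plays the role of the solver call in the SAT/SMT approaches."""
    labels = {y for _, y in sample}
    if len(labels) <= 1:                      # pure sample: a leaf suffices
        return ("leaf", sample[0][1] if sample else 0)
    if d == 0:                                # depth budget exhausted
        return None
    for f in features:                        # try every root test
        left = [(x, y) for x, y in sample if x[f] == 0]
        right = [(x, y) for x, y in sample if x[f] == 1]
        if not left or not right:             # useless split, skip
            continue
        lt = consistent_tree(left, features, d - 1)
        if lt is None:
            continue
        rt = consistent_tree(right, features, d - 1)
        if rt is not None:
            return ("node", f, lt, rt)
    return None

def min_depth_tree(sample, n_features):
    """Increment the depth bound until a consistent tree exists
    (assumes the sample itself is consistent, so the loop terminates)."""
    d = 0
    while True:
        t = consistent_tree(sample, range(n_features), d)
        if t is not None:
            return d, t
        d += 1

# XOR of x0 and x1, with x2 as an irrelevant distractor: minimum depth is 2.
sample = [((0, 0, 0), 0), ((0, 1, 0), 1), ((1, 0, 0), 1), ((1, 1, 0), 0)]
print(min_depth_tree(sample, 3)[0])
```

The exact methods replace the exponential recursion here with a single solver query per depth bound, which is what makes the incremental, counterexample-driven constraint generation described above pay off.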
Heuristic Methods
Standard greedy algorithms such as CART and C4.5 produce deep or large trees, especially in high-dimensional settings. Heuristic and mixed-integer programming approaches may offer suboptimal but practical size reduction (Avellaneda, 2019). In controller synthesis, techniques such as MaxFreq determinization and node predicates based on arbitrary linear classifiers or SVMs greatly reduce decision path counts over axis-aligned or oblique splits (Ashok et al., 2020).
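A minimal sketch of MaxFreq-style determinization; the state/action encoding below is illustrative, not dtControl's API:

```python
from collections import Counter

def maxfreq_determinize(permissive):
    """permissive: dict mapping state -> set of allowed actions.
    For each state, keep the allowed action that occurs most frequently
    across the whole controller, biasing the determinized policy toward
    few distinct actions (which tends to yield smaller trees).
    Ties are broken toward the smaller action index."""
    freq = Counter(a for acts in permissive.values() for a in acts)
    return {s: max(acts, key=lambda a: (freq[a], -a))
            for s, acts in permissive.items()}

# Action 2 is allowed in three states, so it is chosen wherever permitted.
perm = {0: {1, 2}, 1: {2, 3}, 2: {2}, 3: {1, 3}}
print(maxfreq_determinize(perm))
```

Because most states end up mapped to the same action, the downstream tree learner needs far fewer splits to separate the remaining action classes.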
Neural and Communication-Based Approaches
Recurrent neural architectures (RDTC) can learn compact or sparse decision trees through iterative binary sub-decisions. The process, mediated by LSTM control and explicit attribute memories, is differentiable during training and distilled into concise symbolic trees at inference, often using an order of magnitude fewer nodes than classical or soft decision tree methods, while retaining high accuracy (Alaniz et al., 2019).
3. Specialized Models for Conciseness and Interpretability
Cascading Decision Trees
Cascading Decision Trees (CDTs) (Zhang et al., 2020) explicitly enforce shallow explanation paths by composing a sequence of bounded-depth subtrees. For any positive instance $x$, its explanation path is confined to a single subtree, achieving explanation length at most $d$, where $d$ is the prescribed maximum depth per subtree. Empirical reduction in explanation length averages 63.38% vs. classical trees, with maintained or improved accuracy and robustness to missing features.
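The cascading evaluation can be sketched as follows; this is a simplification of the CDT model, with tuple-encoded subtrees and leaf label 0 read as "defer to the next subtree":

```python
def classify(subtree, x):
    """Route x through one subtree. Nodes are ("node", feature_index,
    left, right); leaves are ("leaf", label). Returns (label, path)."""
    path = []
    while subtree[0] != "leaf":
        _, feat, left, right = subtree
        path.append(feat)
        subtree = right if x[feat] else left
    return subtree[1], path

def cascade_predict(cascade, x):
    """cascade: non-empty list of depth-bounded subtrees. Leaf label 1 means
    positive (terminate); 0 means defer to the next subtree. For a positive
    instance, the returned explanation never leaves a single subtree, so its
    length is bounded by that subtree's depth."""
    for subtree in cascade:
        label, path = classify(subtree, x)
        if label == 1:
            return 1, path
    return 0, path  # fell through every subtree: negative (last path shown)

t1 = ("node", 0, ("leaf", 0), ("leaf", 1))  # depth-1 subtree testing x0
t2 = ("node", 1, ("leaf", 0), ("leaf", 1))  # depth-1 subtree testing x1
print(cascade_predict([t1, t2], [0, 1]))
```

Even though two subtrees are consulted for `[0, 1]`, the explanation returned contains only the single feature tested in the subtree that fired.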
Surrogate and Concept-Based Trees
Surrogate trees for black-box models can be made concise and semantically transparent via feature grouping. Concept-Trees (Renard et al., 2019) partition variables into concepts, using either domain knowledge or automated correlation clustering, and enforce one-concept-per-split rules. This reduces per-node cognitive complexity, especially in high-dimensional, correlated data, without sacrificing surrogate fidelity.
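A minimal sketch of the automated grouping step, assuming greedy clustering on absolute Pearson correlation; this is a stand-in for whichever clustering Concept-Trees actually employ:

```python
def pearson(u, v):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv) if su and sv else 0.0

def concept_groups(columns, threshold=0.9):
    """Union-find grouping: features whose absolute pairwise correlation
    exceeds `threshold` are merged into one concept. Domain knowledge can
    replace this step entirely."""
    parent = list(range(len(columns)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson(columns[i], columns[j])) > threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(columns)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

cols = [
    [1, 2, 3, 4],  # f0
    [2, 4, 6, 8],  # f1: perfectly correlated with f0 -> same concept
    [5, 1, 4, 2],  # f2: weakly correlated -> its own concept
]
print(concept_groups(cols))
```

Each split in the surrogate tree is then restricted to features from a single group, so a reader interprets one concept per node rather than an arbitrary mix of correlated variables.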
4. Property Testing and Sample Complexity of Concise Trees
Testing whether an unknown function is representable as a concise (e.g., size-$s$) decision tree is possible with near-optimal sample complexity (Bshouty, 2019). Testers are known both under the uniform distribution and in the distribution-free setting, with the distribution-free tester requiring a larger query budget. The testers operate by partitioning features, exploiting the at-most-one-relevant-variable-per-block property, and verifying junta and dictator properties.
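A one-sided building block of such testers, checking whether a variable is relevant via random bit flips, can be sketched as follows; this is an influence estimate, not the full tester of (Bshouty, 2019):

```python
import random

def looks_relevant(f, i, n, trials=200, seed=0):
    """Randomized check that variable i influences Boolean function f over
    {0,1}^n: sample x uniformly, flip bit i, and report whether the output
    ever changes. One-sided: a True answer is certain; a False answer only
    says no influence was observed in `trials` samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = x[:]
        y[i] ^= 1  # flip only coordinate i
        if f(x) != f(y):
            return True
    return False

g = lambda x: x[0] ^ (x[1] & x[2])  # depends only on variables 0, 1, 2
print([i for i in range(5) if looks_relevant(g, i, 5)])
```

A size-$s$ tree has at most $s$ relevant variables, so checks of this kind let a tester reject functions whose relevant-variable count is too large.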
| Model/Class | Ensured Conciseness Metric | Typical Empirical Complexity |
|---|---|---|
| SAT/SMT-encoded trees | Depth or node count (global minimality) | Exponential in depth, polynomial in the number of examples |
| CDT | Path length (at most $d$ features) | $\le d$ features per explanation; matched accuracy |
| RDTC | Attribute count, depth | Median $30$–$46$ attributes in final tree |
| Concept-Tree | Nodes, concept splits | $10$ node budget; high semantic transparency |
| MaxFreq in dtControl | Decision paths | Single-digit paths for several benchmarks |
5. Practical Applications: Control, Interpretable ML, Model Compression
Concise decision trees are essential for applications where model size, explanation length, or transmission cost are constrained:
- Controller Synthesis: dtControl can compress lookup policies to compact trees (e.g., 4–6 decision paths for cartpole, a size reduction of orders of magnitude relative to lookup tables), preserving correctness (Ashok et al., 2020).
- Optimal MDP Policies: SMT-based approaches synthesize minimal-depth trees matching up to 99% of the reachability optimum with a fraction of the nodes of prior methods (Andriushchenko et al., 17 Jan 2025).
- Model Interpretation: Concept-Trees and CDTs provide global and local explanations of black-box models with interpretably small, semantically structured splits, directly mitigating the interpretability barrier in high-dimensional, correlated data (Renard et al., 2019, Zhang et al., 2020).
6. Representational Unification and Extensions
Matrix-vector representations reinterpret the entire decision process as shallow binary networks, allowing unification and generalization:
- Any classical tree can be expressed by three sequential phases: a test phase (evaluating every split predicate at once, e.g., $\mathbf{s} = \operatorname{sign}(W\mathbf{x} - \mathbf{b})$), a traversal phase (routing the resulting sign vector to a unique leaf), and a prediction phase (outputting the class that maximizes the selected leaf's score vector).
- Functional leaves, oblique splits, soft traversal (e.g., via softmax), and end-to-end differentiability admit integration with neural and kernel methods (Zhang, 2021).
- All classical, oblique, and recurrently generated trees thus fit in a unified framework suitable for further algorithmic innovation.
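The three phases can be sketched directly; the notation here is illustrative and follows the spirit, not the letter, of (Zhang, 2021):

```python
def sign(z):
    return 1 if z > 0 else -1

def tree_forward(W, b, paths, leaf_scores, x):
    """Three-phase evaluation of an (axis-aligned or oblique) tree.
    Phase 1 (test):     s_i = sign(w_i . x - b_i) for every internal node i.
    Phase 2 (traverse): select the unique leaf whose path pattern matches s
                        (pattern entries: +1/-1 on the path, 0 = don't care).
    Phase 3 (predict):  arg-max over the chosen leaf's class scores."""
    s = [sign(sum(wij * xj for wij, xj in zip(wi, x)) - bi)
         for wi, bi in zip(W, b)]
    for leaf, pattern in enumerate(paths):
        if all(p == 0 or p == si for p, si in zip(pattern, s)):
            scores = leaf_scores[leaf]
            return max(range(len(scores)), key=scores.__getitem__)
    raise ValueError("leaf patterns do not cover the test vector")

# Depth-2 tree: node 0 tests x0 > 0.5; node 1 tests x1 > 0.5 (right branch only).
W = [[1, 0], [0, 1]]
b = [0.5, 0.5]
paths = [[-1, 0],   # leaf 0: left of node 0 (node 1 irrelevant)
         [1, -1],   # leaf 1: right of node 0, left of node 1
         [1, 1]]    # leaf 2: right of node 0, right of node 1
leaf_scores = [[1, 0], [0, 1], [1, 0]]  # class score vectors per leaf
print(tree_forward(W, b, paths, leaf_scores, [1, 0]))
```

Replacing the rows of $W$ with dense weight vectors gives oblique splits, and replacing the hard sign with a softmax gives the soft, differentiable traversal mentioned above, all without changing the three-phase structure.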
7. Open Problems and Future Directions
Key bottlenecks and extensions include:
- Scalability in High Dimensions: The exponential growth in constraints with feature count remains a limiting factor for SAT/SMT methods. Dimensionality reduction and sparsity-inducing regularization are active directions (Avellaneda, 2019).
- Beyond Boolean Splits: Extending exact and approximate methods to handle numerical splits or richer predicates (e.g., kernelized, nonlinear, or functional splits) is a high-impact target (Ashok et al., 2020, Zhang, 2021).
- Compact and Robust Representations: Incorporating tolerance for misclassification, efficient encodings for counting, and seamless integration with local search or gradient-based optimization remain open challenges for extreme conciseness with robustness (Avellaneda, 2019).
- Compositional and Modular Explanations: Cascading and concept-based decompositions show that concise structure promotes human usability and algorithmic efficiency; mechanisms to automatically discover such modular structure at scale are not yet fully developed (Zhang et al., 2020, Renard et al., 2019).
Concise decision trees therefore represent both a classical goal and an area of heavy current research, interfacing combinatorial optimization, computational learning theory, property testing, interpretable machine learning, and formal methods. Each domain continues to inspire new techniques for constructing, certifying, and leveraging trees that are as compact as possible given task-specific trade-offs.