Depth-First Search Decision Tree (DFSDT)

Updated 14 March 2026

DFSDT is a recursive, exact algorithm for constructing binary decision trees by fully exploring one subtree before backtracking.
It employs depth-first search with lower-bound pruning and is optimal when run without resource constraints, but its anytime behavior is poor.
Limited discrepancy search improves the anytime behavior, and hybrid BFS-DFS strategies improve hardware utilization, especially for ensemble methods.

A Depth-First Search Decision Tree (DFSDT) is a recursive, exact optimization algorithm for constructing binary decision trees by fully exploring one subtree before proceeding to the next. This methodology underpins state-of-the-art learning of optimal decision trees, especially with continuous input features, and serves as a computational backbone for both single-tree learning and ensemble methods such as random forests. The classical DFSDT framework is characterized by depth-first search traversal, strong optimality guarantees in the absence of resource constraints, and well-understood computational properties. Enhancements using limited discrepancy search (LDS) and hybrid BFS-DFS scheduling address some of its practical shortcomings, such as poor anytime behavior and limited hardware utilization.

1. Formal Specification and Problem Setup

Given a training dataset $\mathcal{D} = \{(x^i, y^i)\}_{i=1}^n$, where $x^i \in \mathbb{R}^p$ and $y^i \in \mathcal{Y} = \{1, \ldots, K\}$, the task is to construct a binary decision tree $t$ of depth at most $d$ that minimizes empirical risk. Each internal node specifies a feature $f \in \mathcal{F} = \{1, \ldots, p\}$ and a threshold $\tau \in S^f$, the latter defined as the midpoints between consecutive sorted unique feature values: $S^f = \{ (U_j^f + U_{j+1}^f)/2 \mid j = 1, \ldots, m \}$, where $U_1^f < \cdots < U_{m+1}^f$ are the sorted unique values of feature $f$ in $\mathcal{D}$. The data is recursively partitioned into $\mathcal{D}_L(f, \tau) = \{(x, y) \in \mathcal{D} : x_f \leq \tau\}$ and $\mathcal{D}_R(f, \tau) = \{(x, y) \in \mathcal{D} : x_f > \tau\}$.
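The threshold set and the partition defined above can be illustrated with a minimal Python sketch. The function names `candidate_thresholds` and `partition`, the tuple-based data layout, and the toy dataset are illustrative assumptions, not part of the cited work; feature indices are 0-based here, whereas the text uses $1, \ldots, p$.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector x, class label y)


def candidate_thresholds(data: List[Example], f: int) -> List[float]:
    """Midpoints between consecutive sorted unique values of feature f."""
    values = sorted({x[f] for x, _ in data})
    return [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]


def partition(data: List[Example], f: int, tau: float):
    """Split data into D_L (x_f <= tau) and D_R (x_f > tau)."""
    left = [(x, y) for x, y in data if x[f] <= tau]
    right = [(x, y) for x, y in data if x[f] > tau]
    return left, right


# Hypothetical toy dataset: two features, two classes.
toy = [((1.0, 5.0), 0), ((2.0, 5.0), 0), ((4.0, 7.0), 1), ((4.0, 9.0), 1)]
thresholds = candidate_thresholds(toy, 0)   # feature 0 has unique values 1, 2, 4
left, right = partition(toy, 0, thresholds[-1])
```

Because thresholds are midpoints between distinct observed values, every candidate split leaves at least one example on each side.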

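Empirical loss can be measured using misclassification error, Gini impurity, or entropy, given here in their standard forms for the data $\mathcal{D}$ reaching a node:

Misclassification: $\mathcal{L}_{\mathrm{mis}}(\mathcal{D}) = \min_{k \in \mathcal{Y}} \sum_{(x, y) \in \mathcal{D}} \mathbb{1}[y \neq k]$
Gini: $\mathcal{L}_{\mathrm{gini}}(\mathcal{D}) = 1 - \sum_{k \in \mathcal{Y}} \left( n_k / |\mathcal{D}| \right)^2$, where $n_k$ is the count of class $k$
Entropy: $\mathcal{L}_{\mathrm{ent}}(\mathcal{D}) = -\sum_{k \in \mathcal{Y}} \frac{n_k}{|\mathcal{D}|} \log \frac{n_k}{|\mathcal{D}|}$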
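As a hedged illustration, the three loss measures named above can be computed from a node's class labels as follows. The sketch uses the standard textbook definitions. The function names, the unnormalized misclassification count, and the base-2 logarithm are assumptions rather than choices confirmed by the source.

```python
import math
from collections import Counter
from typing import Sequence


def misclassification(labels: Sequence[int]) -> int:
    """Number of examples not in the majority class."""
    if not labels:
        return 0
    return len(labels) - max(Counter(labels).values())


def gini(labels: Sequence[int]) -> float:
    """1 - sum_k (n_k / n)^2, with n_k the count of class k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels: Sequence[int]) -> float:
    """-sum_k (n_k / n) * log2(n_k / n); base 2 is an assumption."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())
```

For a perfectly balanced two-class node all three measures are at their maximum, and for a pure node all three are zero.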

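The goal is to find

$t^* = \arg\min_{t \in \mathcal{T}(\mathcal{D}, d)} \mathcal{L}(t, \mathcal{D})$

where $\mathcal{L}(t, \mathcal{D})$ is the empirical loss of tree $t$ on $\mathcal{D}$ and $\mathcal{T}(\mathcal{D}, d)$ denotes all trees on $\mathcal{D}$ with depth at most $d$ (Kiossou et al., 21 Jan 2026).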

2. Pure Depth-First Search Construction

The DFSDT algorithm recursively processes nodes depth-first, prioritizing features and thresholds according to fixed or heuristic orderings (e.g., Gini impurity). At each node with remaining depth $d$ and upper bound $UB$ on the admissible loss, all $(f, \tau)$ combinations are considered:

Recursively solve the left sub-tree: $t_L = \mathrm{DFSDT}(\mathcal{D}_L(f, \tau), d - 1, UB)$, under the current global upper bound $UB$.
Reduce the bound for the right sub-tree to $UB - \mathcal{L}(t_L)$ and recurse: $t_R = \mathrm{DFSDT}(\mathcal{D}_R(f, \tau), d - 1, UB - \mathcal{L}(t_L))$.
Update the incumbent if $\mathcal{L}(t_L) + \mathcal{L}(t_R)$ improves upon $UB$.
Prune branches if the lower bound satisfies $LB \geq UB$.

The worst-case runtime is exponential in the depth $d$, and memory is dominated by the recursion stack and the data subsets held along it. In practice, lower-bound pruning and memoization can moderate, but not eliminate, the exponential dependency (Kiossou et al., 21 Jan 2026).
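The recursion described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the misclassification count as the loss, a trivial lower bound of 0, a fixed feature and threshold order, no memoization, and the hypothetical names `leaf` and `dfs_tree`. It assumes non-empty input data.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features x, label y)


def leaf(data: List[Example]):
    """Misclassification count and majority-class leaf for non-empty data."""
    label, hits = Counter(y for _, y in data).most_common(1)[0]
    return len(data) - hits, ("leaf", label)


def dfs_tree(data: List[Example], depth: int, upper_bound: float = math.inf):
    """Return (error, tree) of the best tree with error < upper_bound,
    or (inf, None) if no such tree of the given depth exists."""
    if upper_bound <= 0:  # trivial lower bound 0 already meets the bound
        return math.inf, None
    best_err, best_tree = math.inf, None
    err, node = leaf(data)
    if err < upper_bound:  # a single leaf is the first incumbent
        best_err, best_tree, upper_bound = err, node, err
    if depth == 0 or err == 0:
        return best_err, best_tree
    for f in range(len(data[0][0])):
        values = sorted({x[f] for x, _ in data})
        for tau in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            left = [(x, y) for x, y in data if x[f] <= tau]
            right = [(x, y) for x, y in data if x[f] > tau]
            # Fully solve the left sub-tree under the current bound.
            left_err, left_tree = dfs_tree(left, depth - 1, upper_bound)
            if left_tree is None:
                continue  # left side alone cannot beat the incumbent
            # The right sub-tree only gets the remaining error budget.
            right_err, right_tree = dfs_tree(
                right, depth - 1, upper_bound - left_err)
            if right_tree is None:
                continue
            best_err = left_err + right_err  # strictly below the old bound
            best_tree = ("split", f, tau, left_tree, right_tree)
            upper_bound = best_err
    return best_err, best_tree


# Hypothetical XOR-like data: no depth-1 tree improves on a single leaf.
xor = [((0.0, 0.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), 0)]
err_depth1, _ = dfs_tree(xor, 1)
err_depth2, tree_depth2 = dfs_tree(xor, 2)
```

Passing the remaining budget `upper_bound - left_err` to the right child is what lets the search discard a split as soon as either side cannot contribute to an improvement.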

3. Analytical and Practical Properties

Complexity and Memory Footprint

In decision forest contexts, assuming a presorted feature matrix and node-active example buffers, the overall time and space costs are expressed in terms of the maximum depth and the number of features considered per node (Anghel et al., 2019). Only one path's buffers are "live" at a time, which minimizes peak memory during single-tree construction.

Cache Behavior and Parallelism

DFS's path-by-path traversal leads to pseudo-random memory accesses in the global sorted matrix, producing frequent L2/L3 cache misses and an empirically higher cost than level-order BFS. Intra-tree parallelism is very limited in pure DFS: parallelization is available only across different trees in an ensemble (Anghel et al., 2019).

4. Anytime Behavior and Limitations of Pure DFS

DFSDT exhibits suboptimal anytime behavior. The search fully optimizes along a single branch (typically the heuristic best, e.g., leftmost), leaving other branches unexplored until later. Initial incumbent solutions are highly unbalanced (one deep left sub-tree, single-leaf right branches), resulting in poor intermediate misclassification rates if the search is interrupted. Early-stopping yields solutions that are often inferior to those from greedy methods such as C4.5, which construct more balanced trees early. Quantitative anytime metrics include:

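Primal gap: $\gamma(s) = \frac{|z(s) - z^*|}{\max(|z(s)|, |z^*|)}$, where $z(s)$ is the objective of the incumbent at time $s$ and $z^*$ is the best known objective
Primal integral: $P(T) = \int_0^T p(s)\, ds$, where $p(s) = 1$ if no solution has been found by time $s$, else $p(s) = \gamma(s)$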

Lower primal integral values imply better average anytime solution quality (Kiossou et al., 21 Jan 2026).
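A small sketch of how these two metrics might be computed from a solver's incumbent trace is given below. It assumes the commonly used normalization of the primal gap by the larger of the two absolute objective values, non-negative objectives, and a step function between incumbent updates. The names `primal_gap` and `primal_integral` and the example trace are hypothetical.

```python
from typing import List, Tuple


def primal_gap(value: float, best: float) -> float:
    """Relative gap in [0, 1] between an incumbent and the best known value."""
    if value == best:
        return 0.0
    return abs(value - best) / max(abs(value), abs(best))


def primal_integral(trace: List[Tuple[float, float]],
                    best: float, horizon: float) -> float:
    """Integrate the gap over [0, horizon].

    trace holds time-sorted (time, incumbent objective) pairs; the gap
    counts as 1 until the first incumbent is found.
    """
    total, prev_time, prev_gap = 0.0, 0.0, 1.0
    for time, value in trace:
        if time >= horizon:
            break
        total += prev_gap * (time - prev_time)
        prev_time, prev_gap = time, primal_gap(value, best)
    total += prev_gap * (horizon - prev_time)
    return total


# Hypothetical run: incumbents with 20, 12, then 10 errors; the optimum is 10.
trace = [(2.0, 20.0), (5.0, 12.0), (8.0, 10.0)]
area = primal_integral(trace, best=10.0, horizon=10.0)
```

A solver that finds a good tree early accumulates little area, which is why the integral rewards anytime quality rather than only the final result.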

5. Limited Discrepancy Search Enhancement

The integration of Limited Discrepancy Search (LDS) addresses the anytime shortcoming by prioritizing exploration of trees close to a strong heuristic (e.g., C4.5). A "discrepancy" is any deviation from the heuristic at feature or split selection: choosing the $k$th feature or split in the heuristic order costs $k - 1$ discrepancies, so following the heuristic's first choice is free. The algorithm maintains discrepancy budgets, and only nodes within budget are expanded. Budgets are increased gradually in a "diagonal sweep" schedule, initially restricting the search to the heuristic tree and progressively relaxing the constraints.
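The budget mechanism can be illustrated with a generic LDS enumeration in Python. This is a simplified sketch, not CA-ConTree itself: it uses a single scalar budget rather than separate feature and split budgets, assumes that the $k$th choice in heuristic order costs $k - 1$ discrepancies (0-based index `k` costs `k` in the code), and the names `lds_paths` and `iterative_lds` are hypothetical.

```python
from typing import Iterator, List, Tuple


def lds_paths(branching: List[int], budget: int) -> Iterator[Tuple[int, ...]]:
    """Yield choice sequences whose total discrepancy is at most `budget`.

    branching[i] is the number of heuristic-ordered options at decision i;
    picking the option with 0-based index k costs k discrepancies.
    """
    def extend(prefix: List[int], level: int, remaining: int):
        if level == len(branching):
            yield tuple(prefix)
            return
        # Only options that still fit in the remaining budget are expanded.
        for k in range(min(branching[level], remaining + 1)):
            yield from extend(prefix + [k], level + 1, remaining - k)

    yield from extend([], 0, budget)


def iterative_lds(branching: List[int], max_budget: int):
    """Raise the budget step by step, reporting each path once."""
    seen = set()
    for budget in range(max_budget + 1):
        for path in lds_paths(branching, budget):
            if path not in seen:
                seen.add(path)
                yield budget, path


# Two decisions with two options each: the heuristic path (0, 0) comes first.
visited = list(iterative_lds([2, 2], max_budget=2))
```

With budget 0 only the pure heuristic path is explored, and each budget increase admits paths that deviate slightly more, which mirrors the gradual relaxation described above.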

CA-ConTree, the resulting anytime-optimal algorithm, empirically achieves lower primal integrals and produces higher-quality trees throughout the search horizon. The table below summarizes the anytime results (averaged over 16 UCI datasets, depths 4 and 5; lower is better):

Approach	5s	15s	30s	60s	120s	300s	600s
CA-ConTree	33.5	26.5	23.1	20.4	18.3	16.3	14.6
ConTree-Gini	89.1	85.4	77.4	71.8	67.6	60.0	50.5
ConTree	93.8	89.0	83.7	76.8	70.3	62.1	54.2
C4.5	40.8	40.8	40.8	40.8	40.8	40.8	40.8

CA-ConTree consistently outperforms the pure DFS baseline and C4.5 during the anytime window (Kiossou et al., 21 Jan 2026).

6. Hybridization and Hardware-Efficient Strategies

DFSDT's limitations for large-scale and ensemble training, particularly in hardware utilization, have led to hybrid strategies combining BFS and DFS. The breadth-first, depth-next hybrid presorts all features and performs BFS for the initial levels, maximizing cache efficiency and parallel compute across all nodes at a given depth. Once the example buffer of a frontier node fits into per-core cache, that node switches to an independent DFS recursion. This "breadth-first, depth-next" approach achieves significant empirical speedups over widely used libraries, without accuracy loss (Anghel et al., 2019).
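The scheduling idea can be sketched in Python as below. This is only an illustration of the processing order, under loud assumptions: nodes are reduced to a `(depth, size)` pair, the switch happens when a node's example count is at most a hypothetical `cache_capacity`, and `split` is a stand-in that returns child sizes instead of performing a real split search. None of these names come from the cited work.

```python
from collections import deque
from typing import Callable, List, Tuple


def hybrid_order(n_examples: int, max_depth: int, cache_capacity: int,
                 split: Callable[[int], Tuple[int, int]]) -> List[tuple]:
    """Return the order in which nodes are processed: level-order (BFS)
    while nodes are large, then depth-first once they fit in cache."""
    order: List[tuple] = []
    frontier = deque([(0, n_examples)])
    small: List[Tuple[int, int]] = []

    # BFS phase: large nodes are handled level by level.
    while frontier:
        depth, size = frontier.popleft()
        if size <= cache_capacity:
            small.append((depth, size))  # hand over to the DFS phase
            continue
        order.append(("bfs", depth, size))
        if depth < max_depth:
            for child in split(size):
                frontier.append((depth + 1, child))

    # DFS phase: each small node is finished independently.
    def dfs(depth: int, size: int) -> None:
        order.append(("dfs", depth, size))
        if depth < max_depth and size > 1:
            for child in split(size):
                dfs(depth + 1, child)

    for depth, size in small:
        dfs(depth, size)
    return order


def halve(n: int) -> Tuple[int, int]:
    """Stand-in split: send half of the examples to each child."""
    return n // 2, n - n // 2


schedule = hybrid_order(8, max_depth=3, cache_capacity=2, split=halve)
```

In a real implementation the independent DFS recursions for small nodes could be distributed across cores, since each touches only its own cache-sized buffer.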

7. Empirical Results and Practical Recommendations

Empirical evaluations on UCI and large synthetic datasets demonstrate:

For medium datasets, CA-ConTree with diagonal budgets yields near-optimal, high-quality trees within practical timeouts.
Pure DFS/DFSDT reaches proven optimality faster at shallow depths but produces poor anytime solutions, a drawback that matters most in time-limited settings.
LDS overhead is justified by the substantial improvement in interim tree quality and anytime solution metrics.
Hardware-optimized hybrids (hybrid BFS-DFS) yield runtime improvements at the ensemble-building scale, essential for practical random forest training.

Test-set accuracy for CA-ConTree matches or surpasses ConTree and C4.5 in most benchmarks, for example at depth 5 with a 600 s timeout:

Dataset	C4.5	ConTree	CA-ConTree
skin	0.984	0.994	0.994
avila	0.620	0.616	0.665

A plausible implication is that the CA-ConTree approach, leveraging LDS atop DFSDT, combines optimality guarantees with strong anytime characteristics, making it suitable for both high-accuracy tree construction and robust real-time applications (Kiossou et al., 21 Jan 2026).

References:

"Anytime Optimal Decision Tree Learning with Continuous Features" (Kiossou et al., 21 Jan 2026)
"Breadth-first, Depth-next Training of Random Forests" (Anghel et al., 2019)
