Rigorous Explanations for Tree Ensembles

Published 31 Mar 2026 in cs.AI, cs.LG, and cs.LO | (2603.29361v1)

Abstract: Tree ensembles (TEs) find a multitude of practical applications. They represent one of the most general and accurate classes of machine learning methods. While they are typically quite concise in representation, their operation remains inscrutable to human decision makers. One solution to build trust in the operation of TEs is to automatically identify explanations for the predictions made. Evidently, we can only achieve trust using explanations if those explanations are rigorous, that is, truly reflect properties of the underlying predictor they explain. This paper investigates the computation of rigorously-defined, logically-sound explanations for the concrete case of two well-known examples of tree ensembles, namely random forests and boosted trees.


Authors (5)

Summary

The paper introduces a formal framework for extracting minimal (AXp) and contrastive (CXp) explanations from tree ensembles.
It employs unified propositional logic and incremental MaxSAT to ensure efficient, model-faithful explanation extraction across various ensemble schemes.
Experimental results demonstrate up to 10x runtime speedup and 40-50% reduction in explanation size, validating the practical scalability of the approach.

Rigorous Formal Explanations for Tree Ensembles: A Technical Synthesis

Introduction

The paper "Rigorous Explanations for Tree Ensembles" (2603.29361) advances formal explainability for tree ensembles (TEs), specifically random forests (RFs) and gradient-boosted trees (BTs). Although tree ensembles are robust and accurate on tabular data, their aggregated structure often obscures the decision logic. The work develops unified propositional and MaxSAT-based encodings for extracting logically rigorous explanations, namely abductive (AXp) and contrastive (CXp) explanations, that agree with the exact semantics of the model. It extends existing logical frameworks and proposes scalable encoding strategies that cover classical RF majority voting, weighted RF variants, and BT schemes, with guarantees of explanation correctness, minimality, and sufficiency.

Formal Foundations and Complexity

The authors formalize explanations in terms of prime implicants (AXps) and prime implicates (CXps). An AXp (abductive explanation) is a subset-minimal set of features that, when fixed to the values they take in the instance, suffices to guarantee the prediction of the TE regardless of the values of the remaining features. A CXp (contrastive explanation) is a subset-minimal set of features that, when allowed to change while all other features stay fixed, suffices to flip the prediction. The paper establishes D^P-completeness for deciding whether a feature subset is an AXp or a CXp for a prediction in a TE, leveraging reductions from DNF prime implicant/implicate complexity [Papadimitriou94]. Even in restricted settings such as RFs with majority voting, extracting minimal explanations is intractable in the worst case, which justifies the focus on practical, optimization-oriented search algorithms.
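The two definitions can be made concrete on a toy model. The following is a minimal sketch, not the paper's method: the three-tree majority-vote forest, the Boolean features, and every function name are assumptions introduced for illustration, and sufficiency is checked by exhaustive enumeration rather than by a SAT or MaxSAT oracle.

```python
from itertools import product

# Hypothetical toy random forest: three "trees" over three Boolean features,
# each returning a vote for class 1 (1) or class 0 (0).
TREES = [
    lambda x: x[0],
    lambda x: x[1],
    lambda x: int(x[0] and x[2]),
]

def predict(x):
    """Majority vote over the toy trees."""
    votes = sum(tree(x) for tree in TREES)
    return int(2 * votes > len(TREES))

def is_sufficient(fixed, v):
    """True if fixing the features in `fixed` to their values in v forces
    predict(v), whatever values the remaining features take."""
    target = predict(v)
    free = [i for i in range(len(v)) if i not in fixed]
    for values in product([0, 1], repeat=len(free)):
        x = list(v)
        for i, value in zip(free, values):
            x[i] = value
        if predict(x) != target:
            return False
    return True

def is_axp(fixed, v):
    """Sufficient and subset-minimal. Sufficiency is monotone, so it is
    enough to check that no single feature can be dropped."""
    fixed = set(fixed)
    return is_sufficient(fixed, v) and not any(
        is_sufficient(fixed - {i}, v) for i in fixed)

def can_flip(free, v):
    """True if changing only the features in `free` can change the prediction."""
    return not is_sufficient(set(range(len(v))) - set(free), v)

def is_cxp(free, v):
    """Able to flip the prediction, and subset-minimal with that property."""
    free = set(free)
    return can_flip(free, v) and not any(can_flip(free - {i}, v) for i in free)

v = (1, 1, 0)                          # predicted class 1 (two of three votes)
print(is_axp({0, 1}, v))               # True
print(is_axp({0, 1, 2}, v))            # False: feature 2 is redundant
print(is_cxp({0}, v), is_cxp({2}, v))  # True False
```

On this instance the only AXp is {0, 1} and the CXps are {0} and {1}, so the AXp intersects every CXp. The enumeration is exponential in the number of free features, which is why a real explainer replaces `is_sufficient` with an oracle call.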

Unified Logic-Based Encoding for Explanations

A core contribution is the unified propositional framework that supports model-precise reasoning over RFs (both majority and weighted vote) and BTs. The encoding abstracts feature domains into ordered intervals, ensuring domain reduction and facilitating the compact representation of decision paths and leaf assignments. The framework encodes:

TE structure and aggregation using propositional logic and cardinality or pseudo-Boolean constraints.
Prediction verification as (partial weighted) MaxSAT objectives, expressing the target-vs-opponent class score comparison.
Formal entailment of explanations as minimal unsatisfiable subsets (for AXps) or minimal correction subsets (for CXps), connecting explanation extraction directly to core problems in knowledge compilation and MaxSAT optimization.
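The target-vs-opponent score comparison in the list above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two-class boosted model, its integer-scaled leaf weights, and the function names are hypothetical, and the maximization that a MaxSAT solver would perform is replaced by brute-force enumeration.

```python
from itertools import product

# Hypothetical two-class boosted model over three Boolean features. Each tree
# returns a leaf weight; weights are assumed to be pre-scaled to integers.
TREES = {
    "a": [lambda x: 6 if x[0] else -4, lambda x: 3 if x[1] else -2],
    "b": [lambda x: 5 if x[2] else -1, lambda x: 2 if not x[0] else -3],
}

def score(cls, x):
    """Sum of leaf weights reached by x in the trees of class `cls`."""
    return sum(tree(x) for tree in TREES[cls])

def worst_case_gap(fixed, v, target, opponent):
    """Maximum of score(opponent) - score(target) over all points that agree
    with v on the features in `fixed` (brute-force stand-in for MaxSAT)."""
    free = [i for i in range(len(v)) if i not in fixed]
    best = None
    for values in product([0, 1], repeat=len(free)):
        x = list(v)
        for i, value in zip(free, values):
            x[i] = value
        gap = score(opponent, x) - score(target, x)
        best = gap if best is None else max(best, gap)
    return best

def entails_prediction(fixed, v, target, opponent):
    """The fixed features entail the target class if the opponent can never
    reach the target's score (ties are treated as a failure here)."""
    return worst_case_gap(fixed, v, target, opponent) < 0

v = (1, 1, 0)                                  # score a = 9, score b = -4
print(worst_case_gap({0}, v, "a", "b"))        # -2
print(worst_case_gap(set(), v, "a", "b"))      # 13
print(entails_prediction({0}, v, "a", "b"))    # True
```

Fixing feature 0 alone keeps class "a" ahead in every completion, while fixing nothing lets class "b" win by 13. With more than two classes, the same check would be repeated for each opponent class.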

The encoding encompasses both SAT and SMT paradigms, further accommodating categorical and continuous domains by mapping predicates and domain partitions to Boolean variables within the constraints.
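A minimal sketch of the interval abstraction for one continuous feature follows. It assumes that every split on the feature has the form `x < t`; the function names and the example thresholds are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

def build_intervals(thresholds):
    """Distinct thresholds t1 < ... < tk split the real line into k + 1
    half-open intervals [lo, hi)."""
    cuts = sorted(set(thresholds))
    bounds = [float("-inf")] + cuts + [float("inf")]
    return cuts, list(zip(bounds[:-1], bounds[1:]))

def interval_index(cuts, value):
    """Index of the interval containing `value` (splits assumed to be x < t)."""
    return bisect_right(cuts, value)

def split_outcomes(cuts, idx):
    """Truth value of every split `x < t` for any point in interval `idx`.
    The split on cuts[j] holds exactly when idx <= j."""
    return {t: idx <= j for j, t in enumerate(cuts)}

# Thresholds collected from all trees that test this feature (made-up values).
cuts, intervals = build_intervals([2.5, 1.0, 2.5])
print(intervals)                  # [(-inf, 1.0), (1.0, 2.5), (2.5, inf)]
print(interval_index(cuts, 1.7))  # 1
print(split_outcomes(cuts, 1))    # {1.0: False, 2.5: True}
```

Every value inside one interval takes the same branch at every split, so a reasoner only needs one Boolean variable per interval (or per threshold) instead of handling the real-valued domain directly.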

Algorithmic Advancements: MaxSAT and Incremental Solving

Incremental MaxSAT and Core Reuse

The algorithmic innovation includes an incremental MaxSAT solving scheme (Figure 1), in which prior unsatisfiable cores are cached and reused across explanation extraction iterations. This improves empirical performance substantially by avoiding redundant solver instantiations and leveraging learned solver state throughout the deletion-based minimality search. The approach is amenable to assumption-based incremental SAT-style reasoning, further optimizing successive entailment checks during explanation search.
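The deletion-based minimality search mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the toy forest and all names are hypothetical, the entailment oracle is a brute-force enumeration standing in for an incremental MaxSAT call, and core caching and reuse are not modelled.

```python
from itertools import product

# Hypothetical toy forest over four Boolean features (majority of three votes).
TREES = [
    lambda x: x[0],
    lambda x: x[1],
    lambda x: int(x[2] and x[3]),
]

def predict(x):
    return int(2 * sum(tree(x) for tree in TREES) > len(TREES))

def entails(fixed, v):
    """Brute-force stand-in for the oracle: do the fixed features force
    predict(v) for every assignment to the remaining features?"""
    free = [i for i in range(len(v)) if i not in fixed]
    for values in product([0, 1], repeat=len(free)):
        x = list(v)
        for i, value in zip(free, values):
            x[i] = value
        if predict(x) != predict(v):
            return False
    return True

def extract_axp(v):
    """Deletion-based linear search: start from all features and drop each
    one whose removal keeps the prediction entailed. One oracle call per
    feature; the result is subset-minimal because entailment is monotone."""
    fixed = set(range(len(v)))
    calls = 0
    for i in range(len(v)):
        calls += 1
        if entails(fixed - {i}, v):
            fixed.discard(i)
    return fixed, calls

axp, calls = extract_axp((1, 1, 1, 0))
print(sorted(axp), calls)  # [0, 1] 4
```

The loop issues a sequence of closely related queries that differ in a single feature, which is the situation in which keeping solver state between calls pays off.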

Figure 2: Ablation of incremental MaxSAT, highlighting substantial empirical speedup from core reuse and incremental oracles.

Stratification and Multi-class Heuristics

To mitigate the challenge of diverse and real-valued weights in BTs and weighted RFs, the encoding supports stratified objective functions and variable activity heuristics. A VSIDS-style heuristic for adversarial class ordering—exploiting the temporal locality of winner classes in CXp extraction—reduces the number of MaxSAT oracle calls.
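One way to picture an activity-based class ordering is the sketch below. The summary does not give the details of the heuristic, so the class `ClassOrder`, its decay factor, and the helper `first_winner` are assumptions: the sketch only shows the general idea of trying recently successful opponent classes first so that fewer oracle calls are spent on classes that rarely win.

```python
class ClassOrder:
    """Hypothetical VSIDS-like activity scores over opponent classes."""

    def __init__(self, classes, decay=0.5):
        self.activity = {c: 0.0 for c in classes}
        self.decay = decay

    def bump(self, winner):
        # Decay every score, then reward the class that just won.
        for c in self.activity:
            self.activity[c] *= self.decay
        self.activity[winner] += 1.0

    def order(self, target):
        # Most recently successful opponents first; ties keep insertion order.
        rivals = [c for c in self.activity if c != target]
        return sorted(rivals, key=lambda c: -self.activity[c])

def first_winner(order, target, can_win):
    """Query opponents in heuristic order. `can_win` stands in for one oracle
    call. Returns the first winning class and the number of calls made."""
    calls = 0
    for rival in order.order(target):
        calls += 1
        if can_win(rival):
            order.bump(rival)
            return rival, calls
    return None, calls

order = ClassOrder(["a", "b", "c", "d"])
beats_a = lambda c: c == "d"             # pretend only class "d" can beat "a"
print(first_winner(order, "a", beats_a))  # ('d', 3)
print(first_winner(order, "a", beats_a))  # ('d', 1)
```

The second query finds the winning class on the first call because its activity was bumped by the first query.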

Experimental Evaluation

Scalability of Rigorous Explanations

Extensive experiments across 30+ public datasets demonstrate that, for realistic TE sizes (up to 100 trees/class, depths ≤6), the proposed SAT/MaxSAT encodings produce AXps and CXps in sub-second to few-second timeframes, with outliers corresponding to high-cardinality multiclass settings. For BTs and RFwv (weighted vote), the MaxSAT-based approach significantly outperforms generic SMT-based approaches. The average runtime advantage—often a factor of 5x-10x—is amplified for larger and deeper ensembles as well as for multiclass problems.

Minimal and Succinct Explanations

Cardinality-minimality is enforced in the "minimum CXp" (min-CXp) setting by leveraging MaxSAT optimization. Notable findings (Figure 3, Figure 4, Figure 5):

Minimum-CXps are consistently 40–50% shorter than standard CXps (which are subset-minimal but not necessarily of minimum cardinality), at a moderate (3–5x) runtime penalty.
For hard instances (e.g., large multiclass datasets), enforcing strict minimality eliminates long-tailed, less interpretable explanations.
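The gap between a subset-minimal CXp and a minimum CXp can be seen on a toy example. This is a sketch under assumptions: the forest, the instance, and the function names are hypothetical, and the cardinality search enumerates candidate sets by size instead of solving a MaxSAT problem.

```python
from itertools import combinations, product

# Hypothetical toy forest: the majority vote equals x0 and (x1 or x2).
TREES = [
    lambda x: x[0],
    lambda x: int(x[1] or x[2]),
    lambda x: int(x[0] and x[1] and x[2]),
]

def predict(x):
    return int(2 * sum(tree(x) for tree in TREES) > len(TREES))

def can_flip(free, v):
    """True if changing only the features in `free` can change the prediction."""
    free = sorted(free)
    for values in product([0, 1], repeat=len(free)):
        x = list(v)
        for i, value in zip(free, values):
            x[i] = value
        if predict(x) != predict(v):
            return True
    return False

def deletion_cxp(v):
    """Subset-minimal CXp: start with every feature free and re-fix each one
    whose freedom is not needed to flip the prediction."""
    free = set(range(len(v)))
    for i in range(len(v)):
        if can_flip(free - {i}, v):
            free.discard(i)
    return free

def minimum_cxp(v):
    """Smallest CXp, found by trying candidate sets in order of size."""
    for size in range(1, len(v) + 1):
        for candidate in combinations(range(len(v)), size):
            if can_flip(candidate, v):
                return set(candidate)
    return None

v = (1, 1, 1)
print(sorted(deletion_cxp(v)))  # [1, 2]
print(sorted(minimum_cxp(v)))   # [0]
```

Both answers are valid contrastive explanations. The deletion-based result depends on the order in which features are tried, whereas the minimum CXp is the shortest possible one, at the cost of a harder search.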

Figure 6: Distribution of CXp vs. minimum-CXp lengths for RFmv, showing a marked reduction in explanation size through minimization.

Figure 7: Distribution of CXp vs. minimum-CXp lengths for RFwv, reflecting the succinctness gained with minimum CXps.

Figure 8: Distribution of CXp vs. minimum-CXp lengths in BTs, confirming consistent gains in compactness.

MaxSAT Backend Evaluation

A comparison of two optimization backends (the RC2 MaxSAT solver and OR-Tools) for handling the pseudo-Boolean constraint optimization in BTs reveals (Figure 9):

OR-Tools generally outperforms RC2, especially for larger, more complex pseudo-Boolean optimization tasks.
RC2 remains competitive on smaller feasibility checks, confirming its suitability for incremental entailment tests.

Figure 10: Detailed runtime comparison for minimum-CXp extraction in BTs between RC2 and OR-Tools, demonstrating superior scaling of OR-Tools.

Robustness and Fairness Verification

The unified propositional encoding seamlessly supports robustness (adversarial example) and individual fairness verification queries over TEs, by dualizing the encoding for the original and perturbed/protected feature assignments. Formal verification queries complete in milliseconds, even for large models, indicating that rigorous pipeline integration of explanation and verification is feasible at practical scale.
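Both query types can be stated as small search problems. The sketch below is illustrative only: the toy forest, the Hamming-distance perturbation model, the choice of protected feature, and all function names are assumptions, and the search is exhaustive rather than solver-based.

```python
from itertools import combinations

# Hypothetical toy forest over three Boolean features (majority of three votes).
TREES = [
    lambda x: x[0],
    lambda x: x[1],
    lambda x: int(x[0] and x[2]),
]

def predict(x):
    return int(2 * sum(tree(x) for tree in TREES) > len(TREES))

def flipped(v, idxs):
    """Copy of v with the Boolean features in `idxs` negated."""
    x = list(v)
    for i in idxs:
        x[i] = 1 - x[i]
    return tuple(x)

def adversarial_example(v, radius):
    """Robustness query: a point within Hamming distance `radius` of v whose
    prediction differs, or None if the prediction is robust at that radius."""
    for k in range(1, radius + 1):
        for idxs in combinations(range(len(v)), k):
            x = flipped(v, idxs)
            if predict(x) != predict(v):
                return x
    return None

def is_fair_at(v, protected):
    """Individual fairness query at v: changing only the protected features
    must never change the prediction."""
    for k in range(1, len(protected) + 1):
        for idxs in combinations(protected, k):
            if predict(flipped(v, idxs)) != predict(v):
                return False
    return True

print(adversarial_example((0, 0, 0), 1))  # None
print(adversarial_example((0, 0, 0), 2))  # (1, 1, 0)
print(is_fair_at((1, 1, 0), [2]))         # True
print(is_fair_at((1, 0, 0), [2]))         # False
```

In both functions the model is evaluated on the original point and on a modified copy, and the query asks whether the two predictions can differ.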

Implications and Theoretical Insights

The proposed encoding and associated algorithms substantially broaden the range of tree ensembles for which logic-based XAI is practical. The demonstration that MaxSAT-based optimization scales and outperforms prior SMT/ILP-based explainers for both AXp and CXp extraction in nontrivial TEs is a strong empirical result. The D^P-completeness results reinforce the need for such practical optimization techniques, since exhaustive enumeration is infeasible even for moderate-size problems.

The approach also exposes the intractability boundaries of logic-based explanations, as encoding literal count scales exponentially in tree depth, suggesting that ensemble size and depth must be managed for practical deployment. Future theoretical advances may further ameliorate this via compressed knowledge compilation or symmetry-breaking.
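The depth dependence can be illustrated with a simple upper bound. This sketch counts root-to-leaf paths rather than the paper's exact literal count, and the function name and the ensemble size of 100 trees are chosen only for illustration: a binary tree of depth d has at most 2^d leaves, so the number of paths an encoding may need to represent grows accordingly.

```python
def max_paths(num_trees, depth):
    """Upper bound on root-to-leaf paths in an ensemble of binary trees:
    each tree of depth `depth` has at most 2 ** depth leaves."""
    return num_trees * 2 ** depth

for depth in (3, 6, 9, 12):
    print(depth, max_paths(100, depth))
# 3 800
# 6 6400
# 9 51200
# 12 409600
```

Doubling the depth from 6 to 12 multiplies the bound by 64, whereas doubling the number of trees only doubles it.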

Conclusion

This paper systematically delivers formal, model-faithful explanations for a wide spectrum of tree ensemble classifiers, together with efficient algorithms rooted in propositional logic and MaxSAT. The methods yield minimal, provably sufficient explanations and demonstrate both theoretical optimality and practical scalability. The unified encoding and incremental MaxSAT paradigm extend naturally to fairness and robustness verification, setting a robust methodological foundation for rigorous XAI in tree-based models. The techniques outlined here both delineate the current practical frontiers and suggest directions for encoding refinement and algorithmic acceleration as TE complexity continues to grow.
