Model-Based Tree Surrogates Overview
- Model-based tree surrogates are interpretable models that partition the input space and attach simple local models to approximate complex teacher models.
- They support global and local distillation of complex teachers, efficient computation of conditional expectations (e.g., for Shapley values), and interpretable rule extraction across a range of applications.
- The approach balances fidelity and interpretability through recursive partitioning and tuning of local model complexity, supporting tasks such as optimization and uncertainty quantification.
Model-based tree surrogates are a class of surrogate models that combine interpretable partitioning structures (typically decision trees or variants thereof) with explicit statistical, functional, or policy models defined on each region of the partition. Their primary roles are global and local distillation of complex “teacher” models such as ensembles, neural networks, or black-box optimization routines; efficient computation of conditional quantities (such as conditional expectations needed in Shapley value calculations); and interpretable rule extraction for prediction, optimization, preference learning, or uncertainty quantification. A central property is that each region of the partition (leaf or node) is coupled with a simple model (e.g., additive linear regression, polynomial chaos expansion, rule-based prescription, or local value function), resulting in a piecewise structured surrogate whose complexity and interpretability can be naturally traded off.
1. Formal Structure and General Principles
A model-based tree surrogate consists of a recursive partitioning of the input space into regions $R_1, \dots, R_M$, each region described (often but not exclusively) by axis-aligned or, in some cases, oblique threshold rules. For a given teacher model $f$, the surrogate $\hat{f}$ is defined as:

$$\hat{f}(x) = \sum_{m=1}^{M} g_m(x)\, \mathbb{1}\{x \in R_m\},$$

where $g_m$ is a simple, interpretable model restricted to $R_m$ (Herbinger et al., 2023, Zhou et al., 2022). Frequently, $g_m$ takes the form of:
- additive linear or lasso regression,
- generalized additive models (GAMs),
- simple policies/actions (in policy trees),
- polynomial expansions (in Tree-PCE),
- prescriptive rules for optimization.
All model-based tree surrogates share a characteristic recursive learning algorithm:
- Fit a local model to the data or teacher-evaluations within the current node/region.
- Evaluate split criteria reflecting fidelity (fit to black-box output or true loss), parameter/structure instability, or user-defined improvement.
- If a beneficial split is found (by a chosen criterion, e.g., reduction in sum of squares), partition the region and recurse (subject to stopping rules on tree depth, node purity, minimum node size, or cost-complexity).
- Otherwise, set the current region as a leaf.
The functional form of $g_m$ and the split criteria differentiate the various approaches (see Section 3).
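To make the recursion concrete, the following minimal sketch fits linear leaf models and accepts a split only when it reduces the node's sum of squared errors. It is an illustrative simplification, not any of the cited algorithms: the function names, the quantile candidate grid, and the stopping tolerances are choices made here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def sse(model, X, y):
    """Sum of squared errors of a fitted leaf model."""
    return float(np.sum((y - model.predict(X)) ** 2))

def fit_model_tree(X, y, depth=0, max_depth=3, min_leaf=20, min_gain=1e-3):
    """Recursively fit a model-based tree with linear leaf models.

    Returns a nested dict: a leaf {"model": ...} or an internal node
    {"feature": j, "threshold": t, "left": ..., "right": ...}.
    """
    leaf = LinearRegression().fit(X, y)
    node_sse = sse(leaf, X, y)

    best = None
    if depth < max_depth:
        for j in range(X.shape[1]):
            for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
                mask = X[:, j] <= t
                if mask.sum() < min_leaf or (~mask).sum() < min_leaf:
                    continue
                child_sse = (
                    sse(LinearRegression().fit(X[mask], y[mask]), X[mask], y[mask])
                    + sse(LinearRegression().fit(X[~mask], y[~mask]), X[~mask], y[~mask])
                )
                gain = node_sse - child_sse
                if best is None or gain > best[0]:
                    best = (gain, j, t, mask)

    # Split only if the best split improves SSE beyond the tolerance.
    if best is not None and best[0] > min_gain * node_sse:
        _, j, t, mask = best
        return {
            "feature": j, "threshold": t,
            "left": fit_model_tree(X[mask], y[mask], depth + 1, max_depth, min_leaf, min_gain),
            "right": fit_model_tree(X[~mask], y[~mask], depth + 1, max_depth, min_leaf, min_gain),
        }
    return {"model": leaf}

def predict_model_tree(node, x):
    """Route one sample to its leaf, then apply the local model."""
    while "model" not in node:
        side = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return float(node["model"].predict(x.reshape(1, -1))[0])
```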
2. Major Model-Based Tree Surrogate Methodologies
2.1 Surrogate Trees for Distillation and Explanation
Surrogate trees are employed to interpret complex models by globally mimicking the prediction surfaces of ensembles, neural networks, or other black boxes (a minimal distillation sketch follows this list):
- Global distillation: A single surrogate tree is trained to minimize the prediction error over a sample or domain, covering either regression or classification (Herbinger et al., 2023, Teodoro et al., 2023, Hara et al., 2016).
- Local surrogate trees: For specific prediction points, local trees explain prediction logic (e.g., local SHAP explanations) (Zhou et al., 2022).
- Model-based partitioning: At each node/leaf, additive, linear, or low-complexity models are fit (e.g., SLIM, MOB, CTree, GUIDE) (Herbinger et al., 2023).
- Rule extraction: Oblique/hyperplane splits are found via mixed-integer programming for interpretable global surrogates (MIRET) (Teodoro et al., 2023).
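As a minimal illustration of global distillation, the sketch below trains a hypothetical teacher ensemble on synthetic data, queries it, and fits a shallow constant-leaf CART to the teacher's outputs. The cited methods replace the constant leaves with additive or linear models, which scikit-learn's `DecisionTreeRegressor` does not provide.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

# Hypothetical teacher: a boosted ensemble on a synthetic regression task.
X, y = make_friedman1(n_samples=2000, random_state=0)
teacher = GradientBoostingRegressor(random_state=0).fit(X, y)

# Distill against the teacher's outputs, not the ground-truth labels,
# so that fidelity measures how well the surrogate mimics the teacher.
y_teacher = teacher.predict(X)
surrogate = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y_teacher)

fidelity = r2_score(y_teacher, surrogate.predict(X))
print(f"surrogate fidelity to teacher (R^2): {fidelity:.3f}")
```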
2.2 Tree-Based Surrogates in Optimization
In operations research and optimization, model-based tree surrogates encode instance-to-solution mappings (a simplified sketch follows the list):
- Micro-solution trees: Each leaf encodes a single feasible solution. The tree maps problem instance features to the solution prescription (Goerigk et al., 2024, Goerigk et al., 2024).
- Feature-based/meta-solution trees: Each leaf specifies a region in solution-feature space (e.g., budget distribution per item group), prescribing sets of actions subject to shared structure (Goerigk et al., 2024).
- Robust surrogates: The training integrates explicit budgeted uncertainty sets, yielding trees that are interpretable and robust to adversarial or statistical perturbations (Goerigk et al., 2024).
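A simplified sketch of the micro-solution idea, assuming a synthetic setting in which each leaf prescribes one solution from a precomputed feasible pool; the cited work trains such trees via mixed-integer programming, whereas a greedy CART classifier stands in here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical setting: an instance is a cost vector over items; a
# "solution" is a fixed 0/1 vector drawn from a small precomputed pool
# of feasible solutions (feasibility is assumed, not modeled here).
n_items, n_instances, n_solutions = 10, 500, 8
C = rng.uniform(1.0, 10.0, size=(n_instances, n_items))  # instance features
pool = rng.integers(0, 2, size=(n_solutions, n_items))   # candidate solutions

# Label each instance with the index of its cheapest pool solution.
best_idx = (C @ pool.T).argmin(axis=1)

# Micro-solution tree: a shallow tree routing each instance to the single
# solution prescribed at its leaf (greedy CART stands in for MIP training).
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(C, best_idx)

# Prescription for a new instance: read the solution off the predicted leaf.
c_new = rng.uniform(1.0, 10.0, size=(1, n_items))
x_hat = pool[tree.predict(c_new)[0]]
print("prescribed solution:", x_hat, "cost:", (c_new @ x_hat).item())
```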
2.3 Trees as Surrogate Models in Bayesian and Sequential Optimization
Trees (or ensembles thereof) serve as surrogates in sequential or Bayesian optimization, especially when the objective is expensive, high-dimensional, or non-smooth (a minimal sketch of ensemble-based uncertainty estimation with Expected Improvement follows the list):
- Ensembles for uncertainty estimation: Random Forests, Extremely Randomized Trees, or specifically designed ensembles (BwO forest) provide not only mean predictions but also variance estimates used in acquisition functions such as Expected Improvement (Kim et al., 2022).
- Structure-aware surrogates: Block-structured (e.g., layer-wise) surrogates for quantum circuit optimization further factorize the parameter space and accelerate learning (DiBrita et al., 30 Sep 2025).
- Preference-based surrogate trees: Probabilistic decision tree surrogates for utility learning from pairwise comparisons in Preferential Bayesian Optimization (PBO) (Leenders et al., 16 Dec 2025).
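The following sketch illustrates the general pattern rather than the specific BwO forest construction: per-tree predictions of a random forest supply an empirical mean and standard deviation, which feed an Expected Improvement acquisition over a random candidate pool. The toy objective and all hyperparameters are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor

def forest_mean_std(forest, X):
    """Empirical mean/std across the ensemble's per-tree predictions."""
    per_tree = np.stack([t.predict(X) for t in forest.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0) + 1e-9

def expected_improvement(mu, sigma, best_y):
    """EI for minimization under a per-point normal surrogate model."""
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def f(X):
    """Toy one-dimensional objective to minimize."""
    return np.sin(3 * X[:, 0]) + 0.1 * X[:, 0] ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(15, 1))   # initial design
y = f(X)
for _ in range(10):                    # sequential optimization loop
    forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    cand = rng.uniform(-2, 2, size=(500, 1))   # random candidate pool
    mu, sigma = forest_mean_std(forest, cand)
    x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next.reshape(1, -1)))
print("best observed value:", y.min())
```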
2.4 Surrogate Trees for Shapley Value and SHAP Computation
Model-based trees have been applied for scalable computation of both global Shapley values and local SHAP explanations. By constructing a single (typically shallow) surrogate tree over black-box outputs and modeling conditional expectations through additive models in the leaves, accurate and computationally efficient approximations are achieved. In the MBT (Surrogate-Model-Based Tree) approach, the path probabilities needed for conditional expectation are handled via dedicated local classifiers, resolving the path-dependence issue present in classical Tree SHAP (Zhou et al., 2022).
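A minimal sketch of the central idea, with two simplifications relative to the MBT approach: empirical node proportions stand in for the dedicated path-probability classifiers, and constant leaf values stand in for per-leaf GAMs. Conditional expectations over unknown features are obtained by descending the surrogate tree and branching probabilistically at splits on unobserved features.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def cond_expectation(tree, x, known):
    """Approximate E[f(X) | x_known] by descending a fitted surrogate tree.

    Splits on known features follow x; splits on unknown features branch
    both ways, weighted by the empirical fraction of training samples
    going left (a simplification of MBT's path-probability classifiers).
    """
    t = tree.tree_

    def descend(node, w):
        if t.children_left[node] == -1:          # reached a leaf
            return w * t.value[node][0, 0]
        j = t.feature[node]
        left, right = t.children_left[node], t.children_right[node]
        if j in known:
            nxt = left if x[j] <= t.threshold[node] else right
            return descend(nxt, w)
        p_left = t.n_node_samples[left] / t.n_node_samples[node]
        return descend(left, w * p_left) + descend(right, w * (1.0 - p_left))

    return descend(0, 1.0)

# Usage: fit a shallow surrogate tree to black-box outputs, then query
# the conditional expectation given only features 0 and 1 of a sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y_teacher = X[:, 0] + X[:, 1] * X[:, 2]          # stand-in for f(X)
surr = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y_teacher)
print(cond_expectation(surr, X[0], known={0, 1}))
```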
2.5 Specialized Model Trees
- Gradient-based split model trees: Model trees employing explicit parametric models at leaves and gradient-based criteria to find optimal splits, thus improving predictive power while maintaining transparency via shallow trees (Broelemann et al., 2018).
- Concept-based surrogate trees: High-level grouping of features into “concepts” drives partitioning, yielding global and local explanations at the semantic group level instead of at the raw feature level (Renard et al., 2019).
3. Representative Algorithms and Variants
The following table summarizes representative algorithms within the model-based tree surrogates paradigm:
| Method | Partitioning Mechanism | Leaf Model/Prescription | Targeted Use Case |
|---|---|---|---|
| SLIM | Exhaustive split to minimize SSE | Additive linear | Model distillation (Herbinger et al., 2023) |
| MOB | M-fluctuation test for instability | Linear regression | Stability in model trees |
| GUIDE | χ²-curvature & interaction tests | Additive linear | Interaction detection |
| Robust Tree Surrogates | MIP/Scenario generation | Prescribed solution (x or feature-vector) | Optimization under uncertainty (Goerigk et al., 2024, Goerigk et al., 2024) |
| BwO Forest | Bagging with oversampling, random splitting | Leaf-wise mean/variance | Bayesian/sequential optimization (Kim et al., 2022) |
| MBT (SHAP) | Single global tree, path probabilities | GAM per leaf | Efficient Shapley computation (Zhou et al., 2022) |
| MIRET | MILP for oblique splits | Linear/constant per leaf | Tree ensemble distillation (Teodoro et al., 2023) |
| Concept Tree | Concept-based greedy clustering | Info-gain split (per concept) | Semantically interpretable surrogates (Renard et al., 2019) |
| Decision Tree PBO | Consistency-based splits, Laplace approx. | Gaussian distribution per leaf | Preference learning (Leenders et al., 16 Dec 2025) |
| Tree-PCE | Adaptive partition, greedy TSE-gain | Local polynomial chaos expansion | Surrogate modeling + sensitivity analysis (Said et al., 16 Sep 2025) |
Each algorithm requires tuning its complexity (e.g., tree depth, minimum region size, or polynomial degree) and employs principled split or regularization strategies to achieve the interpretability/fidelity trade-off suited to the application domain; a minimal tuning sketch follows.
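As an illustration of this tuning, the sketch below sweeps surrogate depth against a hypothetical random-forest teacher and traces the resulting fidelity/leaf-count frontier; depth is just one common complexity axis (cost-complexity pruning is another).

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=2000, random_state=0)
teacher = RandomForestRegressor(random_state=0).fit(X, y)
y_t = teacher.predict(X)

# Trace the fidelity/interpretability frontier by sweeping tree depth.
for depth in range(1, 8):
    surr = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X, y_t)
    print(f"depth={depth}  leaves={surr.get_n_leaves():3d}  "
          f"fidelity R^2={r2_score(y_t, surr.predict(X)):.3f}")
```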
4. Interpretability, Fidelity, Fairness, and Other Metrics
Key trade-offs and evaluation metrics for model-based tree surrogates are:
- Fidelity: Quantified as MSE or $R^2$ (regression) or classification accuracy relative to the teacher or ground truth; fidelity-interpretability trade-off curves are standard (Herbinger et al., 2023, Hara et al., 2016, Teodoro et al., 2023).
- Interpretability: Measured by tree depth, number of leaves or regions, number of rules, number of splits per rule, and semantic transparency (e.g., whether features or concepts are grouped meaningfully) (Herbinger et al., 2023, Teodoro et al., 2023, Renard et al., 2019).
- Stability: Bootstrap-resampled trees are compared using the Rand index; deeper trees or greedy splitters (SLIM, GUIDE) may be less stable (Herbinger et al., 2023). A minimal sketch of this protocol follows the list.
- Fairness: Fair Feature Importance Scores (FairFIS) reflect the contribution of each splitting variable to reductions or increases in group bias (e.g., demographic parity or equality of opportunity), calculable on both native tree surrogates and ensemble surrogates (Little et al., 2023).
- Robustness: For optimization surrogates, robust in-sample and out-of-sample costs under specified uncertainty sets (e.g., budgeted perturbations) are reported (Goerigk et al., 2024).
- Sensitivity: For Tree-PCE and similar surrogates, global and tree-based sensitivity indices (e.g., Sobol', TSE-gain indices) quantify attribution of prediction variance to different inputs (Said et al., 16 Sep 2025).
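A minimal sketch of the bootstrap-stability protocol from the Stability item above, using scikit-learn leaf assignments and the adjusted Rand index (a common variant of the Rand index; the cited work's exact resampling and comparison details may differ):

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.metrics import adjusted_rand_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=1000, random_state=0)
rng = np.random.default_rng(0)

def bootstrap_partition():
    """Fit a surrogate tree on a bootstrap resample and return the leaf
    assignment it induces on the full reference set X."""
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X[idx], y[idx])
    return tree.apply(X)

# Pairwise adjusted Rand index over bootstrap partitions: values near 1
# mean the tree recovers essentially the same regions on every resample.
parts = [bootstrap_partition() for _ in range(10)]
scores = [adjusted_rand_score(parts[i], parts[j])
          for i in range(10) for j in range(i + 1, 10)]
print(f"mean pairwise ARI: {np.mean(scores):.3f}")
```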
Complexity management, whether by limiting the feature subsets enumerated (e.g., in SHAP computation), penalizing feature or rule use, or fixing the tree structure, is integral to maintaining model transparency and computational feasibility as dimensionality increases.
5. Algorithmic Advances and Efficiency Trade-offs
Surrogate trees are subject to several design and computational considerations:
- Scalability: Ensemble-based surrogates (e.g., Random Forests, BwO forests) allow efficient uncertainty estimation in high dimensions; MIP-based tree optimization may be limited to shallow or small trees, but heuristics (tree, sol, alternation) scale to larger instances (Goerigk et al., 2024, Teodoro et al., 2023).
- Approximation accuracy: Methods such as MBT for SHAP achieve low relative errors even in highly correlated regimes, outperforming standard marginal approximations or Tree SHAP (Zhou et al., 2022).
- Domain adaptation: Decision tree surrogates can be extended to handle categorical data, transfer across users (in preference learning), and adapt solution prescriptions using meta-solution representations (Leenders et al., 16 Dec 2025, Goerigk et al., 2024).
6. Empirical Findings and Applications
Extensive empirical testing across multiple domains demonstrates:
- Supervised learning: Model-based tree surrogates match teacher-model fidelity with shallow trees and high interpretability, in many cases with a substantial reduction in the required features or rules (Herbinger et al., 2023, Hara et al., 2016, Teodoro et al., 2023).
- Shapley/SHAP explanation: Orders-of-magnitude accuracy improvements over marginal or Tree SHAP approximations at acceptable runtime cost, owing to faithful handling of conditional dependencies (Zhou et al., 2022).
- Optimization and robust decision rules: Robust surrogates yield marked improvements in worst-case cost at only modest loss in nominal cost in realistic budgeted-uncertainty settings (Goerigk et al., 2024).
- Preference learning: Tree surrogates in PBO match Gaussian-process surrogates in regret on smooth objectives, dramatically outperform them on spiky or discontinuous objectives, and run faster (Leenders et al., 16 Dec 2025).
- Global sensitivity/uncertainty quantification: Tree-PCE surrogates enable accurate analytical Sobol’ index computation and novel TSE-gain sensitivity indices in highly irregular or discontinuous domains (Said et al., 16 Sep 2025).
7. Extensions, Limitations, and Future Directions
Recent research highlights several limitations and opportunities:
- Most MIP-based formulations face exponential scaling in tree depth and number of features, motivating the use of scalable heuristics, post-hoc rule clustering, or hybrid tree-ensemble distillation (Teodoro et al., 2023, Goerigk et al., 2024).
- Accuracy and stability may degrade when model complexity is trimmed too aggressively; thus, interpretability/fidelity trade-off selection is application- and stakeholder-dependent (Herbinger et al., 2023).
- Current frameworks mostly model cost or target uncertainty, with extension to constraint, parameter, or mixed uncertainties as an open area (Goerigk et al., 2024).
- Extrapolation properties of tree surrogates remain limited outside the convex hull of training data; hybrid models or injected stochasticity may be necessary for better uncertainty accounting (Kim et al., 2022).
- Embedding domain knowledge (e.g., via concept trees or meta-solution features) increases human comprehensibility, but further formalization of interpretability or transparency metrics remains an open challenge (Renard et al., 2019, Goerigk et al., 2024).
A plausible implication is that future advances in model-based tree surrogates will target richer local model classes, tighter integration with uncertainty quantification, scalable rule learning under complex constraints, and expanded applications in interactive optimization, simulation, and operational research.
References
- Zhou, W., et al., "Shapley Computations Using Surrogate Model-Based Trees" (Zhou et al., 2022)
- Herbinger, J., et al., "Leveraging Model-based Trees as Interpretable Surrogate Models for Model Distillation" (Herbinger et al., 2023)
- Kim, J., and Choi, S., "On Uncertainty Estimation by Tree-based Surrogate Models in Sequential Model-based Optimization" (Kim et al., 2022)
- Leenders, N., et al., "Explainable Preference Learning: a Decision Tree-based Surrogate Model for Preferential Bayesian Optimization" (Leenders et al., 16 Dec 2025)
- Goerigk, M., et al., "Towards Robust Interpretable Surrogates for Optimization" (Goerigk et al., 2024)
- Di Teodoro, M., et al., "Unboxing Tree Ensembles for interpretability: a hierarchical visualization tool and a multivariate optimal re-built tree" (Teodoro et al., 2023)
- Broelemann, K., and Kasneci, G., "A Gradient-Based Split Criterion for Highly Accurate and Transparent Model Trees" (Broelemann et al., 2018)
- Goerigk, M., et al., "Feature-Based Interpretable Surrogates for Optimization" (Goerigk et al., 2024)
- Renard, X., et al., "Concept Tree: High-Level Representation of Variables for More Interpretable Surrogate Decision Trees" (Renard et al., 2019)
- Said et al., "A tree-based Polynomial Chaos expansion for surrogate modeling and sensitivity analysis of complex numerical models" (Said et al., 16 Sep 2025)