Heterogeneous Random Forests

Updated 25 February 2026

Heterogeneous Random Forests (HRFs) are ensemble methods that incorporate diversity at multiple levels, including architecture, splitting criteria, and data modalities, to enhance prediction accuracy.
They combine heterogeneous trees and node-splitting strategies to handle complex, high-dimensional, and non-Euclidean data, and several variants support valid statistical inference.
HRFs offer practical advantages in applications such as federated learning and causal inference by improving performance, reducing bias, and accommodating fragmented data.

A Heterogeneous Random Forest (HRF) is an extension of ensemble tree-based learning methods that incorporates diversity or heterogeneity at multiple levels—architecture, splitting criterion, data type, learning mechanism, or federated setting—to improve predictive power, enable consistent estimation under more general conditions, or provide computational benefits. The HRF paradigm generalizes classic Random Forests (RF) by allowing tree-level, node-level, metric, or splitting-rule heterogeneity, making it a versatile framework for modern statistical learning with heterogeneous, high-dimensional, or incrementally evolving data.

1. Foundational Concepts and Taxonomy

Heterogeneous Random Forests formalize several distinct but complementary notions of "heterogeneity" in tree ensembles:

Structural Heterogeneity: HRFs can incorporate trees of disparate types—conventional axis-aligned trees, oblique hyperplane splits (e.g., via multiple classifiers), or even trees operating on different input/output modalities or metric spaces (Ganaie et al., 2023, Capitaine et al., 2019, Sabzevari et al., 2018).
Split Criterion Heterogeneity: Some HRF variants choose splits by criteria tailored to the statistical target, e.g., maximizing the discrepancy between the conditional outcome distributions of the child nodes (MMD, Wasserstein), the heterogeneity of treatment effects, or the heterogeneity of model performance (Ćevid et al., 2020, Du et al., 2020, Wager et al., 2015, Zhang et al., 2 Jun 2025).
Diversity-Induced Heterogeneity: Certain HRFs inject targeted diversity into the set of tree structures by explicitly decorrelating features used for splits across trees or differentially weighting candidate features based on earlier usage (Kim et al., 2024).
Ensemble Pooling Heterogeneity: HRFs may combine tree ensembles with ensembles of fundamentally different base learners, such as SVMs and MLPs, aggregating across model classes to exploit complementary inductive biases (Sabzevari et al., 2018).
Federated and Data-heterogeneous HRFs: Recent HRF designs account for client-level statistical or feature heterogeneity, supporting non-identically distributed and partially observed features in distributed settings (Park et al., 2024, Khellaf et al., 3 Feb 2026).

This taxonomy reflects a spectrum from intra-node, through intra-tree, to ensemble/global levels of heterogeneity.

2. Metric-Space and Modality Heterogeneity: Fréchet Random Forests

Fréchet Random Forests generalize classical regression and classification trees to handle input and output data from arbitrary metric spaces, including curves, images, and non-Euclidean objects (Capitaine et al., 2019). In this construction:

Each predictor variable takes values in a metric space $(\mathcal{X}_j, d_j)$, and the output variable lies in a response metric space $(\mathcal{Y}, d_{\mathcal{Y}})$.
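The empirical Fréchet mean of the outputs $z_1, \dots, z_n$ in a node is the minimizer over $\mathcal{Y}$ of

$\mathcal{F}_n(z) = \frac{1}{n} \sum_{i=1}^{n} d_{\mathcal{Y}}^2(z, z_i)$

and the node's empirical Fréchet variance is the minimum value of $\mathcal{F}_n$. For $\mathcal{Y} = \mathbb{R}$ with $d_{\mathcal{Y}}(a, b) = |a - b|$, these reduce to the sample mean and the (biased) sample variance.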

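Node splits are chosen to maximize the decrease in empirical Fréchet variance from the parent node to the size-weighted child nodes; for non-Euclidean predictors, candidate splits are Voronoi-type partitions that assign each observation to the nearer of two split centers.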
Forest predictions are obtained as the Fréchet mean of the tree-level predictions, and out-of-bag (OOB) error and permutation variable importance carry over natively to metric spaces.
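The node-level computations described above can be sketched in pure Python. This is a minimal, brute-force illustration under stated assumptions, not the implementation of Capitaine et al. (2019): the Fréchet mean is approximated by minimizing $\mathcal{F}_n$ over the observed outputs only (a medoid-style approximation, because the exact minimizer over $\mathcal{Y}$ may not be computable in general), split centers are searched exhaustively over pairs of observed predictor values, and all function names (`sample_frechet_mean`, `best_voronoi_split`, and so on) are hypothetical. The example pairs 2-D Euclidean predictors with angular outputs on the circle, where the arc-length metric makes ordinary averaging inappropriate (0.1 and 6.2 radians are close on the circle).

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def frechet_objective(z: T, points: Sequence[T], d: Callable[[T, T], float]) -> float:
    """Mean squared distance from candidate z to the points, i.e. F_n(z)."""
    return sum(d(z, p) ** 2 for p in points) / len(points)


def sample_frechet_mean(points: Sequence[T], d: Callable[[T, T], float]) -> T:
    """Approximation (assumption): minimise F_n over the observed points only."""
    return min(points, key=lambda z: frechet_objective(z, points, d))


def frechet_variance(points, d) -> float:
    """Empirical Fréchet variance: F_n evaluated at the (approximate) Fréchet mean."""
    return frechet_objective(sample_frechet_mean(points, d), points, d)


def voronoi_split(xs, c1, c2, dx) -> List[bool]:
    """True = left child: the observation is at least as close to c1 as to c2."""
    return [dx(x, c1) <= dx(x, c2) for x in xs]


def variance_decrease(ys, go_left, dy) -> float:
    """Parent Fréchet variance minus the size-weighted child Fréchet variances."""
    left = [y for y, g in zip(ys, go_left) if g]
    right = [y for y, g in zip(ys, go_left) if not g]
    if not left or not right:
        return 0.0
    n = len(ys)
    children = (len(left) * frechet_variance(left, dy)
                + len(right) * frechet_variance(right, dy)) / n
    return frechet_variance(ys, dy) - children


def best_voronoi_split(xs, ys, dx, dy) -> Tuple[float, Tuple[int, int]]:
    """Exhaustive search over pairs of observed predictor values used as centres."""
    best_gain, best_pair = -math.inf, (0, 1)
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            gain = variance_decrease(ys, voronoi_split(xs, xs[i], xs[j], dx), dy)
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
    return best_gain, best_pair


def circle_dist(a: float, b: float) -> float:
    """Arc-length distance between two angles (radians) on the unit circle."""
    diff = abs(a - b) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)


xs = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]  # Euclidean predictors
ys = [0.1, 6.2, 3.0, 3.2]                               # angular outputs
gain, (i, j) = best_voronoi_split(xs, ys, math.dist, circle_dist)
print(voronoi_split(xs, xs[i], xs[j], math.dist))  # [True, True, False, False]
print(sample_frechet_mean(ys, circle_dist))          # 3.0
```

The same `sample_frechet_mean` can aggregate tree-level predictions into a forest prediction. Exhaustive search costs on the order of $n^4$ distance evaluations per node, so a practical implementation would restrict or subsample the candidate centers.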

These properties let HRFs fuse information across heterogeneous and high-dimensional data types, with consistency guarantees under standard conditions (Capitaine et al., 2019).

3. Heterogeneity via Distributional and Treatment Effect Splits

Distributional Random Forests and related HRF variants generalize the tree-splitting logic to identify subpopulations with distinct conditional outcome distributions (Ćevid et al., 2020, Du et al., 2020, Wager et al., 2015, Zhang et al., 2 Jun 2025):

Distributional Random Forests (DRF): Each split maximizes the Maximum Mean Discrepancy (MMD) between the empirical distributions of the response in the left and right child nodes, so leaves are homogeneous in the conditional distribution rather than only in the conditional mean; the fitted forest then yields data-adaptive weights over training points that estimate the full conditional distribution (Ćevid et al., 2020).
Wasserstein Random Forests (WRF): Splits are chosen to maximize the Wasserstein distance between the empirical outcome distributions of the child nodes, targeting estimation of the full conditional distribution (Du et al., 2020).
Causal/Heterogeneous Treatment Effect Forests: The splitting criterion in each tree maximizes treatment effect heterogeneity, i.e., the variance of the estimated treatment effects across the child nodes. Theory establishes that the resulting conditional average treatment effect (CATE) estimates are pointwise consistent, with asymptotically Gaussian and centered sampling distributions (Wager et al., 2015, Dandl et al., 2022). Honesty, i.e., using separate subsamples to choose splits and to estimate leaf-level effects, is essential for valid inference.
Model Performance Heterogeneity Forests: Splits maximize heterogeneity in model performance (e.g., AUC, MSE) across groups, providing subgroup-adaptive model assessment or selection (Zhang et al., 2 Jun 2025).
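Two of these split criteria can be sketched side by side in pure Python. The sketch rests on assumptions that go beyond the text: one real-valued covariate is scanned at midpoints between sorted values; the DRF-style score is a size-weighted $\frac{n_L n_R}{n^2}\,\mathrm{MMD}^2$ with a fixed-bandwidth Gaussian kernel (the published DRF criterion and its computational shortcuts may differ); and the causal score uses difference-in-means effect estimates $\hat\tau_L, \hat\tau_R$ in each child, scored by $\frac{n_L n_R}{n^2}(\hat\tau_L - \hat\tau_R)^2$, which equals the size-weighted variance of the two child effects. Honesty is not implemented, and all function names are hypothetical.

```python
import math


def gaussian_kernel(a: float, b: float, bandwidth: float = 1.0) -> float:
    return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))


def mmd2(xs, ys, k=gaussian_kernel) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    kxx = sum(k(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(k(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


def diff_in_means(y, w):
    """Treated-minus-control mean outcome; None if a treatment arm is empty."""
    treated = [yi for yi, wi in zip(y, w) if wi == 1]
    control = [yi for yi, wi in zip(y, w) if wi == 0]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)


def size_weight(left, right):
    n = len(left) + len(right)
    return len(left) * len(right) / n ** 2


def best_threshold(x, score, min_leaf=2):
    """Scan axis-aligned thresholds on x; score(left_idx, right_idx) -> float."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_score, best_thr = -math.inf, None
    for cut in range(min_leaf, len(x) - min_leaf + 1):
        lo, hi = x[order[cut - 1]], x[order[cut]]
        if lo == hi:  # tied covariate values cannot be separated
            continue
        s = score(order[:cut], order[cut:])
        if s > best_score:
            best_score, best_thr = s, (lo + hi) / 2
    return best_score, best_thr


x = [0, 1, 2, 3, 4, 5, 6, 7]

# DRF-style target: the outcome spread changes at x = 4, but the mean stays 0.
y_dist = [-1.0, 1.0, -1.0, 1.0, 0.0, 0.0, 0.0, 0.0]


def mmd_score(left, right):
    return size_weight(left, right) * mmd2([y_dist[i] for i in left],
                                           [y_dist[i] for i in right])


# Causal-forest-style target: the effect is 0 for x < 4 and 2 for x >= 4.
w = [0, 1, 0, 1, 0, 1, 0, 1]
y_cate = [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0]


def cate_score(left, right):
    tau_l = diff_in_means([y_cate[i] for i in left], [w[i] for i in left])
    tau_r = diff_in_means([y_cate[i] for i in right], [w[i] for i in right])
    if tau_l is None or tau_r is None:
        return -math.inf
    return size_weight(left, right) * (tau_l - tau_r) ** 2


print(best_threshold(x, mmd_score)[1])  # 3.5
print(best_threshold(x, cate_score))    # (1.0, 3.5)
```

On this toy data the MMD score locates the change in outcome spread at $x = 3.5$ even though the outcome mean is 0 on both sides, a change that a mean-based CART criterion is not designed to target; the causal score finds the change in treatment effect at the same threshold.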

Compared to classic RF/CART, these HRFs are specifically designed to detect and exploit covariate-induced heterogeneity in more general target functionals beyond mean regression.

4. Structural and Architectural Heterogeneity

Structural HRFs introduce diversity by explicitly mixing trees or splits across model types or learning algorithms:

Oblique and Multi-classifier Node Splits: At each node, multiple linear classifiers (e.g., SVMs, LDA, logistic regression) are trained, and the best among them (by impurity) defines the node split, enabling oblique partition boundaries that capture complex class geometries (Ganaie et al., 2023).
Ensemble Fusion of Heterogeneous Base Learners: HRFs can be constructed as composites of standard random forests, SVM ensembles, and neural network ensembles, where the proportions of each model type form a point on a simplex and the predictions of all members are averaged (Sabzevari et al., 2018). The optimal composition is selected by minimizing cross-validation or out-of-bag error over a grid on the simplex.
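A node-level version of the multi-classifier split can be sketched in pure Python. It is an illustration under assumptions, not the method of Ganaie et al. (2023): labels are binary, two deliberately simple linear classifiers stand in for the SVM/LDA/logistic-regression candidates (a nearest-centroid rule, equivalent to LDA with an identity covariance and equal class priors, and a logistic regression fitted by batch gradient descent), and weighted Gini impurity selects the winning hyperplane. Names such as `best_oblique_split` are hypothetical. The toy classes are separated by the diagonal $x_2 = x_1$, so no single axis-aligned threshold splits them.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def weighted_gini(X, y, w, b):
    """Impurity of the oblique split w.x + b <= 0 (left) versus > 0 (right)."""
    left = [yi for xi, yi in zip(X, y) if dot(w, xi) + b <= 0]
    right = [yi for xi, yi in zip(X, y) if dot(w, xi) + b > 0]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def fit_centroid(X, y):
    """Nearest-centroid rule: the hyperplane bisecting the two class means."""
    c0 = [sum(col) / len(col) for col in zip(*[x for x, t in zip(X, y) if t == 0])]
    c1 = [sum(col) / len(col) for col in zip(*[x for x, t in zip(X, y) if t == 1])]
    w = [a - b for a, b in zip(c1, c0)]
    mid = [(a + b) / 2 for a, b in zip(c0, c1)]
    return w, -dot(w, mid)


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression by batch gradient descent on the log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        residuals = [sigmoid(dot(w, xi) + b) - t for xi, t in zip(X, y)]
        w = [wk - lr * sum(r * xi[k] for r, xi in zip(residuals, X)) / n
             for k, wk in enumerate(w)]
        b -= lr * sum(residuals) / n
    return w, b


def best_oblique_split(X, y, fitters):
    """Fit every candidate linear classifier; keep the lowest-impurity hyperplane."""
    candidates = []
    for name, fit in fitters.items():
        w, b = fit(X, y)
        candidates.append((weighted_gini(X, y, w, b), name, w, b))
    return min(candidates, key=lambda c: c[0])  # ties keep the first candidate


X = [(0, 1), (1, 2), (2, 3), (1, 0), (2, 1), (3, 2)]
y = [0, 0, 0, 1, 1, 1]
impurity, name, w, b = best_oblique_split(
    X, y, {"centroid": fit_centroid, "logistic": fit_logistic})
print(name, impurity)  # centroid 0.0
```

In a full tree the selected hyperplane defines the node split and the procedure recurses on each child; adding further candidate classifiers only requires new entries in the `fitters` dictionary.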

Empirical findings indicate that heterogeneous ensembles with a well-chosen composition generally outperform their homogeneous counterparts, owing to the aggregation of complementary biases together with variance reduction.

5. Diversity-Induced Heterogeneity for Enhanced Ensemble Performance

HRFs may induce diversity by penalizing previously dominant features in the tree-construction process:

Feature Usage Penalization: For each tree $b$, features appearing near the root are assigned low weights in the sampling distribution for the next tree, which discourages repeated use of dominant features and promotes fairer feature exposure (Kim et al., 2024).
Weight Update Mechanism:

$D_{b,j} = d^{\beta}_{b,j} + \alpha D_{b-1,j}, \quad w_{b+1, j} = \frac{D_{b, j}}{\sum_{k=1}^p D_{b,k}}$

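where $d_{b,j}$ is a depth-based score for feature $j$ in tree $b$ (small when $j$ splits near the root, largest when $j$ is unused), the exponent $\beta$ controls how strongly shallow usage is penalized, $\alpha$ controls how much of the history from earlier trees is retained, and $w_{b+1,j}$ is the sampling weight of feature $j$ for tree $b+1$.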
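The update rule can be transcribed directly into Python. This sketch fills in details that are assumptions, not taken from Kim et al. (2024): $d_{b,j}$ is the shallowest depth at which feature $j$ splits in tree $b$ (root = 1), unused features receive the hypothetical value `max_depth + 1`, $D_{0,j} = 0$, and the weights drive weighted sampling of candidate features without replacement. All function names are hypothetical.

```python
import random


def depth_scores(split_depths, n_features, max_depth):
    """d_{b,j}: shallowest split depth of feature j in tree b (root = 1).

    Hypothetical convention: features never used in tree b get max_depth + 1.
    """
    return [split_depths.get(j, max_depth + 1) for j in range(n_features)]


def update_weights(d, prev_D, alpha, beta):
    """D_{b,j} = d_{b,j}^beta + alpha * D_{b-1,j};  w_{b+1,j} = D_{b,j} / sum_k D_{b,k}."""
    D = [dj ** beta + alpha * Dj for dj, Dj in zip(d, prev_D)]
    total = sum(D)
    return D, [Dj / total for Dj in D]


def sample_features(weights, k, rng):
    """Draw k distinct candidate features with probability proportional to weight."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        j = rng.choices(remaining, weights=[weights[i] for i in remaining])[0]
        chosen.append(j)
        remaining.remove(j)
    return chosen


# Tree 1: feature 0 at the root, feature 1 at depth 2, feature 2 unused.
d1 = depth_scores({0: 1, 1: 2}, n_features=3, max_depth=3)       # [1, 2, 4]
D1, w2 = update_weights(d1, prev_D=[0.0, 0.0, 0.0], alpha=0.5, beta=1.0)
print(w2)  # ≈ [0.143, 0.286, 0.571]

# Tree 2: feature 2 at the root, feature 0 at depth 2, feature 1 unused.
d2 = depth_scores({2: 1, 0: 2}, n_features=3, max_depth=3)       # [2, 4, 1]
D2, w3 = update_weights(d2, prev_D=D1, alpha=0.5, beta=1.0)
print(w3)  # ≈ [0.238, 0.476, 0.286]

print(sample_features(w3, k=2, rng=random.Random(0)))
```

After the second tree, feature 0, which split at or near the root in both trees, has the smallest weight, while the feature unused in tree 2 has the largest; the $\alpha D_{b-1,j}$ term carries part of tree 1's history into the weights for tree 3.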

Simulations and benchmarks show that these HRFs reduce selection bias, increase tree-to-tree diversity (as measured by statistical dissimilarity metrics), and improve classification accuracy in datasets with low to moderate feature noise.

6. Heterogeneous Random Forests in Federated and Incomplete Data Settings

Recent advances extend HRF to federated learning and partial-feature scenarios (Park et al., 2024, Khellaf et al., 3 Feb 2026):

Partially Overlapping Feature Spaces: In horizontal federations of multiple sites that each observe a different subset of features, local random forests are trained and the resulting trees are pooled centrally. Each site reuses only compatible trees (those whose split features are all available locally) to form its final model, which improves AUC over purely local forests provided the feature overlap is sufficient (Park et al., 2024).
Principled Federated RF (FedForest): For horizontally partitioned, statistically non-identical clients, FedForest aggregates node-level sufficient statistics and quantile sketches across clients to exactly reproduce central split decisions under any mixture of covariate or outcome shift. Theoretical guarantees establish that the centrally pooled impurity criterion is optimized without requiring data sharing or exact feature overlap (Khellaf et al., 3 Feb 2026).
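Two ideas from this section can be sketched in a few lines, under simplifying assumptions. For compatible-tree reuse, each pooled tree is represented only by the set of features it splits on. For pooled split evaluation, clients share per-child counts, sums, and sums of squares for a squared-error (regression) criterion at a candidate threshold already known to all clients, so the quantile-sketch step that proposes thresholds is omitted. This is not the FedForest protocol itself; `SuffStats`, `pooled_gain`, `compatible_trees`, and the feature names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SuffStats:
    """Sufficient statistics for squared-error impurity: count, sum, sum of squares."""
    n: int = 0
    s: float = 0.0
    ss: float = 0.0

    def __add__(self, other):
        return SuffStats(self.n + other.n, self.s + other.s, self.ss + other.ss)

    def sse(self):
        return self.ss - self.s ** 2 / self.n if self.n else 0.0


def client_split_stats(x, y, threshold):
    """Computed locally on one client: statistics for x <= threshold vs x > threshold."""
    left, right = SuffStats(), SuffStats()
    for xi, yi in zip(x, y):
        side = left if xi <= threshold else right
        side.n += 1
        side.s += yi
        side.ss += yi * yi
    return left, right


def pooled_gain(per_client):
    """Server side: SSE reduction of the split, computed from summed statistics only."""
    left = sum((lft for lft, _ in per_client), SuffStats())
    right = sum((rgt for _, rgt in per_client), SuffStats())
    return (left + right).sse() - left.sse() - right.sse()


def compatible_trees(trees, local_features):
    """Keep pooled trees whose split features are all observed at this site."""
    return [tr for tr in trees if tr["features"] <= local_features]


# Two clients whose covariate ranges differ (a covariate shift).
x_a, y_a = [1, 2, 3, 4], [1.0, 1.0, 2.0, 5.0]
x_b, y_b = [3, 5, 6, 7], [2.0, 6.0, 5.0, 7.0]
thr = 3.5
fed = pooled_gain([client_split_stats(x_a, y_a, thr),
                   client_split_stats(x_b, y_b, thr)])
central = pooled_gain([client_split_stats(x_a + x_b, y_a + y_b, thr)])
print(fed, central)  # 36.125 36.125

trees = [{"id": 0, "features": frozenset({"age", "bmi"})},
         {"id": 1, "features": frozenset({"age", "hba1c"})}]
print([tr["id"] for tr in compatible_trees(trees, {"age", "bmi", "sex"})])  # [0]
```

Because squared-error impurity depends on the data only through $(n, \sum y, \sum y^2)$ in each child, the server's gain matches the centralized value (up to floating-point rounding) even though the clients' covariate ranges differ.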

These frameworks enable scalable HRF deployment even when data fragmentation or privacy limits classical RF applicability.

7. Empirical Performance, Consistency, and Theoretical Guarantees

Across empirical studies, HRFs have been shown to match or exceed the performance of their homogeneous counterparts, with major advantages including:

Lower MSEs in curve regression, imaging, and high-dimensional tasks (Capitaine et al., 2019).
Accurate and valid estimation of heterogeneous treatment effects and conditional distributions (Wager et al., 2015, Ćevid et al., 2020, Du et al., 2020).
Robustness to missing data, time perturbations, and partial feature overlap in real-world and simulated scenarios (Capitaine et al., 2019, Park et al., 2024).
Accuracy close to that of offline forests trained on all data at once, achieved in incremental learning at substantially lower computational cost (Xie et al., 2016).
Statistically valid confidence intervals and consistency under mild regularity conditions for treatment effect and distributional estimation tasks (Wager et al., 2015, Ćevid et al., 2020).

Theoretical results include pointwise consistency for Fréchet and distributional forests, asymptotic normality for causal forests, and $L^2$-consistency for model performance forest estimators, subject to the standard assumptions of RF theory (honesty, sufficient subsample size, leaf diameters shrinking to zero, etc.).

In summary, Heterogeneous Random Forests constitute a broad methodological paradigm encompassing architectural, distributional, and statistical strategies to address the modeling of complex data with high degrees of heterogeneity in predictors, outputs, mechanisms, or context. The HRF framework provides a foundation for adaptive, robust, and theoretically grounded learning across diverse modern applications in statistics and machine learning.