Extra Trees Regressor (ETR)

Updated 18 May 2026

Extra Trees Regressor (ETR) is a highly randomized ensemble method that builds decision trees with uniformly random split thresholds to reduce variance and enhance predictive diversity.
The model avoids bootstrapping by using the full dataset for each tree and employs aggressive randomization of candidate features and thresholds to achieve fast and robust regression performance.
ETR shows strong predictive performance in a materials science application, reaching R² ≈ 0.9994 with default settings under ten-repetition cross-validation, and supports efficient high-throughput screening.

The Extra Trees Regressor (ETR), also known as “Extremely Randomized Trees,” is an ensemble machine learning model consisting of decision trees with heavily randomized split selection at both the feature and threshold levels. ETR has demonstrated efficacy in regression tasks that require both high predictive accuracy and robust generalization, particularly in the physical sciences for predicting properties directly from high-dimensional descriptors. Its defining attributes, extensive tree decorrelation through randomization and the absence of bootstrapped samples, make ETR a fast, low-variance, and high-performance estimator for applications in computational materials science and related high-throughput screening tasks (Paliwal et al., 17 Nov 2025).

1. Algorithmic Structure and Randomization Principles

ETR constructs an ensemble of individual decision trees, each built from the full training dataset without resampling. At each tree node, a subset of candidate features ( $m$ features, where $m$ is determined by the model’s max_features hyperparameter) is randomly selected. For each candidate feature within this subset, instead of greedily testing all possible split thresholds as in standard Random Forests (RF), ETR samples a split threshold uniformly at random from the interval defined by the feature’s observed minimum and maximum values in the current node’s sample.

Among the $m$ randomly drawn splits (one per candidate feature), the one yielding the largest impurity decrease, typically measured by mean squared error (MSE) for regression, is chosen. Because thresholds are drawn at random rather than optimized, individual trees differ more from one another, which reduces the variance of the averaged ensemble. The principal distinctions from RF are: (1) ETR eschews data bootstrapping (each tree sees all data), and (2) split thresholds are sampled randomly instead of optimally selected. As in RF, the final prediction is the average over all trees.
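The node-level procedure described above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: the function names (`mse_impurity`, `pick_random_split`) and the list-of-rows data layout are assumptions made for the example.

```python
import random

def mse_impurity(values):
    """Mean squared deviation from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def pick_random_split(X, y, m, rng):
    """Draw one uniform threshold for each of m random features; keep the best.

    X is a list of rows and y the targets at this node. Returns
    (feature_index, threshold, impurity_decrease), or None if no split is valid.
    """
    n, p = len(X), len(X[0])
    parent = mse_impurity(y)
    best = None
    for j in rng.sample(range(p), m):
        column = [row[j] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature at this node, nothing to split
        threshold = rng.uniform(lo, hi)  # random, not optimized
        left = [t for v, t in zip(column, y) if v <= threshold]
        right = [t for v, t in zip(column, y) if v > threshold]
        if not left or not right:
            continue
        weighted = (len(left) * mse_impurity(left)
                    + len(right) * mse_impurity(right)) / n
        decrease = parent - weighted
        if best is None or decrease > best[2]:
            best = (j, threshold, decrease)
    return best
```

A Random Forest would instead scan candidate thresholds for each feature and keep the optimal one; here only one threshold per feature is ever evaluated, which is where both the speed and the extra randomness come from.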

2. Model Hyperparameters and Default Setting Rationale

In the cited application, ETR was instantiated via the scikit-learn (v0.24) ExtraTreesRegressor class, with unmodified defaults reflecting standard settings. The key hyperparameters were:

Hyperparameter	Value	Effect
n_estimators	100	Number of trees in ensemble
criterion	"mse"	Split impurity metric
max_depth	None	Unlimited depth (pure leaves)
min_samples_split	2	Node split minimum sample count
min_samples_leaf	1	Leaf node minimum sample count
max_features	"auto"	All $p$ features for regressors ( $p$ = feature count)
bootstrap	False	Full data for each tree
random_state	None	No fixed seed

No additional hyperparameter tuning was performed due to both strong intrinsic model randomization and the presence of significant regression signal in the selected data (Paliwal et al., 17 Nov 2025). This suggests robustness of ETR to hyperparameter specification within the tested context.
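For readers reproducing this configuration, the settings can be written down as a plain dictionary. The sketch below also translates the v0.24 spellings to those used by newer scikit-learn releases, under the assumption that `"mse"` corresponds to `"squared_error"` and that `"auto"` for a regressor corresponds to `1.0` (all features). The helper name `to_recent_sklearn` is hypothetical.

```python
# Settings as listed in the table above (scikit-learn v0.24 naming).
ETR_DEFAULTS_V024 = {
    "n_estimators": 100,
    "criterion": "mse",
    "max_depth": None,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "max_features": "auto",
    "bootstrap": False,
    "random_state": None,
}

def to_recent_sklearn(params, seed=None):
    """Return a copy of params using the spellings of newer releases.

    Assumption: "mse" -> "squared_error" and "auto" -> 1.0 (all features).
    """
    renamed = dict(params)
    if renamed.get("criterion") == "mse":
        renamed["criterion"] = "squared_error"
    if renamed.get("max_features") == "auto":
        renamed["max_features"] = 1.0
    if seed is not None:
        renamed["random_state"] = seed  # optional, for reproducible runs
    return renamed

# Hypothetical usage, if scikit-learn is installed:
#   from sklearn.ensemble import ExtraTreesRegressor
#   model = ExtraTreesRegressor(**to_recent_sklearn(ETR_DEFAULTS_V024, seed=0))
```

Since the cited runs left `random_state` unset, results will vary slightly between runs; passing a seed is a convenience for reproducibility, not part of the reported setup.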

3. Feature Engineering and Data Preparation

The regression dataset comprised 4,127 sample points across 150 crystalline compounds, each accompanied by temperature-dependent ab initio (DFT-computed) lattice thermal conductivity, $\kappa_L(T)$ , spanning $100\text{–}1000$ K. The regression target was the base-10 logarithm of $\kappa_L$ to address multi-order-of-magnitude variability and skewness, i.e.,

$y_i = \log_{10}\bigl(\kappa_{L,i}\bigr)$
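As a small worked example of the target transform, the sketch below maps conductivities to log space and back. The numeric values are illustrative only and are not taken from the dataset; the function names are assumptions.

```python
import math

def to_log_target(kappa_values):
    """Map conductivities (each must be > 0) to log10 space."""
    return [math.log10(k) for k in kappa_values]

def from_log_target(y_values):
    """Invert the transform to recover conductivities."""
    return [10.0 ** y for y in y_values]

kappa = [0.5, 5.0, 50.0, 500.0]  # illustrative values spanning three decades
y = to_log_target(kappa)         # evenly spaced, one unit apart in log space
```

Values spread over three orders of magnitude become evenly spaced in log space, so errors on low-conductivity samples are not swamped by errors on high-conductivity ones.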

Features were generated by the MAGPIE framework to encapsulate both compositional and crystal-structural information, such as atomic weights, electronegativities, coordination numbers, and other elemental descriptors. Initial feature dimensionality was $p_0=271$, with temperature as the 272nd input. Dimensionality reduction proceeded in two steps: (1) a variance threshold of 0.16, which eliminated 64 features, and (2) pairwise Pearson correlation filtering, which produced the final descriptor matrix. Standard supervised learning splits were applied (80% training, 20% test), and 12 compounds were held out entirely to assess model transferability to unseen chemistries.
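The two pruning steps can be sketched with the standard library alone. The variance cut-off of 0.16 comes from the text; the correlation cut-off of 0.9, the greedy keep-first strategy, and the names `pearson` and `prune_features` are assumptions made for illustration and may differ from the cited work.

```python
from statistics import pvariance

def pearson(a, b):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5

def prune_features(columns, var_threshold=0.16, corr_threshold=0.9):
    """Return the names of features that survive both filters.

    columns: dict mapping feature name -> list of values.
    corr_threshold is a hypothetical value chosen for illustration.
    """
    # Step 1: keep only features whose variance exceeds the threshold.
    kept = [name for name, col in columns.items()
            if pvariance(col) > var_threshold]
    # Step 2: greedily drop any feature highly correlated with one already kept.
    selected = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[s])) < corr_threshold
               for s in selected):
            selected.append(name)
    return selected
```

With a greedy filter of this kind, the order in which features are visited determines which member of a correlated pair survives.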

4. Model Training, Cross-Validation, and Computational Scaling

Ten-fold repeated cross-validation was conducted, with each run comprising random shuffling, an 80/20 train-test split, model fitting, and prediction. Each repetition thus trained on approximately 3,288 samples and tested on the remaining 822. For each partition, the following steps were executed:

Model instantiation: ExtraTreesRegressor(...) with defaults
Fit: model.fit(X_train, y_train)
Predict: y_pred = model.predict(X_test)
Record performance metrics (RMSE, $R^2$, MAE)
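The steps above amount to a loop over shuffled 80/20 splits. The sketch below keeps the model abstract: `fit_predict` and `score` are hypothetical callbacks standing in for the regressor and the metric, so the loop itself stays dependency-free.

```python
import random

def shuffled_split(n_samples, rng):
    """Shuffle indices and cut them 80/20 into (train, test) lists."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = (n_samples * 4) // 5  # integer arithmetic avoids float rounding
    return indices[:n_train], indices[n_train:]

def repeated_evaluation(X, y, fit_predict, score, n_repeats=10, seed=0):
    """Run n_repeats shuffled 80/20 splits and return one score per repetition.

    fit_predict(X_train, y_train, X_test) -> predictions  (hypothetical callback)
    score(y_true, y_pred) -> float                        (hypothetical callback)
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        train, test = shuffled_split(len(X), rng)
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        X_test = [X[i] for i in test]
        y_test = [y[i] for i in test]
        y_pred = fit_predict(X_train, y_train, X_test)
        scores.append(score(y_test, y_pred))
    return scores
```

For 4,110 samples this split yields 3,288 training and 822 test points, matching the counts quoted above.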

Each 10-repetition CV run required approximately 3.33 minutes wall-time on a multi-core NVIDIA DGX-3 platform leveraging scikit-learn’s parallel computation capabilities.

5. Predictive Performance Evaluation

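All performance metrics were calculated on the logarithmic regression target, $y_i = \log_{10}(\kappa_{L,i})$. With $\hat{y}_i$ the prediction for sample $i$, $\bar{y}$ the mean of the true values, and $N$ the number of test samples, the definitions are:

Root Mean Square Error (RMSE):

$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - \hat{y}_i\bigr)^2}$

Coefficient of Determination ( $R^2$ ):

$R^2 = 1 - \frac{\sum_{i=1}^{N}\bigl(y_i - \hat{y}_i\bigr)^2}{\sum_{i=1}^{N}\bigl(y_i - \bar{y}\bigr)^2}$

Mean Absolute Error (MAE):

$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\bigl|y_i - \hat{y}_i\bigr|$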
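The three metrics can be computed directly from their standard definitions. The sketch below uses only the standard library; the sample values are made up for illustration and are not results from the cited study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination (undefined if y_true is constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [0.0, 1.0, 2.0, 3.0]  # illustrative log10 targets
y_pred = [0.0, 1.0, 2.0, 2.0]  # one prediction off by a full decade
```

Because the targets are logarithms, an error of 1.0 in these units corresponds to a factor of ten in the underlying conductivity.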

Average test set scores (across ten repetitions): $R^2 \approx 0.9994$, with RMSE and MAE expressed in units of $\log_{10}(\kappa_L)$.

For held-out generalization, the model was evaluated on the 12 unseen compounds across their temperature range. These results demonstrate DFT-level regression fidelity and generalizability to high- and low-symmetry materials.

6. Model Characteristics Underlying ETR’s Superior Performance

ETR outperformed alternative regressors in this application due to multiple structural and statistical properties:

Enhanced Tree Decorrelation: Random thresholds at each split decorrelate trees within the ensemble more than the greedy-split paradigm of RF, decreasing overall variance.
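Negligible Bias Increase: Although each random threshold is individually suboptimal, every node still keeps the best of its candidate splits, and averaging over 100 trees smooths out the remaining randomness, so the added bias stays small.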
Minimal Tuning Required: Aggressive randomization and ensemble averaging result in strong out-of-the-box performance even without hyperparameter optimization, in contrast with gradient boosting or AdaBoost variants.
Feature Attribution Analysis: SHAP value investigations identified temperature, the mean number of unfilled electrons, minimum unfilled orbitals, and minimum atomic volume as primary drivers of $\kappa_L$ variability. Low temperature and small, electrostatically simple atoms increase $\kappa_L$ (positive SHAP influence), whereas high mass contrast, greater coordination, and additional open orbitals suppress $\kappa_L$ (negative SHAP contribution).
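The link between tree decorrelation and ensemble variance can be made concrete with a standard result for averages of correlated predictors. The symbols here are introduced for illustration and do not appear in the cited work: $B$ is the number of trees, $T_b(x)$ the prediction of tree $b$, $\sigma^2$ the variance of a single tree's prediction, and $\rho$ the pairwise correlation between trees, assumed identical for all pairs.

$\mathrm{Var}\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2$

With $B = 100$, the second term is at most one percent of $\sigma^2$, so the ensemble variance is dominated by $\rho\,\sigma^2$. Under these assumptions, lowering $\rho$ through random thresholds is the main remaining way to reduce variance once the ensemble is reasonably large.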

7. High-Throughput Screening Workflow Implementation

The high-throughput screening protocol for $\kappa_L$ operates via a reproducible pipeline.

This procedure, including variance and correlation-based feature pruning, default ETR fitting, and millisecond-scale inference, enabled rapid screening of 960 half-Heusler candidates and 60,000 ICSD structures for promising thermoelectric compounds. The performance and transferability achieved with this workflow validate ETR as a reliable estimator for high-dimensional, physics-constrained regression in materials informatics (Paliwal et al., 17 Nov 2025).
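The final ranking step of such a screen might look like the sketch below. It assumes the screen targets low $\kappa_L$, as is usual for thermoelectrics, and that predictions arrive as a mapping from candidate name to predicted $\log_{10}(\kappa_L)$. The function name and candidate labels are hypothetical.

```python
def rank_low_conductivity(predicted_log_kappa, top_k=3):
    """Return (name, kappa_L) pairs with the lowest predicted conductivity.

    predicted_log_kappa: dict mapping candidate name -> predicted log10(kappa_L).
    """
    ranked = sorted(predicted_log_kappa.items(), key=lambda item: item[1])
    # Convert back from log space so the shortlist is in physical units.
    return [(name, 10.0 ** log_k) for name, log_k in ranked[:top_k]]

# Made-up predictions for four placeholder candidates.
predictions = {"cand_A": 1.0, "cand_B": -0.5, "cand_C": 0.0, "cand_D": 2.0}
shortlist = rank_low_conductivity(predictions, top_k=2)
```

Sorting in log space gives the same order as sorting in physical units, since the logarithm is monotonic.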

References

Paliwal et al. (2025). Accelerated Prediction of Temperature-Dependent Lattice Thermal Conductivity via Ensembled Machine Learning Models.