Extra Trees Regressor (ETR)
- Extra Trees Regressor (ETR) is a highly randomized ensemble method that builds decision trees with uniformly random split thresholds to reduce variance and enhance predictive diversity.
- The model avoids bootstrapping by using the full dataset for each tree and employs aggressive randomization of candidate features and thresholds to achieve fast and robust regression performance.
- In a materials science application, ETR with default settings reached R² ≈ 0.9994 under ten-fold repeated cross-validation and supported efficient high-throughput screening.
The Extra Trees Regressor (ETR), also known as “Extremely Randomized Trees,” is an ensemble machine learning model consisting of decision trees with heavily randomized split selection at both the feature and threshold levels. ETR has demonstrated efficacy in regression tasks that require both high predictive accuracy and robust generalization, particularly in the physical sciences for property prediction directly from high-dimensional descriptors. Its defining attributes, extensive tree decorrelation through randomization and the absence of bootstrapped samples, make ETR a fast, low-variance, high-performance estimator for applications in computational materials science and related high-throughput screening tasks (Paliwal et al., 17 Nov 2025).
1. Algorithmic Structure and Randomization Principles
ETR constructs an ensemble of individual decision trees, each built from the full training dataset without resampling. At each tree node, a subset of $K$ candidate features is randomly selected, where $K$ is determined by the model’s max_features hyperparameter. For each candidate feature in this subset, instead of greedily searching all possible split thresholds as in standard Random Forests (RF), ETR samples a single split threshold uniformly at random from the interval between the feature’s observed minimum and maximum values in the current node’s sample.
Among these randomly drawn candidate splits, the one yielding the largest impurity decrease, typically measured by mean squared error (MSE) for regression, is chosen. Consequently, predictive diversity among individual trees is amplified, reducing ensemble variance. The principal distinctions from RF are: (1) ETR eschews data bootstrapping (each tree sees all data), and (2) split thresholds are sampled randomly instead of optimally selected. As in RF, the final prediction is the average over all trees in the ensemble.
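The node-level rule described above can be sketched in plain Python. This is a minimal illustration rather than scikit-learn's implementation: the function names `mse_impurity` and `pick_random_split` are hypothetical, and the data layout (a list of rows) is an assumption made for readability.

```python
import random


def mse_impurity(values):
    """Mean squared deviation from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pick_random_split(X, y, k, rng):
    """Return (feature_index, threshold, impurity_decrease) for one node.

    X is a list of rows (lists of floats) and y a list of targets.
    k candidate features are drawn without replacement; each receives ONE
    threshold drawn uniformly between its min and max within this node.
    Returns None if no candidate yields two non-empty children.
    """
    n = len(y)
    parent = mse_impurity(y)
    best = None
    for j in rng.sample(range(len(X[0])), k):
        column = [row[j] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:  # constant feature in this node: cannot split
            continue
        t = rng.uniform(lo, hi)
        left = [yi for xi, yi in zip(column, y) if xi <= t]
        right = [yi for xi, yi in zip(column, y) if xi > t]
        if not left or not right:
            continue
        # Weighted child impurity, compared against the parent impurity.
        children = (len(left) * mse_impurity(left)
                    + len(right) * mse_impurity(right)) / n
        gain = parent - children
        if best is None or gain > best[2]:
            best = (j, t, gain)
    return best
```

Only the choice among the `k` random candidates is greedy; the thresholds themselves are never optimized, which is the source of the extra decorrelation relative to RF.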
2. Model Hyperparameters and Default Setting Rationale
In the cited application, ETR was instantiated via the scikit-learn (v0.24) ExtraTreesRegressor class, with unmodified defaults reflecting standard settings. The key hyperparameters were:
| Hyperparameter | Value | Effect |
|---|---|---|
| n_estimators | 100 | Number of trees in ensemble |
| criterion | "mse" | Split impurity metric |
| max_depth | None | Unlimited depth (pure leaves) |
| min_samples_split | 2 | Node split minimum sample count |
| min_samples_leaf | 1 | Leaf node minimum sample count |
| max_features | "auto" | Candidate features per split equal the feature count $p$ (all features) |
| bootstrap | False | Full data for each tree |
| random_state | None | No fixed seed |
No additional hyperparameter tuning was performed due to both strong intrinsic model randomization and the presence of significant regression signal in the selected data (Paliwal et al., 17 Nov 2025). This suggests robustness of ETR to hyperparameter specification within the tested context.
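For reproduction on a newer scikit-learn release, the tabulated defaults can be kept as a plain dictionary. The sketch below assumes that later releases spell the "mse" criterion as "squared_error" and replace "auto" with 1.0 (all features) for regressors. The helper `to_current_names` and its `seed` argument are hypothetical conveniences, not part of the cited work.

```python
# Defaults reported for scikit-learn 0.24 (copied from the table above).
ETR_DEFAULTS_V024 = {
    "n_estimators": 100,
    "criterion": "mse",
    "max_depth": None,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "max_features": "auto",
    "bootstrap": False,
    "random_state": None,
}


def to_current_names(params, seed=None):
    """Hypothetical helper: translate 0.24-era values to newer spellings.

    Assumes "mse" -> "squared_error" and, for regressors, "auto" -> 1.0.
    Passing a seed overrides random_state to make runs repeatable.
    """
    out = dict(params)  # copy so the original table stays untouched
    if out.get("criterion") == "mse":
        out["criterion"] = "squared_error"
    if out.get("max_features") == "auto":
        out["max_features"] = 1.0
    if seed is not None:
        out["random_state"] = seed
    return out
```

With scikit-learn installed, the result could be passed as `ExtraTreesRegressor(**to_current_names(ETR_DEFAULTS_V024, seed=0))`. Because the reported runs used `random_state=None`, exact scores would not be expected to repeat bit-for-bit.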
3. Feature Engineering and Data Preparation
The regression dataset comprised 4,127 sample points across 150 crystalline compounds, each accompanied by temperature-dependent ab initio (DFT-computed) lattice thermal conductivity, $\kappa_L$, over a range of temperatures (in K). The regression target was the base-10 logarithm of $\kappa_L$, chosen to address multi-order-of-magnitude variability and skewness, i.e., $y = \log_{10} \kappa_L$.
Features were generated by the MAGPIE framework to encapsulate both compositional and crystal-structural information, such as atomic weights, electronegativities, coordination numbers, and other elemental descriptors. The initial feature matrix included temperature as the 272nd input. Dimensionality reduction proceeded via: (1) a variance threshold (0.16) eliminating 64 features, and (2) pairwise Pearson correlation filtering, yielding the final reduced feature matrix. Standard supervised learning splits were applied (80% training, 20% test), and 12 compounds were held out entirely to assess model transferability to unseen chemistries.
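The two-stage pruning can be illustrated with a small, dependency-free sketch. It assumes features are kept when their variance exceeds the cutoff and that, among highly correlated pairs, the earlier feature is retained. The correlation cutoff is not recoverable from this text, so the 0.95 in the usage note is a placeholder. The function names are hypothetical.

```python
import math


def variance(col):
    """Population variance of a list of numbers."""
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0.0 or sb == 0.0:
        return 0.0
    return cov / (sa * sb)


def select_features(columns, var_min, corr_max):
    """Return the names of the features that survive both filters.

    columns maps feature name -> list of values (insertion order is kept).
    Stage 1 drops features whose variance does not exceed var_min.
    Stage 2 drops a feature if |r| with an already kept feature exceeds
    corr_max (the earlier feature of each correlated pair is retained).
    """
    high_var = [n for n, col in columns.items() if variance(col) > var_min]
    selected = []
    for name in high_var:
        if all(abs(pearson(columns[name], columns[s])) <= corr_max
               for s in selected):
            selected.append(name)
    return selected
```

For example, `select_features(cols, var_min=0.16, corr_max=0.95)` applies the variance cutoff quoted above together with the placeholder correlation cutoff.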
4. Model Training, Cross-Validation, and Computational Scaling
Ten-fold repeated cross-validation was conducted, with each repetition comprising random shuffling, an 80/20 train-test split, model fitting, and prediction. Each repetition thus trained on approximately 3,288 samples and tested on the remaining 822. For each partition, the following steps were executed:
- Model instantiation: `ExtraTreesRegressor(...)` with defaults
- Fit: `model.fit(X_train, y_train)`
- Predict: `y_pred = model.predict(X_test)`
- Record performance metrics (RMSE, $R^2$, MAE)
Each 10-repetition CV run required approximately 3.33 minutes wall-time on a multi-core NVIDIA DGX-3 platform leveraging scikit-learn’s parallel computation capabilities.
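The repeated shuffle-and-split protocol can be sketched as index bookkeeping, independent of any model. The sample count of 4,110 used in the usage note is inferred from the quoted 3,288 + 822 split sizes and is therefore an assumption. The function names and the fixed seed are hypothetical.

```python
import random


def shuffled_split(n_samples, test_fraction, rng):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def repeated_holdout(n_samples, n_repeats=10, test_fraction=0.2, seed=0):
    """Generate n_repeats independent shuffled train/test partitions."""
    rng = random.Random(seed)
    return [shuffled_split(n_samples, test_fraction, rng)
            for _ in range(n_repeats)]
```

Calling `repeated_holdout(4110)` yields ten partitions of 3,288 training and 822 test indices. In scikit-learn, `ShuffleSplit(n_splits=10, test_size=0.2)` provides a comparable iterator, though whether the cited work used it is not stated here.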
5. Predictive Performance Evaluation
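All performance metrics were calculated on the logarithmic regression target, $y = \log_{10} \kappa_L$ (with $\kappa_L$ the lattice thermal conductivity). With $y_i$ the reference values, $\hat{y}_i$ the predictions, $\bar{y}$ the mean of the reference values, and $n$ the number of test samples, the definitions are:
- Root Mean Square Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
- Coefficient of Determination ($R^2$):
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
- Mean Absolute Error (MAE):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$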
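The three metrics can be written directly from their standard definitions. The sketch below is dependency-free and uses hypothetical function names. Inputs are assumed to be equal-length lists of log-scale targets and predictions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Because the targets are base-10 logarithms, an absolute error of $e$ on this scale corresponds to a multiplicative error factor of $10^{e}$ on the original quantity.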
Average test set scores across the ten repetitions were reported for $R^2$ (approximately 0.9994), RMSE (in units of $\log_{10} \kappa_L$), and MAE.
For held-out generalization, the model was evaluated on predictions for the 12 excluded compounds across their temperature range. The reported results indicate near-DFT regression fidelity and generalizability to both high- and low-symmetry materials.
6. Model Characteristics Underlying ETR’s Superior Performance
ETR outperformed alternative regressors in this application due to multiple structural and statistical properties:
- Enhanced Tree Decorrelation: Random thresholds at each split decorrelate trees within the ensemble more than the greedy-split paradigm of RF, decreasing overall variance.
- Negligible Bias Increase: Each node still selects the best of its randomly drawn candidate splits, and predictions are averaged over 100 trees, so the bias introduced by random thresholds remains small.
- Minimal Tuning Required: Aggressive randomization and ensemble averaging result in strong out-of-the-box performance even without hyperparameter optimization, in contrast with gradient boosting or AdaBoost variants.
- Feature Attribution Analysis: SHAP value investigations identified temperature, mean number of unfilled electrons, minimum unfilled orbitals, and minimum atomic volume as primary drivers of $\kappa_L$ variability. Low temperature and small, electrostatically simple atoms increase $\kappa_L$ (positive SHAP influence), whereas high mass contrast, greater coordination, and additional open orbitals suppress $\kappa_L$ (negative SHAP contribution).
7. High-Throughput Screening Workflow Implementation
The high-throughput screening protocol for lattice thermal conductivity operates via a reproducible pipeline.
This procedure, including variance and correlation-based feature pruning, default ETR fitting, and millisecond-scale inference, enabled rapid screening of 960 half-Heusler candidates and 60,000 ICSD structures for promising thermoelectric compounds. The performance and transferability achieved with this workflow validate ETR as a reliable estimator for high-dimensional, physics-constrained regression in materials informatics (Paliwal et al., 17 Nov 2025).
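The final screening step can be sketched as a filter over candidate structures. This is an illustrative outline only: `screen_candidates`, the `predict_log10_kappa` callable, and the `kappa_max` cutoff are hypothetical names. The cited work's actual selection criteria are not given here. The sketch assumes the fitted model returns base-10 logarithms and that low thermal conductivity is the property of interest for thermoelectrics.

```python
def screen_candidates(candidates, predict_log10_kappa, kappa_max):
    """Rank candidates whose predicted conductivity is at most kappa_max.

    candidates: list of (name, feature_vector) pairs.
    predict_log10_kappa: callable mapping a feature vector to the predicted
        base-10 logarithm of the conductivity (for example, a wrapper
        around a fitted regressor's predict method).
    Returns (name, predicted_conductivity) pairs, lowest value first.
    """
    hits = []
    for name, features in candidates:
        kappa = 10.0 ** predict_log10_kappa(features)  # undo the log target
        if kappa <= kappa_max:
            hits.append((name, kappa))
    return sorted(hits, key=lambda item: item[1])
```

Values are reported in the same units as the training data, since the back-transformation only undoes the logarithm.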