Lasso-Weighted Random Forests
- Lasso-weighted random forests are an ensemble method that combines equal-weight random forest averaging with adaptive Lasso penalties to balance bias and variance.
- The technique adapts weights to interpolate between aggressive sparsification and uniform aggregation, excelling particularly under moderate signal-to-noise conditions.
- Empirical benchmarks demonstrate its versatility across domains, achieving up to 30% improvement in mean squared error and sharper feature selection.
Lasso-weighted random forests—also termed "Lassoed Forests"—are an ensemble learning methodology combining the variance-reducing power of random forests with bias-reduction via sparse convex post-selection, controlled by an adaptive weighted Lasso penalty. This approach seeks to interpolate between the traditional random forest, which uniformly averages an ensemble of high-variance but low-bias regression trees, and post-selection Lasso reweighting, which can aggressively discount weak trees to reduce bias but risks increased variance, especially in low signal-to-noise regimes. By introducing adaptivity in the regularization penalty, Lassoed Forests provide a principled, unified framework that strictly outperforms both standard random forest and fixed-weight Lasso post-selection under moderate signal-to-noise conditions, as established both theoretically and empirically (Shang et al., 10 Nov 2025).
1. Problem Formulation and Estimators
The standard random forest regression model for a feature vector $x$ with $B$ trees $T_1, \dots, T_B$ predicts via simple averaging:
$$\hat f_{\mathrm{RF}}(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x),$$
which is equivalently $\hat f_{\mathrm{RF}}(x) = \sum_{b=1}^{B} w_b\, T_b(x)$ with $w_b = 1/B$. While this aggregation reduces variance among trees, it cannot eliminate the bias shared by all $T_b$.
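As a quick check of this equal-weight representation, a minimal scikit-learn sketch (the dataset and ensemble size are arbitrary illustration choices) verifies that a regression forest's prediction equals the simple average of its trees' predictions:

```python
# Minimal sketch: a regression random forest's prediction equals the
# equal-weight (w_b = 1/B) average of its individual trees' predictions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=10, noise=1.0, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Stack per-tree predictions: shape (B, n).
per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
equal_weight_avg = per_tree.mean(axis=0)   # sum_b (1/B) * T_b(x)

assert np.allclose(equal_weight_avg, rf.predict(X))
```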
To reduce estimator bias, a fixed-weight Lasso post-selection forms the prediction as a sparse linear combination of trees, selecting weights by solving
$$\hat{w}^{\mathrm{lasso}} = \arg\min_{w \in \mathbb{R}^{B}} \; \frac{1}{2n}\,\|y - Zw\|_2^2 + \lambda\,\|w\|_1,$$
yielding $\hat f_{\mathrm{lasso}}(x) = \sum_{b=1}^{B} \hat w_b^{\mathrm{lasso}}\, T_b(x)$, where $Z \in \mathbb{R}^{n \times B}$ contains out-of-bag predictions $Z_{ib} = T_b(x_i)$. However, the $\ell_1$ penalty risks discarding too many trees or over-shrinking weights, which can degrade accuracy by raising variance when the signal-to-noise ratio (SNR) is low.
The Lassoed Forest introduces an adaptive Lasso penalty to interpolate between equal weighting ($w_b = 1/B$) and aggressive sparsification. Adaptive penalty weights are defined from an initial estimate $\tilde w$ as $v_b = |\tilde w_b|^{-\gamma}$ for $\gamma > 0$, leading to the objective:
$$\hat{w}^{\mathrm{ada}} = \arg\min_{w \in \mathbb{R}^{B}} \; \frac{1}{2n}\,\|y - Zw\|_2^2 + \lambda \sum_{b=1}^{B} v_b\, |w_b|.$$
2. Optimization, Algorithms, and KKT Analysis
At optimality, the sub-differential Karush-Kuhn-Tucker (KKT) conditions for the adaptive Lasso are:
$$\frac{1}{n}\, Z_b^{\top}(y - Z\hat w) = \lambda\, v_b\, \operatorname{sign}(\hat w_b) \quad \text{if } \hat w_b \neq 0, \qquad \Bigl|\frac{1}{n}\, Z_b^{\top}(y - Z\hat w)\Bigr| \le \lambda\, v_b \quad \text{if } \hat w_b = 0.$$
This implies that tree $b$ receives zero weight whenever the correlation of its out-of-bag predictions with the residual falls below the threshold $\lambda v_b$.
Efficient optimization proceeds via cyclic coordinate descent. For each coordinate $b$, all $w_j$ ($j \neq b$) are held fixed and
$$\hat w_b \;\leftarrow\; \frac{S\!\bigl(\tfrac{1}{n}\, Z_b^{\top} r^{(b)},\; \lambda v_b\bigr)}{\tfrac{1}{n}\,\|Z_b\|_2^2}, \qquad r^{(b)} = y - \sum_{j \neq b} Z_j \hat w_j,$$
where $S(z, t) = \operatorname{sign}(z)\,(|z| - t)_{+}$ is soft-thresholding. Strong screening rules (e.g., discarding trees $b$ for which $|Z_b^{\top} y|/n$ is small relative to $\lambda v_b$) can dramatically reduce computation.
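A minimal NumPy sketch of this coordinate-descent update (the $1/(2n)$ loss scaling, tolerance, and iteration cap are illustrative assumptions rather than the paper's settings):

```python
import numpy as np

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def adaptive_lasso_cd(Z, y, lam, v, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for
       min_w (1/2n)||y - Zw||^2 + lam * sum_b v_b |w_b|.
    Z: (n, B) matrix of tree predictions; v: (B,) adaptive penalty weights."""
    n, B = Z.shape
    w = np.zeros(B)
    r = y - Z @ w                       # current full residual
    col_sq = (Z ** 2).sum(axis=0) / n   # (1/n) ||Z_b||^2 for each tree
    for _ in range(n_iter):
        max_delta = 0.0
        for b in range(B):
            if col_sq[b] == 0.0:
                continue
            # (1/n) Z_b^T r^(b), where r^(b) excludes tree b's contribution
            rho = Z[:, b] @ r / n + col_sq[b] * w[b]
            w_new = soft_threshold(rho, lam * v[b]) / col_sq[b]
            if w_new != w[b]:
                r -= Z[:, b] * (w_new - w[b])   # incremental residual update
                max_delta = max(max_delta, abs(w_new - w[b]))
                w[b] = w_new
        if max_delta < tol:
            break
    return w
```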
3. Theoretical Properties: Bias-Variance, Oracle Bounds, and SNR
Assuming the true model $y_i = f(x_i) + \varepsilon_i$ with sub-Gaussian noise $\varepsilon_i$ and that $Z$ satisfies a restricted eigenvalue (RE) condition, the analysis focuses on mean squared prediction error. The SNR is defined as
$$\mathrm{SNR} = \frac{\operatorname{Var}\bigl(f(X)\bigr)}{\sigma^2},$$
where $\sigma^2$ denotes the noise variance.
For the adaptive Lasso, oracle inequalities show that, with high probability,
$$\frac{1}{n}\,\bigl\|Z(\hat w - w^{\ast})\bigr\|_2^2 \;\le\; C\,\frac{s\,\lambda^2}{\kappa^2},$$
for a regularization level $\lambda \asymp \sigma\sqrt{\log B / n}$, sparsity level $s = \|w^{\ast}\|_0$, and constants depending on the RE constant $\kappa$.
The bias-variance decomposition is
$$\mathrm{MSE}\bigl(\hat f(x)\bigr) = \mathrm{Bias}^2\bigl(\hat f(x)\bigr) + \operatorname{Var}\bigl(\hat f(x)\bigr).$$
Vanilla RF typically exhibits large bias, but its variance decays at rate $1/B$ toward the between-tree correlation floor. The fixed Lasso reduces bias but risks high variance at low SNR. The adaptive Lasso, by tuning $\gamma$ and $\lambda$, yields an estimator whose upper bound on MSE interpolates between, and for moderate SNR can be strictly smaller than, those of both alternatives, as formalized in the paper's risk bounds.
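For context, a textbook variance calculation for an equal-weight average of $B$ identically distributed trees with per-tree variance $\sigma_T^2$ and pairwise correlation $\rho$ (standard ensemble theory, not a result specific to this paper) makes the role of $B$ explicit:
$$\operatorname{Var}\Bigl(\tfrac{1}{B}\sum_{b=1}^{B} T_b(x)\Bigr) = \rho\,\sigma_T^2 + \frac{1-\rho}{B}\,\sigma_T^2.$$
As $B \to \infty$ the second term vanishes, but the correlation floor $\rho\,\sigma_T^2$ and the shared bias remain; these are precisely the components that post-selection reweighting targets.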
4. Empirical Evaluation and Benchmarking
Simulation experiments considered polynomial and tree-ensemble generative models. Test metrics included MSE, bias/variance decomposition over replicates, and OOB/CV error estimation. The hyperparameters $\lambda$ and $\gamma$ were selected via cross-validation with grid search over candidate $(\lambda, \gamma)$ values.
Empirical findings included:
- Low SNR regime: vanilla RF outperforms fixed Lasso post-selection
- High SNR: Lasso post-selection outperforms vanilla RF by up to 20%
- Adaptive Lassoed Forest tracks the better of the two uniformly, providing up to 30% improvement over the less suitable method at moderate SNR
Variable importance was quantified using weighted split counts:
$$\mathrm{VI}(j) = \sum_{b=1}^{B} \hat w_b \cdot \#\{\text{splits on feature } j \text{ in tree } T_b\},$$
with adaptively weighted forests producing sharper separation of true signal features.
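A minimal sketch of such a weighted split count using scikit-learn's tree internals (`estimators_` and `tree_.feature` are scikit-learn attributes; the exact importance definition used in the paper may differ in detail):

```python
import numpy as np

def weighted_split_counts(rf, w, n_features):
    """Variable importance as weighted split counts:
       VI(j) = sum_b w_b * (#splits on feature j in tree b).
    rf: fitted sklearn forest; w: (B,) post-selection tree weights."""
    vi = np.zeros(n_features)
    for w_b, tree in zip(w, rf.estimators_):
        feats = tree.tree_.feature          # negative values mark leaf nodes
        split_feats = feats[feats >= 0]
        vi += w_b * np.bincount(split_feats, minlength=n_features)
    return vi

# Example usage: vi = weighted_split_counts(rf, w_hat, X.shape[1])
```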
In real-world case studies:
| Domain | Adaptive Forest performance |
|---|---|
| California Housing | Recovers most of the post-selection gain (5–10%) without loss at low SNR |
| Spam classification | Maximum loss vs. best baseline is ≤1% error |
| Drug response prediction | Lower MSE than RF and Lasso on 5/6 drugs |
| Survival / binary clinical | Higher c-index / lower misclassification |
5. Practical Implementation Procedures
Lassoed Forests are trained using a cross-fitted workflow to prevent over-optimistic bias in out-of-bag or cross-validation error estimates. The high-level procedure is as follows:
- Split the dataset into disjoint halves $\mathcal{D}_1$ and $\mathcal{D}_2$.
- On $\mathcal{D}_1$, grow $B$ trees via bootstrap; obtain their predictions on $\mathcal{D}_2$ (out-of-bag for every tree) to form the matrix $Z$.
- Fit a fixed-weight Lasso on $(Z, y_{\mathcal{D}_2})$ to generate the pilot estimate $\tilde w$, selecting the regularization parameter $\lambda$ by cross-validation.
- For a grid of candidate $\gamma$ values, set adaptive weights $v_b = |\tilde w_b|^{-\gamma}$, fit the adaptive Lasso, and estimate the cross-validated error.
- Select the $(\lambda, \gamma)$ pair minimizing CV error, with weights $\hat w$ at this solution.
- The final prediction is $\hat f(x) = \sum_{b=1}^{B} \hat w_b\, T_b(x)$ (a minimal sketch of this workflow follows the list).
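A minimal end-to-end sketch of this workflow with scikit-learn (the dataset, the $\gamma$ grid, and the reuse of the pilot $\lambda$ are illustrative simplifications, not the paper's settings; the adaptive step uses the column-rescaling trick noted under the implementation bullets below):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split, cross_val_score

# 1. Split into disjoint halves D1 (grow trees) and D2 (fit weights).
X, y = make_regression(n_samples=1000, n_features=20, noise=5.0, random_state=0)
X1, X2, y1, y2 = train_test_split(X, y, test_size=0.5, random_state=0)

# 2. Grow B trees on D1; form Z from their predictions on D2
#    (all of D2 is out-of-bag for trees grown on D1).
B = 200
rf = RandomForestRegressor(n_estimators=B, random_state=0).fit(X1, y1)
Z = np.column_stack([t.predict(X2) for t in rf.estimators_])   # (n2, B)

# 3. Pilot fixed-weight Lasso to obtain w_tilde (lambda chosen by CV).
pilot = LassoCV(cv=5).fit(Z, y2)
w_tilde = pilot.coef_

# 4. Adaptive Lasso over a grid of gamma: penalty weights v_b = |w_tilde_b|^-gamma,
#    implemented by rescaling column b of Z by 1/v_b = |w_tilde_b|^gamma.
eps = 1e-6                                # guard against exactly-zero pilot weights
best = None
for gamma in [0.5, 1.0, 2.0]:
    scale = (np.abs(w_tilde) + eps) ** gamma
    Zs = Z * scale                        # plain Lasso on Zs == adaptive Lasso on Z
    lasso = Lasso(alpha=pilot.alpha_, max_iter=10000)   # reuse pilot lambda (simplification)
    cv_err = -cross_val_score(lasso, Zs, y2, cv=5,
                              scoring="neg_mean_squared_error").mean()
    if best is None or cv_err < best[0]:
        w_ada = lasso.fit(Zs, y2).coef_ * scale   # map back to the original scale
        best = (cv_err, gamma, w_ada)

cv_err, gamma_star, w_hat = best

# 5. Final prediction: f_hat(x) = sum_b w_hat_b * T_b(x).
def predict(X_new):
    Z_new = np.column_stack([t.predict(X_new) for t in rf.estimators_])
    return Z_new @ w_hat
```

The rescaling step works because a standard Lasso fit on the columns $Z_b / v_b$ with coefficients $u_b$ is equivalent to the adaptive Lasso on $Z$ with $w_b = u_b / v_b$.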
Computational considerations:
- Tree fitting scales as $O(B\, n \log n)$ (up to factors in the number of candidate split features).
- Lasso fitting via coordinate descent costs $O(nB)$ per pass over the coordinates, i.e. on the order of $O(nBK)$ per cross-validation fold for a path of $K$ $\lambda$-values.
- Feature screening and warm starting can reduce the effective cost well below these worst cases, even for large $B$.
- In R, `glmnet` with per-variable penalties (the `penalty.factor` argument) implements the adaptive Lasso; in Python, `sklearn.linear_model.Lasso` can be made adaptive by rescaling the columns of $Z$ by $1/v_b$ and rescaling the fitted coefficients back.
- For very large $B$, sparsity can be exploited by storing only the nonzero out-of-bag entries of $Z$. Parallel coordinate descent and warm starts accelerate the adaptive Lasso solution path.
6. Interpretive Perspectives and Methodological Significance
The Lassoed Forest framework synthesizes the strengths and mitigates the core weaknesses of both bagging and convex post-selection in tree ensembles. Explicit dependence on the SNR determines which regime dominates performance: the variance-reducing but bias-prone uniform averaging, or the bias-reducing but variance-prone sparse selection. The adaptive penalty yields a smooth transition between the two, and mathematical guarantees under standard RE and sub-Gaussian assumptions establish strict improvement in predictive risk for moderate SNR. This suggests Lassoed Forests are most beneficial when the true function's signal variance and the noise level are comparable, and in settings requiring both feature-selection interpretability and robust out-of-sample prediction.
The use of weighted split counts for variable importance, conditional on forest weights, provides a tool for causal and feature attribution analyses with enhanced separation of signal features. The modular nature of the post-selection procedure also allows for deployment with alternative ensemble architectures and further regularization frameworks.
A plausible implication is that the Lassoed Forest methodology motivates further exploration of adaptive penalties and staged model selection in nonparametric ensemble learning, especially as model sizes and data scales continue to increase.