Theory for bias reduction with correlated covariates

Develop a theoretical explanation for the significant reduction in prediction bias observed for both bagging and random forests when covariates are mutually correlated, a reduction that occurs regardless of split randomization. Explain why averaging reduces bias in this regime, contrary to the prevailing view that it works mainly through variance reduction.

Background

Across multiple experiments, the authors find that when covariates are mutually correlated, both bagging and random forests exhibit substantially reduced bias, and that this bias reduction largely drives improvements in mean-squared error. This observation challenges the common view that ensemble averaging primarily works by variance reduction.

They explicitly leave for future research the question of why correlated covariates reduce bias, whether or not split directions are randomized. This highlights a gap in the theoretical understanding of how tree ensembles behave when predictors are dependent.
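To make the bias and variance contributions to mean-squared error concrete, here is a minimal sketch of how one might measure them by Monte Carlo. It is not the authors' experimental code. The function names `equicorrelated_covariates` and `bias_variance` are hypothetical, and the equicorrelated Gaussian design (every pair of covariates sharing the same correlation `rho`) is an assumption chosen for simplicity. The learner itself (a single tree, bagged trees, or a random forest) is deliberately left out: refit it on independent training samples and pass its predictions at fixed test points to `bias_variance`.

```python
import random
from statistics import fmean


def equicorrelated_covariates(n, d, rho, rng):
    """Draw n rows of d standard-normal covariates with pairwise correlation rho.

    Assumes 0 <= rho <= 1. Each covariate mixes one shared factor with its
    own independent noise, which gives unit variance and correlation rho.
    """
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        rows.append(
            [
                (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * rng.gauss(0.0, 1.0)
                for _ in range(d)
            ]
        )
    return rows


def bias_variance(predictions, truth):
    """Monte Carlo estimate of squared bias and variance.

    predictions[r][i] is the prediction at test point i from the model fitted
    on the r-th independent training sample; truth[i] is the noise-free
    regression function at test point i. Returns the pair
    (mean squared bias, mean variance), both averaged over test points.
    """
    n_reps = len(predictions)
    sq_bias, variance = [], []
    for i, f_i in enumerate(truth):
        preds_i = [predictions[r][i] for r in range(n_reps)]
        mean_i = fmean(preds_i)
        sq_bias.append((mean_i - f_i) ** 2)
        variance.append(fmean([(p - mean_i) ** 2 for p in preds_i]))
    return fmean(sq_bias), fmean(variance)
```

Running this decomposition for a single tree and for an ensemble, once with `rho = 0` and once with `rho > 0`, would show whether an improvement in mean-squared error comes from the squared-bias term or from the variance term. Because the variance here uses the population (divide-by-replicates) form, the two returned terms sum exactly to the average squared error against the noise-free truth.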

References

In particular, we find that, independently of randomization, bias is significantly reduced in the presence of correlated covariates. This finding goes beyond the prevailing view that averaging mostly works by variance reduction, and better understanding why this happens is something that we leave for future research.

When do Random Forests work? (2504.12860 - Revelas et al., 17 Apr 2025) in Section 5 (Conclusion)