Grafting: Making Random Forests Consistent (2403.06015v1)
Published 9 Mar 2024 in stat.ML and cs.LG
Abstract: Despite their strong performance and widespread use, little is known about the theory of Random Forests. A major open question is whether, or when, the Random Forest algorithm is consistent. The literature explores various variants of the classic Random Forest algorithm to address this question and known shortcomings of the method. This paper contributes to that literature. Specifically, it explores the suitability of grafting consistent estimators onto a shallow CART. This approach is shown to carry a consistency guarantee and to perform well in empirical settings.
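The grafting construction described in the abstract is two-stage: a shallow CART first partitions the feature space, and a consistent estimator is then fit within each resulting leaf. Below is a minimal sketch of this idea in Python, using scikit-learn with k-nearest neighbors as the grafted leaf estimator; the `GraftedTree` class and the choice of k-NN are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch of the grafting idea: a shallow CART partitions the input space,
# then a consistent estimator (here, hypothetically, k-NN) is fit per leaf.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor


class GraftedTree:
    def __init__(self, max_depth=3, n_neighbors=5):
        self.tree = DecisionTreeRegressor(max_depth=max_depth)
        self.n_neighbors = n_neighbors
        self.leaf_models = {}

    def fit(self, X, y):
        # Stage 1: a shallow CART partitions the feature space.
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)
        # Stage 2: graft a consistent estimator onto each leaf.
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            k = min(self.n_neighbors, int(mask.sum()))  # guard small leaves
            self.leaf_models[leaf] = KNeighborsRegressor(n_neighbors=k).fit(
                X[mask], y[mask]
            )
        return self

    def predict(self, X):
        # Route each point to its leaf and use that leaf's grafted model.
        leaves = self.tree.apply(X)
        preds = np.empty(len(X))
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            preds[mask] = self.leaf_models[leaf].predict(X[mask])
        return preds


# Example usage on synthetic data (illustrative only):
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
y = np.sin(4 * X[:, 0]) + rng.normal(scale=0.1, size=500)
model = GraftedTree(max_depth=2, n_neighbors=10).fit(X, y)
print(model.predict(X[:5]))
```

Intuitively, keeping the CART stage shallow leaves enough observations in each cell for the grafted estimator's own consistency to take over; this sketch only conveys that two-stage structure, not the theoretical analysis.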