Papers
Topics
Authors
Recent
2000 character limit reached

To Bag is to Prune (2008.07063v5)

Published 17 Aug 2020 in stat.ML, cs.LG, and econ.EM

Abstract: It is notoriously difficult to build a bad Random Forest (RF). Concurrently, RF blatantly overfits in-sample without any apparent consequence out-of-sample. Standard arguments, like the classic bias-variance trade-off or double descent, cannot rationalize this paradox. I propose a new explanation: bootstrap aggregation and model perturbation as implemented by RF automatically prune a latent "true" tree. More generally, randomized ensembles of greedily optimized learners implicitly perform optimal early stopping out-of-sample. So there is no need to tune the stopping point. By construction, novel variants of Boosting and MARS are also eligible for automatic tuning. I empirically demonstrate the property, with simulated and real data, by reporting that these new completely overfitting ensembles perform similarly to their tuned counterparts -- or better.

Citations (9)

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.