Bagging is an Optimal PAC Learner (2212.02264v3)
Abstract: Determining the optimal sample complexity of PAC learning in the realizable setting was a central open problem in learning theory for decades. Finally, the seminal work of Hanneke (2016) gave an algorithm with a provably optimal sample complexity. His algorithm is based on a careful and structured sub-sampling of the training data, followed by a majority vote among hypotheses trained on each of the sub-samples. While a very exciting theoretical result, it has had little impact in practice, in part due to its inefficiency: it constructs a polynomial number of sub-samples of the training data, each of linear size. In this work, we prove the surprising result that the practical and classic heuristic bagging (a.k.a. bootstrap aggregation), due to Breiman (1996), is in fact also an optimal PAC learner. Bagging pre-dates Hanneke's algorithm by twenty years and is taught in most undergraduate machine learning courses. Moreover, we show that it requires only a logarithmic number of sub-samples to reach optimality.
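For intuition, the bagging heuristic discussed in the abstract can be sketched in a few lines: draw bootstrap sub-samples of the training data, train one hypothesis per sub-sample, and aggregate by majority vote. The sketch below is an illustration only, not the paper's analysis; it assumes NumPy and scikit-learn, non-negative integer labels, and a decision tree as a stand-in base learner, and the logarithmic number of bootstrap rounds mirrors the paper's bound with an arbitrary illustrative constant.

```python
import numpy as np
from math import ceil, log
from sklearn.tree import DecisionTreeClassifier  # hypothetical base learner, not specified by the paper


def bagging_predict(X_train, y_train, X_test, n_bags=None, seed=0):
    """Classic bagging (Breiman, 1996): train one hypothesis per bootstrap
    sub-sample and return the majority vote on each test point.
    Labels are assumed to be non-negative integers."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    if n_bags is None:
        # O(log n) bootstrap rounds, motivated by the paper's result;
        # the exact constant here is an illustrative choice, not the paper's.
        n_bags = max(1, ceil(log(n)))
    votes = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)  # bootstrap: draw n indices with replacement
        clf = DecisionTreeClassifier(random_state=seed).fit(X_train[idx], y_train[idx])
        votes.append(clf.predict(X_test))
    votes = np.stack(votes).astype(int)  # shape: (n_bags, n_test)
    # majority vote over the n_bags hypotheses, per test point
    return np.array([np.bincount(col).argmax() for col in votes.T])


# Example usage (hypothetical data):
# y_hat = bagging_predict(X_train, y_train, X_test)
```

Each bootstrap sub-sample has the same size as the training set but is drawn with replacement, so the individual hypotheses see overlapping yet distinct data; the majority vote then aggregates them as in Breiman's original formulation.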
- P. Auer and R. Ortner. A new PAC bound for intersection-closed concept classes. In Learning Theory, pages 408–414. Springer Berlin Heidelberg, 2004.
- R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, 1998.
- A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM (JACM), 36(4):929–965, 1989.
- L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
- L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
- A. Ehrenfeucht, D. Haussler, M. Kearns, and L. Valiant. A general lower bound on the number of examples needed for learning. Information and Computation, 82(3):247–261, 1989.
- Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
- W. Gao and Z.-H. Zhou. On the doubt about margin explanation of boosting. Artificial Intelligence, 203:1–18, 2013.
- S. Hanneke. The optimal sample complexity of PAC learning. The Journal of Machine Learning Research, 17(1):1319–1333, 2016.
- M. Kearns. Learning Boolean formulae or finite automata is as hard as factoring. Technical Report TR-14-88, Harvard University Aiken Computation Laboratory, 1988.
- M. Kearns and L. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM (JACM), 41(1):67–95, 1994.
- K. G. Larsen and M. Ritzert. Optimal weak to strong learning. Advances in Neural Information Processing Systems, 2022. To appear.
- G. Rätsch and M. K. Warmuth. Efficient margin maximizing with boosting. Journal of Machine Learning Research, 6(12), 2005.
- H. U. Simon. An almost optimal PAC algorithm. In Conference on Learning Theory, pages 1552–1563. PMLR, 2015.
- L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
- V. Vapnik. Estimation of Dependences Based on Empirical Data. Springer Series in Statistics. Springer-Verlag, Berlin, Heidelberg, 1982.
- V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.