Majority-of-Three: The Simplest Optimal Learner? (2403.08831v1)
Abstract: Developing an optimal PAC learning algorithm in the realizable setting, where empirical risk minimization (ERM) is suboptimal, was a major open problem in learning theory for decades. The problem was finally resolved by Hanneke a few years ago. Unfortunately, Hanneke's algorithm is quite complex as it returns the majority vote of many ERM classifiers that are trained on carefully selected subsets of the data. It is thus a natural goal to determine the simplest algorithm that is optimal. In this work we study the arguably simplest algorithm that could be optimal: returning the majority vote of three ERM classifiers. We show that this algorithm achieves the optimal in-expectation bound on its error which is provably unattainable by a single ERM classifier. Furthermore, we prove a near-optimal high-probability bound on this algorithm's error. We conjecture that a better analysis will prove that this algorithm is in fact optimal in the high-probability regime.
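The algorithm studied in the abstract can be sketched in a few lines. The sketch below is an illustrative assumption, not the paper's exact construction: it splits the sample into three disjoint, equal-sized parts, runs ERM on each, and returns the pointwise majority vote. The concept class (1D threshold classifiers `h_t(x) = 1[x >= t]`) and the splitting scheme are choices made here for concreteness; the paper's analysis concerns ERM over general VC classes.

```python
import random


def erm_threshold(sample):
    """ERM over 1D thresholds h_t(x) = 1[x >= t]: among candidate
    thresholds taken from the sample points, return a classifier
    with the fewest training errors."""
    candidates = [float("-inf")] + [x for x, _ in sample]
    best_t, best_err = float("-inf"), float("inf")
    for t in candidates:
        err = sum(int((x >= t) != y) for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x, t=best_t: int(x >= t)


def majority_of_three(sample):
    """Train ERM on three disjoint equal-sized parts of the sample
    and return the pointwise majority vote of the three classifiers."""
    n = len(sample)
    parts = [sample[: n // 3], sample[n // 3 : 2 * n // 3], sample[2 * n // 3 :]]
    classifiers = [erm_threshold(part) for part in parts]
    return lambda x: int(sum(h(x) for h in classifiers) >= 2)


# Realizable toy data: labels given by a true threshold at 0.5.
random.seed(0)
xs = [random.random() for _ in range(300)]
data = [(x, int(x >= 0.5)) for x in xs]
h = majority_of_three(data)
```

On realizable data each of the three ERMs recovers a threshold near the true one, so the majority vote classifies points far from the boundary correctly (e.g. `h(0.9) == 1`, `h(0.1) == 0`). The point of the paper is the subtler claim that the *vote* of three such classifiers attains the optimal in-expectation error bound, which a single ERM provably cannot.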
- Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, Abhishek Shetty, and Nikita Zhivotovskiy. The one-inclusion graph algorithm is not always optimal. In The Thirty Sixth Annual Conference on Learning Theory, COLT 2023, volume 195 of Proceedings of Machine Learning Research, pages 72–88. PMLR, 2023.
- Ishaq Aden-Ali, Yeshwanth Cherapanamjeri, Abhishek Shetty, and Nikita Zhivotovskiy. Optimal PAC bounds without uniform convergence. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1203–1223. IEEE Computer Society, 2023.
- Peter Auer and Ronald Ortner. A new PAC bound for intersection-closed concept classes. Machine Learning, 66(2):151–163, 2007.
- Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.
- Olivier Bousquet, Steve Hanneke, Shay Moran, and Nikita Zhivotovskiy. Proper learning, Helly number, and an optimal SVM bound. In Conference on Learning Theory, pages 582–609. PMLR, 2020.
- Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
- Malte Darnstädt. The optimal PAC bound for intersection-closed concept classes. Information Processing Letters, 115(4):458–461, 2015.
- Steve Hanneke. Theoretical Foundations of Active Learning. Doctoral thesis, Carnegie Mellon University, Machine Learning Department, 2009.
- Steve Hanneke. The optimal sample complexity of PAC learning. The Journal of Machine Learning Research, 17(1):1319–1333, 2016.
- Steve Hanneke. Refined error bounds for several learning algorithms. The Journal of Machine Learning Research, 17(1):4667–4721, 2016.
- David Haussler, Nick Littlestone, and Manfred K. Warmuth. Predicting {0, 1}-functions on randomly drawn points. Information and Computation, 115(2):248–292, 1994.
- Svante Janson. Tail bounds for sums of geometric and exponential variables. Statistics and Probability Letters, 135:1–6, 2018.
- Kasper Green Larsen. Bagging is an optimal PAC learner. In The Thirty Sixth Annual Conference on Learning Theory, COLT 2023, volume 195 of Proceedings of Machine Learning Research, pages 450–468. PMLR, 2023.
- Yi Li, Philip M. Long, and Aravind Srinivasan. The one-inclusion graph algorithm is near-optimal for the prediction model of learning. IEEE Transactions on Information Theory, 47(3):1257–1261, 2001.
- Robert E. Schapire. The Design and Analysis of Efficient Learning Algorithms. ACM Doctoral Dissertation Awards. The MIT Press, 1992.
- Hans U. Simon. An almost optimal PAC algorithm. In Conference on Learning Theory, pages 1552–1563. PMLR, 2015.
- Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
- A class of algorithms for pattern recognition learning. Avtomatika i Telemekhanika, 25(6):937–945, 1964.
- Algorithms with complete memory and recurrent algorithms in the problem of learning pattern recognition. Avtomatika i Telemekhanika, pages 95–106, 1968.
- V. N. Vapnik and A. Ya. Chervonenkis. On uniform convergence of the frequencies of events to their probabilities. Teoriya Veroyatnostei i ee Primeneniya, 16(2):264–279, 1971.
- V. N. Vapnik and A. Ya. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974.
- Manfred K. Warmuth. The optimal PAC algorithm. In International Conference on Computational Learning Theory, pages 641–642. Springer, 2004.