
Majority-of-Three: The Simplest Optimal Learner? (2403.08831v1)

Published 12 Mar 2024 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: Developing an optimal PAC learning algorithm in the realizable setting, where empirical risk minimization (ERM) is suboptimal, was a major open problem in learning theory for decades. The problem was finally resolved by Hanneke a few years ago. Unfortunately, Hanneke's algorithm is quite complex as it returns the majority vote of many ERM classifiers that are trained on carefully selected subsets of the data. It is thus a natural goal to determine the simplest algorithm that is optimal. In this work we study the arguably simplest algorithm that could be optimal: returning the majority vote of three ERM classifiers. We show that this algorithm achieves the optimal in-expectation bound on its error which is provably unattainable by a single ERM classifier. Furthermore, we prove a near-optimal high-probability bound on this algorithm's error. We conjecture that a better analysis will prove that this algorithm is in fact optimal in the high-probability regime.

References (22)
  1. The one-inclusion graph algorithm is not always optimal. In The Thirty Sixth Annual Conference on Learning Theory, COLT 2023, volume 195 of Proceedings of Machine Learning Research, pages 72–88. PMLR, 2023.
  2. Optimal PAC bounds without uniform convergence. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1203–1223. IEEE Computer Society, 2023.
  3. A new PAC bound for intersection-closed concept classes. Machine Learning, 66(2):151–163, 2007.
  4. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.
  5. Proper learning, Helly number, and an optimal SVM bound. In Conference on Learning Theory, pages 582–609. PMLR, 2020.
  6. Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
  7. Malte Darnstädt. The optimal PAC bound for intersection-closed concept classes. Information Processing Letters, 115(4):458–461, 2015.
  8. Steve Hanneke. Theoretical Foundations of Active Learning. Doctoral thesis, Carnegie-Mellon University, Machine Learning Department, 2009.
  9. Steve Hanneke. The optimal sample complexity of PAC learning. The Journal of Machine Learning Research, 17(1):1319–1333, 2016.
  10. Steve Hanneke. Refined error bounds for several learning algorithms. The Journal of Machine Learning Research, 17(1):4667–4721, 2016.
  11. Predicting {0,1}-functions on randomly drawn points. Information and Computation, 115(2):248–292, 1994.
  12. Svante Janson. Tail bounds for sums of geometric and exponential variables. Statistics and Probability Letters, 135:1–6, 2018.
  13. Kasper Green Larsen. Bagging is an optimal PAC learner. In The Thirty Sixth Annual Conference on Learning Theory, COLT 2023, volume 195 of Proceedings of Machine Learning Research, pages 450–468. PMLR, 2023.
  14. The one-inclusion graph algorithm is near-optimal for the prediction model of learning. IEEE Transactions on Information Theory, 47(3):1257–1261, 2001.
  15. Robert E. Schapire. The Design and Analysis of Efficient Learning Algorithms. ACM Doctoral Dissertation Awards. The MIT Press, 1992.
  16. Hans U Simon. An almost optimal PAC algorithm. In Conference on Learning Theory, pages 1552–1563. PMLR, 2015.
  17. Leslie G Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
  18. A class of algorithms for pattern recognition learning. Avtomatika i Telemekhanika, 25(6):937–945, 1964.
  19. Algorithms with complete memory and recurrent algorithms in the problem of learning pattern recognition. Avtomatika i Telemekhanika, pages 95–106, 1968.
  20. On uniform convergence of the frequencies of events to their probabilities. Teoriya Veroyatnostei i ee Primeneniya, 16(2):264–279, 1971.
  21. Theory of Pattern Recognition. Nauka, Moscow, 1974.
  22. Manfred K Warmuth. The optimal PAC algorithm. In International Conference on Computational Learning Theory, pages 641–642. Springer, 2004.

Summary

  • The paper proves that the Majority-of-Three learner achieves an in-expectation error bound of $O(d/n)$, matching known optimal predictors in the PAC framework.
  • The analysis establishes a high-probability bound of $O\big((d/n)\log\log(\min\{n/d,1/\delta\}) + (1/n)\log(1/\delta)\big)$, with only a slight sub-optimality due to a $\log\log$ factor.
  • The results highlight that simple ensemble methods can nearly attain optimal generalization, spurring future research on tighter analyses and broader learning settings.

High-Probability and Expected Generalization Bounds for Majority-of-Three Learners

Introduction

This paper studies the effectiveness of an extremely simple yet surprisingly powerful learning algorithm within the framework of Probably Approximately Correct (PAC) learning. The algorithm, dubbed Majority-of-Three, returns the majority vote of three Empirical Risk Minimization (ERM) predictors, each trained on a disjoint subset of the training data. The paper's main contributions are in-expectation and high-probability upper bounds on the generalization error of the Majority-of-Three learner, sharpening our understanding of the simplest forms of optimal learners in the realizable PAC setting.
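To make the procedure concrete, here is a minimal, self-contained sketch (not the paper's implementation) for the toy class of one-dimensional threshold functions. The names `erm_threshold` and `majority_of_three` and the random three-way split are illustrative assumptions; the brute-force ERM merely stands in for whatever consistent learner is available for the class at hand.

```python
import numpy as np

def erm_threshold(X, y):
    """Brute-force ERM for 1-D threshold classifiers h_t(x) = 1[x >= t]
    (VC dimension 1); a stand-in for any ERM over the class of interest."""
    candidates = np.concatenate(([-np.inf], np.sort(X)))
    errors = [np.mean((X >= t).astype(int) != y) for t in candidates]
    t_best = candidates[int(np.argmin(errors))]
    return lambda X_new: (np.asarray(X_new) >= t_best).astype(int)

def majority_of_three(X, y, erm=erm_threshold, seed=0):
    """Majority-of-Three: split the sample into three disjoint folds,
    run ERM on each fold, and predict by majority vote."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), 3)
    hypotheses = [erm(X[idx], y[idx]) for idx in folds]

    def predict(X_new):
        votes = np.stack([h(X_new) for h in hypotheses])  # shape (3, m)
        return (votes.sum(axis=0) >= 2).astype(int)       # majority of three
    return predict

# Toy usage on realizable data with target f*(x) = 1[x >= 0.5].
rng = np.random.default_rng(1)
X_train = rng.uniform(size=300)
y_train = (X_train >= 0.5).astype(int)
h = majority_of_three(X_train, y_train)
X_test = rng.uniform(size=10_000)
print("test error:", np.mean(h(X_test) != (X_test >= 0.5).astype(int)))
```

Any other ERM implementation can be substituted for `erm_threshold` without changing the voting step; only the three-way disjoint split and the majority vote are essential to the scheme.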

Expected Generalization Bounds for Majority-of-Three

The first key result of the paper shows that Majority-of-Three achieves an optimal in-expectation generalization bound under the realizable PAC setting. Specifically, it is proven that:

  • Theorem 1: For a function class $\mathcal{F}$ with VC dimension $d$, distribution $P$, and target function $f^\star \in \mathcal{F}$, the in-expectation error of the Majority-of-Three learner is bounded above by $O(d/n)$, where $n$ is the training sample size.

This result is significant as it demonstrates that the generalization error of Majority-of-Three, in expectation, matches that of the one-inclusion graph predictor, which is known to be optimal in this metric. The analysis builds on the notion of partitioning the input space into regions based on the probability of a single ERM learner erring on each point, and subsequently using a series of careful probabilistic arguments to bound the expected error over these regions.
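The following back-of-the-envelope calculation is not the paper's proof, but it illustrates why voting helps. Fix a point $x$ and let $p(x)$ denote the probability that a single ERM trained on one fold errs at $x$; since the three folds are disjoint parts of an i.i.d. sample, the three error events at $x$ are independent, and the majority vote errs only if at least two of them occur:

```latex
% Heuristic pointwise bound for Majority-of-Three at a fixed point x,
% assuming the three folds yield independent error events of probability p(x):
\[
  \Pr\bigl[\mathrm{Maj}(h_1,h_2,h_3)(x) \neq f^\star(x)\bigr]
  \;=\; 3\,p(x)^2\bigl(1-p(x)\bigr) + p(x)^3
  \;\le\; 3\,p(x)^2 .
\]
```

Points on which a single ERM is already reliable therefore contribute only quadratically to the expected error of the majority vote; the paper's actual analysis controls the regions where $p(x)$ is large via more careful probabilistic arguments.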

High-Probability Generalization Bounds for Majority-of-Three

The paper further extends the analysis of Majority-of-Three to the high-probability regime, providing a near-optimal high-probability upper bound on its generalization error. The established bound can be summarized as follows:

  • Theorem 2: With probability at least $1-\delta$ over the sampling of the training data, the generalization error of Majority-of-Three is bounded above by $O\big((d/n)\log\log(\min\{n/d,1/\delta\}) + (1/n)\log(1/\delta)\big)$.

Although the $\log\log$ factor makes this bound slightly sub-optimal compared to the known general lower bound for improper learners in the PAC model, it remains noteworthy because the bound is in fact optimal for a significant range of the parameter $\delta$, in particular in the small-$\delta$ regime.
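For orientation, Theorem 2 can be placed between two standard reference points for the realizable setting, stated here up to constant factors: the classical uniform-convergence bound for any consistent learner and the optimal rate attained by Hanneke's algorithm.

```latex
% Classical bound for any ERM / consistent learner (Blumer et al., 1989):
\[
  \operatorname{err}(\hat h_{\mathrm{ERM}})
    = O\!\left(\frac{d\log(n/d) + \log(1/\delta)}{n}\right).
\]
% Optimal rate in the realizable PAC setting (Hanneke, 2016):
\[
  \operatorname{err}(\hat h_{\mathrm{opt}})
    = \Theta\!\left(\frac{d + \log(1/\delta)}{n}\right).
\]
% Theorem 2 places Majority-of-Three between the two:
\[
  \operatorname{err}(\mathrm{Maj}_3)
    = O\!\left(\frac{d\log\log(\min\{n/d,\,1/\delta\}) + \log(1/\delta)}{n}\right).
\]
```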

Implications and Future Work

The analysis and results of this paper have broad implications for the theory of PAC learning, particularly for evaluating the complexity and effectiveness of learning algorithms. The optimality of Majority-of-Three in expectation and its near-optimality in the high-probability regime show that very simple aggregation methods can approach the theoretical limits of learnability in the realizable PAC setting.

Given that the high-probability bound for Majority-of-Three is sub-optimal by a $\log\log$ factor, a natural question is whether a tighter analysis could eliminate this gap, thereby establishing Majority-of-Three as an optimal learner in both the in-expectation and high-probability regimes. Analyzing the optimality of Majority-of-Three, or of similarly simple learning algorithms, under different models or assumptions (e.g., the agnostic setting or non-uniform learnability) is another interesting direction for future research.

Conclusion

In summary, this paper introduces and thoroughly analyzes a fundamental yet powerful learner within the PAC framework, highlighting the potential of simple majority schemes to achieve near-optimal generalization performance. The pursuit of simplicity, coupled with theoretical rigor, may pave the way toward understanding the essential properties that govern the efficiency and effectiveness of learning algorithms.