Optimal High-Probability PAC Bound for Majority-of-Three

Establish that the Majority-of-Three algorithm, defined as partitioning the training sample S into three equal-sized disjoint subsets S1, S2, S3, running the same empirical risk minimization (ERM) algorithm on each subset, and returning the majority vote Maj(f̂_{S1}, f̂_{S2}, f̂_{S3}), achieves the optimal high-probability guarantee: error at most O(d/n + (1/n) log(1/δ)) with probability at least 1 − δ over the draw of the sample, for every binary concept class F ⊆ {0,1}^X of Vapnik–Chervonenkis (VC) dimension d, every distribution P over X, every target function f* ∈ F, any ERM implementation, any sample size n, and every confidence parameter δ ∈ (0,1).
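
As a concrete illustration, here is a minimal sketch of the procedure in Python. It assumes a black-box ERM oracle `erm` that, given a labeled subsample, returns a hypothesis mapping feature vectors to {0,1} predictions; the function and variable names here are ours, not the paper's.

```python
import numpy as np

def majority_of_three(X, y, erm):
    """Sketch of the Majority-of-Three meta-learner.

    Splits the labeled sample (X, y) into three disjoint, (nearly) equal-sized
    parts, runs the same ERM oracle on each part, and returns a predictor that
    outputs the pointwise majority vote of the three fitted hypotheses.
    """
    n = len(y)
    parts = np.array_split(np.arange(n), 3)  # any fixed three-way partition works
    hypotheses = [erm(X[p], y[p]) for p in parts]

    def maj_vote(X_new):
        # With three binary voters, the majority label is 1 iff at least two vote 1.
        votes = sum(h(X_new) for h in hypotheses)
        return (votes >= 2).astype(int)

    return maj_vote
```

Here `erm` stands in for any procedure that returns a hypothesis from F with zero empirical error on its subsample; the guarantee is required to hold for every such implementation.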

Background

The paper studies an extremely simple improper learner for realizable PAC learning: Majority-of-Three, which partitions the sample into three disjoint subsets, runs the same ERM on each, and outputs their majority vote. The authors prove that Majority-of-Three is optimal in expectation (matching the one-inclusion graph's in-expectation bound) and that it satisfies a near-optimal high-probability bound carrying an extra log log factor.

They conjecture that a refined analysis can eliminate this log log term, yielding the fully optimal high-probability rate Θ(d/n + (1/n) log(1/δ)) for all δ. Establishing this would show that a remarkably simple algorithm achieves the optimal PAC learning guarantees both in expectation and with high probability, resolving the question of whether Majority-of-Three is optimal across all regimes.
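
Written out, and in our notation rather than the paper's exact statement, the conjecture asserts that there is a universal constant C such that for all n, all δ ∈ (0,1), every distribution P, and every target f* ∈ F, with probability at least 1 − δ over S ∼ Pⁿ,

    err(Maj(f̂_{S1}, f̂_{S2}, f̂_{S3})) ≤ C · (d/n + (1/n) log(1/δ)),

where err(h) = P_{x∼P}[h(x) ≠ f*(x)] denotes the error of a hypothesis h with respect to the target f*.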

References

“Because of this, we conjecture that Majority-of-Three is in fact optimal for all δ and leave this as an open question for future research.”

Majority-of-Three: The Simplest Optimal Learner? (arXiv:2403.08831, Aden-Ali et al., 12 Mar 2024), Introduction, subsection “The simplest possible optimal algorithm?”, immediately following Theorem \ref{highprobboundsection:theorem}.