Agnostic Multiclass PAC Sample Complexity
- The paper establishes tight sample complexity bounds using a three-stage procedure that integrates improper covers, multiplicative weights, and sample-compression.
- It demonstrates that agnostic multiclass learning hinges on two key dimensions—Natarajan and DS—with Natarajan dominating high-accuracy regimes and DS determining overall learnability.
- The results resolve longstanding questions by showing that both dimensions are crucial: the Natarajan term drives the 1/ε² bound while the DS term governs the 1/ε regime, and the Natarajan dependence persists even under bandit feedback.
The agnostic multiclass PAC (Probably Approximately Correct) sample complexity problem seeks to characterize the number of labeled examples required to learn a multiclass hypothesis class, with no assumption that the target hypothesis is included in the class. Unlike the binary case, which is governed by a single combinatorial parameter—the VC dimension—multiclass learning in the agnostic setting is controlled by two distinct combinatorial dimensions. Recent breakthroughs have precisely delineated the roles of the Natarajan and the Daniely–Shalev-Shwartz (DS) dimensions in governing agnostic sample complexity, resolving foundational questions regarding which structural parameters dictate learnability and asymptotic rates.
1. Fundamental Combinatorial Dimensions
Let $\mathcal{X}$ denote the instance space and $\mathcal{Y}$ the (possibly infinite) label space. For any multiclass hypothesis class $H \subseteq \mathcal{Y}^{\mathcal{X}}$, three key dimensions are fundamental:
- Natarajan Dimension ($d_N$): The maximum $d$ such that there exist points $x_1, \dots, x_d$ and two labelings $f_0, f_1$ with $f_0(x_i) \neq f_1(x_i)$ for all $i$, where for every subset $S \subseteq \{1, \dots, d\}$ a hypothesis $h \in H$ exists predicting $h(x_i) = f_1(x_i)$ for $i \in S$, $h(x_i) = f_0(x_i)$ for $i \notin S$.
- DS Dimension ($d_{DS}$): The largest $d$ for which the class can realize all labelings corresponding to a $d$-dimensional “pseudo-cube” in $\mathcal{Y}^d$. For binary labels, this coincides with the VC dimension.
- Realizable Dimension ($d_R$): The least sample size ensuring that, for every realizable distribution, a deterministic learner achieves error at most a fixed constant in expectation (e.g., taking error $1/3$). It is established that $d_{DS} \leq O(d_R)$ and $d_R = \widetilde{O}(d_{DS}^{3/2})$.
These dimensions capture different aspects of combinatorial complexity for multiclass classes and are provably not equivalent; $d_{DS}$ can be arbitrarily larger than $d_N$ (Cohen et al., 16 Nov 2025).
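To make the Natarajan definition concrete, the following is a minimal brute-force sketch that computes the Natarajan dimension of a small finite class by searching for witness labelings $f_0, f_1$; the function names and the dictionary-based hypothesis encoding are illustrative choices, not constructs from the paper.

```python
from itertools import combinations, product

def natarajan_dim(hypotheses, points):
    """Largest d such that some d-subset of `points` is Natarajan-shattered.
    `hypotheses` is a list of dicts mapping each point to a label."""
    best = 0
    for d in range(1, len(points) + 1):
        if any(_shatters(hypotheses, pts) for pts in combinations(points, d)):
            best = d
        else:
            break  # shattering is monotone under subsets: no larger set can work
    return best

def _shatters(hypotheses, pts):
    d = len(pts)
    # Candidate witness labelings f0, f1 range over labels the class
    # actually emits on each point.
    label_sets = [sorted({h[x] for h in hypotheses}) for x in pts]
    for f0 in product(*label_sets):
        for f1 in product(*label_sets):
            if any(a == b for a, b in zip(f0, f1)):
                continue  # witnesses must disagree on every point
            patterns = set()
            for h in hypotheses:
                # A hypothesis realizes a 0/1 pattern only if it matches
                # one of the two witnesses on every point of the set.
                if all(h[x] in (f0[i], f1[i]) for i, x in enumerate(pts)):
                    patterns.add(tuple(int(h[x] == f1[i]) for i, x in enumerate(pts)))
            if len(patterns) == 2 ** d:
                return True
    return False

# Example: 4 hypotheses on 2 points realize the full binary cube, so d_N = 2.
H = [{"x1": 0, "x2": 0}, {"x1": 0, "x2": 1},
     {"x1": 1, "x2": 0}, {"x1": 1, "x2": 1}]
print(natarajan_dim(H, ["x1", "x2"]))  # -> 2
```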
2. Tight Bounds on Agnostic Sample Complexity
The agnostic sample complexity $m(\epsilon, \delta)$ is the minimum sample size for which a learner outputs $\hat{h}$ that, with probability at least $1-\delta$, achieves risk at most $\inf_{h \in H} \mathrm{err}_D(h) + \epsilon$ under all distributions $D$ over $\mathcal{X} \times \mathcal{Y}$. The main result is:

$$m(\epsilon, \delta) = \widetilde{\Theta}\left(\frac{d_N + \log(1/\delta)}{\epsilon^2} + \frac{d_R}{\epsilon}\right).$$
Substituting $d_R = \widetilde{O}(d_{DS}^{3/2})$ gives (up to logarithmic factors):

$$m(\epsilon, \delta) = \widetilde{O}\left(\frac{d_N + \log(1/\delta)}{\epsilon^2} + \frac{d_{DS}^{3/2}}{\epsilon}\right).$$
Both the $d_N/\epsilon^2$ term and the $d_R/\epsilon$ term are necessary: the first dominates in high-accuracy (small-$\epsilon$) regimes, while the second dominates in high-noise scenarios (Cohen et al., 16 Nov 2025). This dual dependence is unique to the agnostic multiclass setting, contrasting with the binary case where the VC dimension suffices.
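To illustrate the trade-off, the sketch below evaluates the two terms of the bound with constants and logarithmic factors suppressed, and locates the crossover accuracy $\epsilon^* \approx d_N / d_{DS}^{3/2}$ below which the Natarajan term takes over; the function name and the specific dimension values are illustrative assumptions.

```python
import math

def agnostic_bound(d_nat, d_ds, eps, delta):
    """Constant-free evaluation of (d_N + log(1/delta))/eps^2 + d_DS^{3/2}/eps.
    Logarithmic factors from the actual bound are suppressed."""
    nat_term = (d_nat + math.log(1 / delta)) / eps ** 2
    ds_term = d_ds ** 1.5 / eps
    return nat_term, ds_term

d_nat, d_ds, delta = 10, 100, 0.05
eps_star = d_nat / d_ds ** 1.5  # crossover, ignoring the log(1/delta) part
for eps in (0.2, eps_star, 0.001):
    nat, ds = agnostic_bound(d_nat, d_ds, eps, delta)
    lead = "Natarajan" if nat > ds else "DS"
    print(f"eps={eps:.4f}: nat={nat:,.0f} ds={ds:,.0f} -> {lead} term leads")
```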
3. Algorithmic Methodology and Proof Outline
The proof establishes the upper bound via a three-stage procedure integrating improper learning, online multiplicative weights, and sample-compression:
- Stage 1: Construct a finite improper cover $\hat{H}$ of the hypothesis class using an initial batch of samples and a realizable learner. Every $h \in H$ is closely approximated by some $\hat{h} \in \hat{H}$, guaranteeing agreement except on an $\epsilon$-fraction of points.
- Stage 2: Reduce the effective label space by running an online, self-adaptive multiplicative-weights process over $\hat{H}$ for $T$ rounds. The MW algorithm constructs a “menu” $\mu(x) \subseteq \mathcal{Y}$ of candidate predictions which, with appropriate regret guarantees, covers the true label on nearly every instance for all $\hat{h} \in \hat{H}$ (a simplified sketch appears below).
- Stage 3: Restrict attention to predictions from the menu $\mu(x)$ and apply sample-compression under the partial-concept loss, leveraging fresh samples and an ERM or one-inclusion approach.
Notably, traditional uniform convergence techniques are insufficient in the presence of unbounded label sets, and reductions to the realizable case also break down for improper multiclass learners. The new MW-based reduction avoids these obstacles and is inherently improper and adaptive.
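The following is a minimal sketch of the Stage 2 idea under simplifying assumptions: a Hedge-style multiplicative-weights update over the Stage 1 cover, where each round's menu collects every label carrying a nontrivial share of the current weight. The learning rate, threshold, and names here are illustrative, not the paper's actual parameters or guarantees.

```python
import math
from collections import defaultdict

def mw_label_menu(cover, stream, eta=0.5, mass_threshold=0.05):
    """Hedge-style sketch of the label-space reduction (Stage 2).

    `cover` is the finite list of hypotheses (callables x -> label) from
    Stage 1; `stream` yields labeled pairs (x, y). Each round exposes as
    the "menu" every label holding at least `mass_threshold` of the
    current weight (so menus have at most 1/mass_threshold entries),
    then downweights hypotheses that mispredicted.
    """
    weights = [1.0] * len(cover)
    menus = []
    for x, y in stream:
        total = sum(weights)
        mass = defaultdict(float)
        for w, h in zip(weights, cover):
            mass[h(x)] += w / total
        menus.append((x, {lbl for lbl, m in mass.items() if m >= mass_threshold}))
        # Multiplicative update on the 0-1 loss of each cover element.
        weights = [w * math.exp(-eta * (h(x) != y)) for w, h in zip(weights, cover)]
    return menus
```

A cover element that predicts well keeps high weight, so its labels stay on the menu; in the paper this step is coupled with regret guarantees ensuring the menu covers the true label on all but a small fraction of rounds.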
4. Prior Work and Resolution of the Natarajan Dimension Question
Historical approaches (e.g., Daniely–Shalev-Shwartz 2014; Brukhim et al. 2022) established that finite DS dimension characterizes multiclass learnability but left open whether the Natarajan dimension materially impacts agnostic rates. Earlier quantitative bounds were of the form $\widetilde{O}(d_{DS}^{3/2}/\epsilon^2)$, implying no direct role for $d_N$ in algorithms' rates. The new results definitively show that, in the low-noise and high-accuracy regime ($\epsilon \to 0$), the Natarajan term is not only present but leading. This recovers classical lower bounds and clarifies that agnostic multiclass PAC learning fundamentally depends on both the DS and Natarajan dimensions (Cohen et al., 16 Nov 2025).
5. Comparisons: Bandit Feedback and Full-Information Models
Extensions to settings with limited feedback provide further context. In the agnostic bandit feedback setting with a finite class $H$ and label set of size $K$, the sample complexity achieves $O\big((\mathrm{poly}(K) + 1/\epsilon^2)\log(|H|/\delta)\big)$, a rate that up to logarithmic factors matches the optimal $\log(|H|/\delta)/\epsilon^2$ bound for full-information PAC learning. The bandit model introduces only a polylogarithmic multiplicative overhead as $\epsilon \to 0$, a contrast with the realizable case where the gap is $\Theta(K)$ (Erez et al., 18 Jun 2024). When generalizing to infinite classes with finite Natarajan dimension, the sample complexity becomes $\widetilde{O}\big((\mathrm{poly}(K) + 1/\epsilon^2)\, d_N\big)$, further cementing the primacy of $d_N$ in the agnostic small-$\epsilon$ regime.
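As a concrete illustration of where the bandit-versus-full-information gap comes from, the sketch below shows the standard one-point importance-weighted loss estimator underlying such analyses: the learner randomizes its prediction, observes only whether it was correct, and still obtains an unbiased estimate of every label's 0-1 loss. This is the generic textbook device, not necessarily the specific estimator of Erez et al.

```python
import random

def bandit_round(label_probs, y_true):
    """One round of bandit multiclass prediction with importance weighting.

    `label_probs` maps each label to its sampling probability (summing
    to 1). We play a random label and observe only correct/incorrect,
    yet the returned estimates are unbiased: E[est[y]] equals the true
    0-1 loss of y, since y is played with probability label_probs[y].
    """
    labels = list(label_probs)
    y_hat = random.choices(labels, weights=[label_probs[y] for y in labels])[0]
    observed_loss = 0.0 if y_hat == y_true else 1.0  # the only feedback seen
    est = {y: 0.0 for y in labels}  # unplayed labels: estimate 0 this round
    est[y_hat] = observed_loss / label_probs[y_hat]
    return y_hat, est

# Simulated usage: averaged over many rounds, est converges to true losses.
y_hat, est = bandit_round({"a": 0.5, "b": 0.3, "c": 0.2}, y_true="b")
```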
6. Implications and Structural Consequences
Multiclass agnostic PAC learning uniquely involves two structural parameters—DS dimension and Natarajan dimension—which exert control over different accuracy regimes:
- DS Dimension ($d_{DS}$): Governs learnability (whether a class is agnostic-PAC-learnable at all) and the $1/\epsilon$ sample complexity regime.
- Natarajan Dimension ($d_N$): Governs the $d_N/\epsilon^2$ term, crucial in high-accuracy/low-noise regimes, and controls the uniform-convergence cost once the label space is effectively bounded.
As $d_{DS}$ can be much larger than $d_N$, there exist hypothesis classes for which the $d_{DS}^{3/2}/\epsilon$ term is dominant except at very small $\epsilon$. The key methodological innovation, an online multiplicative-weights label-space reduction combined with sample-compression, circumvents obstacles that thwart uniform convergence and proper-reduction arguments typical in binary and online settings.
A plausible implication is that related list-bounded or partial-concept loss approaches may apply to other non-ERM multiclass settings with similar combinatorial pathologies.
7. Summary Table: Sample Complexity Dependence
| Regime | Leading Term | Dimension Involved |
|---|---|---|
| Low noise / high accuracy (small $\epsilon$) | $d_N/\epsilon^2$ | Natarajan ($d_N$) |
| High noise / moderate accuracy | $d_R/\epsilon$ or $d_{DS}^{3/2}/\epsilon$ | DS / Realizable ($d_{DS}$, $d_R$) |
| Bandit feedback (finite class) | $\log(\lvert H\rvert/\delta)/\epsilon^2$ | Primarily $\log\lvert H\rvert$, plus $\mathrm{poly}(K)$ factors |
This framework resolves the longstanding question of whether the Natarajan dimension matters for agnostic multiclass PAC learning: it does, dictating the dominant $d_N/\epsilon^2$ term, while the DS dimension dictates the $1/\epsilon$ regime and overall learnability (Cohen et al., 16 Nov 2025, Erez et al., 18 Jun 2024).