
Agnostic Multiclass PAC Sample Complexity

Updated 23 November 2025
  • The paper establishes tight sample complexity bounds using a three-stage procedure that integrates improper covers, multiplicative weights, and sample-compression.
  • It demonstrates that agnostic multiclass learning hinges on two key dimensions—Natarajan and DS—with Natarajan dominating high-accuracy regimes and DS determining overall learnability.
  • The results resolve longstanding questions by showing that both dimensions are crucial: the Natarajan term drives the 1/ε² bound while the DS term governs the 1/ε regime, even under bandit feedback.

The agnostic multiclass PAC (Probably Approximately Correct) sample complexity problem seeks to characterize the number of labeled examples required to learn a multiclass hypothesis class, with no assumption that the target hypothesis is included in the class. Unlike the binary case, which is governed by a single combinatorial parameter—the VC dimension—multiclass learning in the agnostic setting is controlled by two distinct combinatorial dimensions. Recent breakthroughs have precisely delineated the roles of the Natarajan and the Daniely–Shalev-Shwartz (DS) dimensions in governing agnostic sample complexity, resolving foundational questions regarding which structural parameters dictate learnability and asymptotic rates.

1. Fundamental Combinatorial Dimensions

Let $\mathcal{X}$ denote the instance space and $\mathcal{Y}$ the (possibly infinite) label space. For any multiclass hypothesis class $\mathcal{H}\subseteq\mathcal{Y}^{\mathcal{X}}$, three key dimensions are fundamental:

  • Natarajan Dimension ($d_N$): The maximum $d$ such that there exist points $x_1,\ldots,x_d\in\mathcal{X}$ and two labelings $f,g:\{1,\ldots,d\}\to\mathcal{Y}$ with $f(i)\neq g(i)$ for all $i$, where for every subset $S\subseteq\{1,\ldots,d\}$ a hypothesis $h\in\mathcal{H}$ exists with $h(x_i)=f(i)$ for $i\in S$ and $h(x_j)=g(j)$ for $j\notin S$.
  • DS Dimension ($d_{DS}$): The largest $d$ for which the class can realize all labelings corresponding to a $d$-dimensional “pseudo-cube” in $\mathcal{Y}^d$. For binary labels, this coincides with the VC dimension.
  • Realizable Dimension ($d_{RE}$): The least sample size $n$ ensuring that, for every realizable distribution, a deterministic learner achieves error at most $r<1/2$ in expectation; one sets $d_{RE}(\mathcal{H})=\inf_A d_{RE}(A,\mathcal{H},1/(9e))$, the infimum ranging over learners $A$. It is established that $\Omega(d_{DS})\le d_{RE}(\mathcal{H})\le \tilde{O}(d_{DS}^{1.5})$.

These dimensions capture different aspects of combinatorial complexity for multiclass classes and are provably not equivalent; $d_{DS}$ can be arbitrarily larger than $d_N$ (Cohen et al., 16 Nov 2025).
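To make the shattering condition behind the Natarajan dimension concrete, the following is a minimal brute-force sketch for a finite class over a finite domain. The tuple representation of hypotheses and the helper names are illustrative choices, not constructs from the cited papers, and the search is exponential, so it is only viable for toy classes.

```python
from itertools import chain, combinations, product

def _subsets(items):
    """All subsets of a sequence of indices."""
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def n_shattered(H, S):
    """Is the index set S Natarajan-shattered by the finite class H?
    Each h in H is a tuple of labels, one per domain point."""
    # Candidate (f(i), g(i)) pairs: distinct labels H actually uses at point i.
    per_point = [[(a, b) for a in {h[i] for h in H}
                  for b in {h[i] for h in H} if a != b]
                 for i in S]
    for choice in product(*per_point):      # a joint choice of labelings f, g
        f, g = [c[0] for c in choice], [c[1] for c in choice]
        if all(any(all(h[S[j]] == (f[j] if j in T else g[j]) for j in range(len(S)))
                   for h in H)
               for T in map(set, _subsets(range(len(S))))):
            return True
    return False

def natarajan_dim(H):
    """Largest d such that some d-subset of the domain is N-shattered."""
    n = len(next(iter(H)))
    shattered = [d for d in range(1, n + 1)
                 if any(n_shattered(H, S) for S in combinations(range(n), d))]
    return max(shattered, default=0)

# Example: the class of ALL functions from 2 points into {0, 1, 2}.
H = set(product(range(3), repeat=2))
print(natarajan_dim(H))  # -> 2
```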

2. Tight Bounds on Agnostic Sample Complexity

The agnostic sample complexity $m_{AG}(\epsilon,\delta)$ is the minimum sample size at which a learner outputs some $\hat{h}$ that, with probability at least $1-\delta$, achieves risk at most $\inf_{h\in\mathcal{H}}\mathrm{er}_P(h)+\epsilon$ under every distribution $P$ over $\mathcal{X}\times\mathcal{Y}$. The main result is:

$$m_{AG}(\epsilon,\delta) \asymp \frac{d_N}{\epsilon^2} + \frac{d_{RE}}{\epsilon} + \frac{\ln(1/\delta)}{\epsilon^2}$$

Substituting $d_{RE}=\tilde{O}(d_{DS}^{1.5})$ gives (up to logarithmic factors):

$$m_{AG}(\epsilon,\delta) = O\left(\frac{d_N}{\epsilon^2} + \frac{d_{DS}^{1.5}}{\epsilon} + \frac{\ln(1/\delta)}{\epsilon^2}\right)$$

Both the $d_N/\epsilon^2$ and $d_{RE}/\epsilon$ terms are necessary: the first dominates in high-accuracy (small-$\epsilon$) regimes, while the second dominates in high-noise scenarios (Cohen et al., 16 Nov 2025). This dual dependence is unique to the agnostic multiclass setting, in contrast with the binary case, where the VC dimension suffices.
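The two regimes can be made tangible with a quick back-of-the-envelope computation. The dimension values below are arbitrary illustrations; constants and logarithmic factors are suppressed, as in the bound itself.

```python
import math

def agnostic_bound_terms(d_n, d_ds, eps, delta):
    """Evaluate the three terms of the upper bound
    m_AG = O(d_N/eps^2 + d_DS^1.5/eps + ln(1/delta)/eps^2),
    suppressing constants and log factors (illustration only)."""
    return {
        "natarajan": d_n / eps**2,
        "ds": d_ds**1.5 / eps,
        "confidence": math.log(1 / delta) / eps**2,
    }

# d_N/eps^2 >= d_DS^1.5/eps exactly when eps <= d_N / d_DS^1.5, so with
# d_N = 10 and d_DS = 1000 the crossover sits near eps ~ 3.2e-4: the DS
# term dominates at moderate accuracy, the Natarajan term as eps -> 0.
for eps in (1e-2, 1e-3, 1e-4, 1e-5):
    terms = agnostic_bound_terms(d_n=10, d_ds=1000, eps=eps, delta=0.05)
    print(f"eps={eps:.0e}: dominant={max(terms, key=terms.get)}")
```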

3. Algorithmic Methodology and Proof Outline

The proof establishes the upper bound via a three-stage procedure integrating improper learning, online multiplicative weights, and sample-compression:

  • Stage 1: Construct a finite improper cover $F$ of the hypothesis class using $m_1 = \tilde{O}(d_{RE}/\epsilon)$ samples and a realizable learner. Every $h\in\mathcal{H}$ is closely approximated by some $f\in F$, guaranteeing agreement except on an $O(\epsilon)$ fraction of points.
  • Stage 2: Reduce the effective label space by running an online, self-adaptive multiplicative-weights process over $F$ for $T=\Theta((\log|F|)/\epsilon)$ rounds. The MW algorithm constructs a “menu” $\mu(x)$ of candidate predictions which, with appropriate regret guarantees, covers the true label on nearly every instance for every $f\in F$ (a simplified sketch of this mechanism follows the list).
  • Stage 3: Restrict attention to predictions from $\mu(x)$ and apply sample compression under the partial-concept loss, using $m_3 = \tilde{O}(d_N \log T / \epsilon^2)$ fresh samples and an ERM or one-inclusion approach.
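The following is a heavily simplified sketch of the multiplicative-weights mechanism in Stage 2, assuming a finite cover $F$ of hashable predictors: experts that disagree with the observed label are downweighted, and the menu $\mu(x)$ retains only labels still backed by non-negligible total weight. The learning rate, threshold, and update rule are generic MW choices, not the paper's actual procedure.

```python
import math
from collections import defaultdict

def mw_menu(F, samples, eta=0.5, threshold=0.01):
    """Run a multiplicative-weights pass over the finite cover F
    (each f in F maps x -> label) and return a menu function mu."""
    w = {f: 1.0 for f in F}
    for x, y in samples:                          # T online rounds
        for f in F:
            if f(x) != y:                         # penalize mistaken experts
                w[f] *= math.exp(-eta)
        total = sum(w.values())
        w = {f: v / total for f, v in w.items()}  # renormalize

    def menu(x):
        """Labels whose surviving-expert weight exceeds the threshold."""
        mass = defaultdict(float)
        for f, v in w.items():
            mass[f(x)] += v
        return {label for label, m in mass.items() if m >= threshold}

    return menu

# Toy usage: one consistent expert quickly dominates the menu.
F = [lambda x: x % 3, lambda x: (x + 1) % 3, lambda x: 0]
mu = mw_menu(F, samples=[(i, i % 3) for i in range(20)])
print(mu(7))  # -> {1}, the prediction of the surviving expert
```

The point of the menu is that subsequent learning only needs to distinguish among the few labels in $\mu(x)$, which is what lets the Stage-3 compression argument proceed with the Natarajan dimension.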

Notably, traditional uniform convergence techniques are insufficient in the presence of unbounded label sets, and reductions to the realizable case also break down for improper multiclass learners. The new MW-based reduction avoids these obstacles and is inherently improper and adaptive.

4. Prior Work and Resolution of the Natarajan Dimension Question

Historical approaches (e.g., Daniely–Shalev-Shwartz 2014; Brukhim et al. 2022) established that a finite DS dimension characterizes multiclass learnability but left open whether the Natarajan dimension materially impacts agnostic rates. Earlier quantitative bounds were of the form $d_{DS}^{1.5}/\epsilon^2$, suggesting no direct role for $d_N$ in achievable rates. The new results definitively show that, in the low-noise, high-accuracy regime ($\epsilon\to 0$), the Natarajan term $d_N/\epsilon^2$ is not only present but leading. This recovers classical lower bounds and clarifies that agnostic multiclass PAC learning fundamentally depends on both the DS and Natarajan dimensions (Cohen et al., 16 Nov 2025).

5. Comparisons: Bandit Feedback and Full-Information Models

Extensions to settings with limited feedback provide further context. In the agnostic bandit-feedback setting with a finite class $H$ and a label set of size $K$, the sample complexity is $m=O((\mathrm{poly}(K) + 1/\epsilon^2)\ln(|H|/\delta))$, a rate that, up to logarithmic factors, matches the optimal $O((1/\epsilon^2)\ln|H|)$ bound for full-information PAC learning. The bandit model introduces only an $O(1)$ multiplicative overhead as $\epsilon\to 0$, in contrast with the realizable case, where the gap is $\Theta(K)$ (Erez et al., 18 Jun 2024). Generalizing to infinite classes with finite Natarajan dimension, the sample complexity becomes $O((\mathrm{poly}(K)+1/\epsilon^2)\,d_N\ln(1/\delta))$, further cementing the primacy of $d_N/\epsilon^2$ in the agnostic small-$\epsilon$ regime.
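A standard device behind such bandit-to-full-information comparisons is importance weighting: query a uniformly random label, observe only the one-bit correctness feedback, and reweight to obtain unbiased loss estimates for every hypothesis at once. The sketch below illustrates this generic estimator; it is not the algorithm of Erez et al. (2024), whose exploration scheme is more refined.

```python
import random

def iw_loss_estimates(H, samples, K):
    """Unbiased 0/1-loss estimates for every h in H from bandit feedback.
    Each round, guess a uniform label and observe only whether it was
    correct; weighting by K makes the estimate unbiased, since
    E[K * 1{h(x)=guess} * 1{guess != y}] = 1{h(x) != y}."""
    est = {h: 0.0 for h in H}
    for x, y in samples:                # y itself is never revealed
        guess = random.randrange(K)     # uniform exploration
        correct = (guess == y)          # the only feedback observed
        for h in H:
            if h(x) == guess and not correct:
                est[h] += K             # importance weight 1 / P[guess = h(x)]
    return {h: v / len(samples) for h, v in est.items()}
```

The factor-$K$ importance weights inflate the variance of these estimates, which is one way to see where the $\mathrm{poly}(K)$ overhead in the bandit bound comes from.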

6. Implications and Structural Consequences

Multiclass agnostic PAC learning uniquely involves two structural parameters—DS dimension and Natarajan dimension—which exert control over different accuracy regimes:

  • DS Dimension ($d_{DS}$): Governs learnability (whether a class is agnostically PAC-learnable at all) and the $1/\epsilon$ sample complexity regime.
  • Natarajan Dimension ($d_N$): Governs the $1/\epsilon^2$ term, crucial in high-accuracy/low-noise regimes, and controls the uniform-convergence cost once the label space is effectively bounded.

As $d_{DS}$ can be much larger than $d_N$, there exist hypothesis classes for which the $d_N/\epsilon^2$ term is eventually dominant. The key methodological innovation, an online multiplicative-weights label-space reduction combined with sample compression, circumvents obstacles that thwart the uniform-convergence and proper-reduction arguments typical of binary and online settings.

A plausible implication is that related list-bounded or partial-concept loss approaches may apply to other non-ERM multiclass settings with similar combinatorial pathologies.

7. Summary Table: Sample Complexity Dependence

| Regime | Leading Term | Dimension Involved |
|---|---|---|
| Low noise / high accuracy ($\epsilon\to 0$) | $d_N/\epsilon^2$ | Natarajan ($d_N$) |
| High noise / moderate accuracy | $d_{RE}/\epsilon$ or $d_{DS}^{1.5}/\epsilon$ | DS / realizable ($d_{DS}$, $d_{RE}$) |
| Bandit feedback (finite class) | $O((\mathrm{poly}(K)+1/\epsilon^2)\ln(|H|/\delta))$ | Primarily $d_N$, plus $K$ factors |
Bandit feedback (finite class) O((poly(K)+1/ϵ2)ln(H/δ))O((\mathrm{poly}(K)+1/\epsilon^2)\ln(|H|/\delta)) Primarily dNd_N, plus KK factors

This framework resolves the longstanding question of whether the Natarajan dimension matters for agnostic multiclass PAC learning: it does, dictating the dominant $1/\epsilon^2$ term, while the DS dimension dictates the $1/\epsilon$ regime and overall learnability (Cohen et al., 16 Nov 2025; Erez et al., 18 Jun 2024).
