PAC Learning: Minimax Risk Analysis
- PAC learning is a mathematical framework that quantifies achievable generalization error in terms of sample size and model complexity, especially in agnostic settings.
- The analysis establishes minimax lower bounds on excess risk and identifies symmetric voting algorithms that attain near-optimal performance under distributional uncertainty.
- It links the VC dimension and the sample-to-complexity ratio to risk behavior, offering practical insight for algorithm design and theoretical assessment.
Probably Approximately Correct (PAC) Learning Criterion
The Probably Approximately Correct (PAC) learning criterion is a foundational mathematical framework for quantifying the statistical limits of learning in the presence of uncertainty, limited data, and model mismatch. PAC learning characterizes the relationship between sample complexity, hypothesis class complexity, and achievable risk guarantees, in both realizable and agnostic (model-mismatched) settings. The PAC criterion provides theoretical lower bounds on generalization error as a function of data and model complexity, establishes minimax optimality conditions, and informs the construction of risk-optimal algorithms. Exact lower bounds in the agnostic PAC model, as well as characterizations of minimax optimal learners, are derived via a combination of information-theoretic, Bayesian, and distributional symmetry arguments.
1. Minimax Expected Excess Risk in the Agnostic Model
In agnostic PAC learning, the goal is to select, based on a finite sample of size $n$, a hypothesis from a class $\mathcal{F}$ that approximately minimizes the expected classification error with respect to an unknown distribution $P$, without assuming the true labeling function lies in $\mathcal{F}$. The central quantity of interest is the minimax expected excess risk (EER), defined as
$$\mathrm{EER}^*(n,\mathcal{F}) \;=\; \inf_{\hat f}\ \sup_{P}\ \mathbb{E}_{S \sim P^n}\!\left[ R_P(\hat f_S) - \inf_{f \in \mathcal{F}} R_P(f) \right],$$
where $\hat f$ is any (possibly randomized) learning rule, $S = \{(X_i, Y_i)\}_{i=1}^{n}$ is the training sample, $R_P(f) = \mathbb{P}_{(X,Y)\sim P}\{f(X) \neq Y\}$ is the expected risk, and $\inf_{f \in \mathcal{F}} R_P(f)$ is the best achievable risk in $\mathcal{F}$.
Exact non-asymptotic lower bounds are derived for $\mathrm{EER}^*(n,\mathcal{F})$, showing sharp behavior even for moderate sample sizes. In the regime of a large sample-to-complexity ratio $n/d$, with $n$ the sample size and $d$ the Vapnik–Chervonenkis (VC) dimension of $\mathcal{F}$, the minimax excess risk satisfies the asymptotic lower bound
$$\mathrm{EER}^*(n,\mathcal{F}) \;\ge\; c\,\sqrt{\frac{d}{n}}\,\bigl(1 + o(1)\bigr),$$
where $c > 0$ is a universal constant. This demonstrates that, in general, even the best learning algorithms cannot achieve excess risk below order $\sqrt{d/n}$ in the agnostic setting (Kontorovich et al., 2016).
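As a quick illustration of how this bound scales, the following Python sketch evaluates $c\sqrt{d/n}$ for a few values of $n$ and $d$; the particular constant used here is a placeholder, since the result only asserts the existence of some universal $c > 0$.

```python
import math

def agnostic_eer_lower_bound(n: int, d: int, c: float = 0.25) -> float:
    """Schematic evaluation of the asymptotic lower bound c * sqrt(d / n).

    The constant c is a placeholder: the theory guarantees only that some
    universal c > 0 exists, so the absolute values below are illustrative.
    """
    return c * math.sqrt(d / n)

# Doubling d, or quartering n, scales the guaranteed excess risk by about sqrt(2) or 2x.
for n in (100, 1_000, 10_000):
    for d in (5, 50):
        print(f"n={n:>6}, d={d:>3}  ->  lower bound ~ {agnostic_eer_lower_bound(n, d):.4f}")
```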
2. Voting Procedures and Minimax Learning Algorithms
The minimax risk is attained by a class of learning algorithms characterized as “maximally symmetric” and “minimally randomized” voting procedures. For any input $x$, the algorithm aggregates the training labels observed at $x$:
- If the label votes at $x$ are unbalanced (one label strictly outnumbers the other), assign the majority label;
- If balanced (equal counts of the two labels), resolve ties either by using the label of the first occurrence or, in the absence of any sample at $x$, by minimal randomization (e.g., tossing a fair coin).
The specific minimax rule can be written as the sign of the local vote sum at $x$, with residual ties resolved by an auxiliary random variable. This “voting” learner is risk-equalizing over all distributions and achieves the minimax EER. Phase transitions and differences from empirical risk minimizers become negligible asymptotically, except for rare tie-breaking events.
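To make the voting idea concrete, here is a minimal Python sketch of a majority-vote learner over a finite domain. The class name and the fair-coin tie-breaking are illustrative assumptions; it is not the paper's exact minimax construction (which also uses the first-occurrence tie-break described above).

```python
import random
from collections import defaultdict

class VotingLearner:
    """Majority-vote rule on a finite input domain (illustrative sketch).

    Predicts the majority training label at each point and falls back to a
    fair coin when the votes are tied or the point was never observed.
    """

    def fit(self, xs, ys):
        # ys are +1 / -1 labels; accumulate the signed vote sum at each point.
        self.votes = defaultdict(int)
        for x, y in zip(xs, ys):
            self.votes[x] += y
        return self

    def predict(self, x):
        v = self.votes.get(x, 0)
        if v > 0:
            return +1
        if v < 0:
            return -1
        # Tie or unseen point: minimal randomization (fair coin).
        return random.choice((+1, -1))

# Toy usage over a three-point domain {0, 1, 2}; point 3 is unseen at training time.
learner = VotingLearner().fit(xs=[0, 0, 1, 1, 1, 2], ys=[+1, -1, +1, +1, -1, -1])
print([learner.predict(x) for x in (0, 1, 2, 3)])
```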
3. Sample Size, Hypothesis Complexity, and the Fundamental Ratio
The ratio $n/d$ encapsulates the data-to-complexity tradeoff. Increasing $n$ (data) improves generalization, while increasing $d$ (hypothesis class complexity) elevates the minimax excess risk. The $c\sqrt{d/n}$ lower bound thus tightly links generalization performance to both sample size and VC dimension, quantifying how agnostic learning fundamentally differs from the realizable setting, where faster rates (of order $d/n$, i.e., linear in $1/n$) are possible.
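The contrast between the two regimes is easy to see numerically. The sketch below compares the agnostic $\sqrt{d/n}$ scaling with the realizable $d/n$ scaling; constants and logarithmic factors are deliberately omitted, so only the relative behavior in $n$ is meaningful.

```python
import math

def agnostic_rate(n: int, d: int) -> float:
    # Agnostic minimax rate: order sqrt(d / n), constants omitted.
    return math.sqrt(d / n)

def realizable_rate(n: int, d: int) -> float:
    # Realizable-case rate: order d / n, constants and log factors omitted.
    return d / n

d = 10
for n in (100, 1_000, 10_000, 100_000):
    print(f"n={n:>7}: agnostic ~ {agnostic_rate(n, d):.4f}, realizable ~ {realizable_rate(n, d):.5f}")
```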
4. Improved Lower Bounds on Excess Risk Tail Probability
The paper substantially refines previous lower bounds on the probability that the excess risk exceeds a threshold $\varepsilon$. Earlier analyses yielded pessimistic tail bounds with prefactor constants as poor as $0.0156$ in important regimes. By explicit evaluation of the excess risk distribution under the minimax voting procedure, and by developing new binomial inequalities based on a carefully chosen auxiliary function, the lower bounds are improved to prefactor constants as high as $0.238$, with exponent constants as small as $41.3$. This demonstrates that, in the worst case, a non-negligible probability remains of observing substantial excess risk, which is crucial for understanding the limits of statistical learning under severe model mismatch.
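To give a sense of how the prefactor and exponent constants interact, the sketch below evaluates a tail lower bound of the schematic form $c\,e^{-a\,n\varepsilon^2}$. The functional form and the "old" exponent are assumptions made purely for illustration; only the prefactors $0.0156$ and $0.238$ and the exponent $41.3$ are taken from the discussion above.

```python
import math

def tail_lower_bound(n: int, eps: float, prefactor: float, exponent: float) -> float:
    # Schematic form: P(excess risk > eps) >= prefactor * exp(-exponent * n * eps^2).
    # The form itself is an illustrative assumption, not the paper's exact statement.
    return prefactor * math.exp(-exponent * n * eps * eps)

n, eps = 100, 0.03
old = tail_lower_bound(n, eps, prefactor=0.0156, exponent=100.0)  # placeholder exponent
new = tail_lower_bound(n, eps, prefactor=0.238, exponent=41.3)
print(f"older-style guarantee: P(excess risk > {eps}) >= {old:.2e}")
print(f"improved guarantee:    P(excess risk > {eps}) >= {new:.2e}")
```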
5. Bayes Estimation and Binomial Identities
A central analytical innovation is the characterization of the minimax excess risk in terms of Bayes estimation for a sequence of binomially distributed “vote” counts: at each point, the number of votes is binomially distributed, the label bias parameter governs the conditional label probabilities, and the key quantity is the difference between the probability that the vote sum is positive and the probability that it is negative. For any point $x$, the number of samples falling at $x$ is binomially distributed, and the overall voting risk is obtained by averaging the local conditional excess risk over the marginal distribution of $x$.
An explicit convex-hull analysis of this Bayes-risk quantity (especially its “almost convexity” and its linear interpolation at odd vote counts) enables the application of Jensen’s inequality and precise estimation of both asymptotic and non-asymptotic lower bounds. These techniques provide tight control over the excess risk, greatly improving upon previous approaches relying on looser union bounds or symmetrization arguments.
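The core quantity, the difference between the probabilities that the vote sum is positive and negative, can be computed directly. The SciPy sketch below evaluates it for a point receiving $k$ votes with label bias $p$, together with the local conditional excess risk of a coin-tie-broken majority vote; the paper's exact decomposition may differ, so this is an illustrative instance rather than its precise formula.

```python
from scipy.stats import binom

def vote_sum_margin(k: int, p: float) -> float:
    """P(vote sum > 0) - P(vote sum < 0) for k i.i.d. labels,
    each equal to +1 with probability p (the label-bias parameter)."""
    p_pos = 1.0 - binom.cdf(k // 2, k, p)        # strictly more +1 than -1 votes
    p_neg = binom.cdf((k - 1) // 2, k, p)        # strictly more -1 than +1 votes
    return p_pos - p_neg

def local_excess_risk(k: int, p: float) -> float:
    """Conditional excess risk at a point with k votes, for the majority-vote
    rule with fair-coin tie-breaking (illustrative form)."""
    p_pos = 1.0 - binom.cdf(k // 2, k, p)
    p_neg = binom.cdf((k - 1) // 2, k, p)
    p_tie = 1.0 - p_pos - p_neg
    risk = (p_pos + 0.5 * p_tie) * (1.0 - p) + (p_neg + 0.5 * p_tie) * p
    return risk - min(p, 1.0 - p)

for k in (1, 2, 3, 5, 9, 25):
    print(f"k={k:>2}: margin={vote_sum_margin(k, 0.6):+.3f}, "
          f"local excess risk={local_excess_risk(k, 0.6):.4f}")
```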
6. Implications for PAC Learning Theory
The exact lower bounds and minimax constructions presented clarify the agnostic PAC learning landscape in both theoretical and algorithmic terms:
- Non-asymptotic optimality: Even the most effective empirical risk minimizers cannot outperform the explicit voting procedure’s minimax risk by more than a negligible amount, asymptotically.
- Practical guidance for algorithm design: Minimax voting rules suggest robust choices for label prediction rules under maximal distributional uncertainty.
- Sharpness of VC-based rates: The result confirms that the classical $O(\sqrt{d/n})$ excess risk upper bounds are unimprovable without further structural assumptions (e.g., realizability, margin conditions).
- Tail risk quantification: The improved tail lower bounds ensure that practitioners cannot expect uniform risk guarantees much stronger than the minimax rates, even for moderate $n$ and $d$.
7. Summary Table: Minimax EER Behavior
| Regime | Lower Bound | Asymptotic Constant | Algorithmic Attainment |
|---|---|---|---|
| Agnostic, non-asymptotic | Exact (via Bayes/binomial analysis) | | Minimax voting procedure |
| Agnostic, large $n/d$ | $c\sqrt{d/n}\,(1+o(1))$ | Universal constant $c$ | Empirical risk minimizer |
This rigorous analysis situates the PAC learning criterion for classification at the intersection of information theory, statistical minimax theory, and Bayes optimality, providing exact, nearly unimprovable risk guarantees for agnostic model selection and illuminating the critical influence of data and model complexity ratios (Kontorovich et al., 2016).