Sample-Complexity Bounds

Updated 24 October 2025

Sample-complexity bounds are rigorous measures that specify the minimum number of samples needed to achieve a desired accuracy and confidence in statistical learning and decision-making.
They compare various learning settings—from supervised PAC learning to reinforcement and nonparametric methods—highlighting the role of model capacity, intrinsic dimensions, and structural parameters.
These bounds guide the design of algorithms by offering tight upper and lower limits that establish optimality benchmarks and inform instance-specific exploration strategies.

Sample-complexity bounds quantify, for a given learning or inference problem, the minimum number of samples (data points, observations, or environment interactions) required to achieve a prescribed level of accuracy and confidence. In statistical learning, reinforcement learning, identification, and various estimation problems, these bounds provide a rigorous measure of the hardness of the problem and inform the data requirements of practical algorithms. Sample complexity is central to the theoretical analysis of learning algorithms, enabling comparison across problem classes, models, and algorithmic strategies.

1. Model-based and Non-Parametric Sample Complexity Frameworks

Sample-complexity analyses depend on the statistical model and problem structure, from finite hypothesis classes to infinite-dimensional function spaces and from i.i.d. observations to correlated trajectory data.

In supervised PAC (Probably Approximately Correct) learning, sample complexity refers to the number $m(\epsilon,\delta)$ of i.i.d. training examples needed for an algorithm to guarantee error at most $\epsilon$ with confidence $1-\delta$ . Canonical bounds in the realizable PAC setting are

$m(\epsilon, \delta) = \Theta \left( \frac{d + \log(1/\delta)}{\epsilon} \right)$

where $d$ is the VC dimension, confirming that sample efficiency is governed by the hypothesis class capacity and the target error level (Hanneke, 2015).

In reinforcement learning, with a finite hypothesis/model class $\mathcal{M}$ of $N$ environments, the MERL algorithm achieves near-optimal sample complexity, measured by the number of non- $\epsilon$ -optimal time-steps, with the sharp bound

$O\left( \frac{N}{\epsilon^2 (1-\gamma)^3} \log^2\left( \frac{N}{\delta\,\epsilon(1-\gamma)} \right) \right)$

(Lattimore et al., 2013). Here, $N$ captures model uncertainty, $\epsilon$ the suboptimality tolerance, $(1-\gamma)^{-1}$ the planning horizon in discounted MDPs, and $\delta$ the confidence parameter. For infinite classes, compactness (existence of finite $\epsilon$ -covers) is essential to obtain uniform sample-complexity bounds.

In nonparametric settings, as in estimation of the 1-Wasserstein distance or Sobolev IPMs, the dependence on intrinsic dimension dominates: for arbitrary measures on a $d$ -dimensional compact manifold $\mathcal{M}$ ,

$\mathbb{E}[ W_1(\mu, \hat{\mu}_n) ] \lesssim n^{-1/d}$

but, under group invariance, the effective sample size is boosted and the convergence exponent can be improved; for smooth densities,

$\mathbb{E}[ W_1(\mu, \hat{\mu}_n) ] \lesssim \left( \frac{\operatorname{vol}(\mathcal{M}/G)}{n} \right)^{\frac{s+1}{2s+d}}$

where $d=\dim(\mathcal{M}/G)$ is the dimension of the quotient space under the group action and $s$ the Sobolev smoothness of the density (Tahmasebi et al., 2023).

2. Fundamental Lower and Upper Bounds

Tight sample-complexity bounds, often matching up to logarithmic factors, are critical for establishing the efficiency or optimality of algorithms.

In PAC learning, the optimal sample bound matches the lower bound:

$m(\epsilon, \delta) = \Theta\left(\frac{d + \log(1/\delta)}{\epsilon}\right)$

as both upper and lower bounds are controlled by VC dimension (Hanneke, 2015). The breakthrough approach leverages recursive, overlapping partitioning with majority-vote aggregation to remove previous superfluous logarithmic dependencies.

In general reinforcement learning over a finite candidate class, the MERL bound is tight up to logarithmic terms, as shown by explicit counterexamples that realize the lower bound:

$\Omega\left(\frac{N}{\epsilon^2 (1-\gamma)^3} \log\frac{1}{\delta}\right)$

and no algorithm can achieve an order-of-magnitude improvement (Lattimore et al., 2013).

For identification of finite dynamical systems via maximum likelihood estimation, both upper and lower bounds are established. The lower bound, via information-theoretic arguments, ensures:

$T \cdot \text{SNR-like terms} \geq 2 \log\left(\frac{1}{2.4\delta}\right)$

with the SNR-like terms capturing the distinguishability between the true system and alternatives (Chatzikiriakos et al., 17 Sep 2024).

In nonconvex bilevel RL, the sample complexity for reaching $\epsilon$ -stationarity is improved from $\widetilde{O}(\epsilon^{-6})$ (previous, via two-timescale or Hessian-based approaches) to $\widetilde{O}(\epsilon^{-3})$ through a penalty-based, PL-condition-exploiting first-order method (Gaur et al., 22 Mar 2025).

3. The Role of Structural Problem Parameters

Sample complexity is dictated not only by the ambient or covering dimension, but also by finer parameters that reflect the problem's intrinsic difficulty:

Model class size ( $N$ ): Finite-model selection and bandit problems exhibit sample complexity linear in $N$ , unless further structure (e.g., compactness, metric entropy, or strong smoothness conditions) can be exploited.
Margin or gap parameters: For instance-dependent analysis in best- $k$ -arm bandits or zero-sum matrix games, the minimal gap $\Delta$ between optimal and suboptimal choices dramatically affects the bound:

$\Omega\bigg(\min \left\{ \frac{1}{\epsilon^2}, \frac{1}{\Delta_{\min}^2}, \frac{1}{\epsilon |D|} \right\}\log(1/\delta) \bigg)$

where $\Delta_{\min}$ and $D$ are problem-dependent parameters capturing the ease of distinguishing actions or equilibria (Maiti et al., 2023, Chen et al., 2017).

Slack or Slater constants ( $\zeta$ ): In constrained MDPs and their average-reward or discounted variants, the smallest gap between the constraint threshold and the best feasible value for any policy fundamentally determines strict feasibility cost:

$\tilde{O}\left( \frac{S A}{(1-\gamma)^5 \epsilon^2 \zeta^2} \right)$

(Vaswani et al., 2022, Wei et al., 20 Sep 2025). Similar scalings hold in CAMDPs with average constraints.

Littlestone/VC/Sobolev dimension: In learning-theoretic and nonparametric settings, sample complexity is governed by the cardinality of $\epsilon$ -covers, the Littlestone dimension (for DP learning), or the smoothness/dimension of the function space, contingent on the metrics of interest (Feldman et al., 2014, Tahmasebi et al., 2023).

4. Algorithmic Design and Instance-Optimality

State-of-the-art algorithms attain these sample bounds by leveraging:

Value-based model elimination: In model-based RL with arbitrary reward and transition structures, algorithms like MERL eliminate inconsistent models via value discrepancy statistics, coupled with tail concentration inequalities to ensure reliable elimination (Lattimore et al., 2013).
Instance-dependent exploration: Bandit algorithms with precise gap analysis adapt sample allocation dynamically to the hardness of distinguishing between competing arms, achieving nearly instance-optimality (Chen et al., 2017).
Primal–dual and penalty-based updates: In constrained RL and bilevel optimization, primal–dual methods interleave unconstrained solves with Lagrange multiplier updates, while penalty-based surrogates allow for hypergradient estimation without explicit second-order information (Vaswani et al., 2022, Gaur et al., 22 Mar 2025).
Quantum settings: Quantum ERM with shadow tomography enables learning from quantum measurements with sample complexity scaling as $O(V_{\mathcal{C}^*} \log |\mathcal{C}^*|)$ , reflecting both the compressed representation over extreme points and the variance introduced by measurement incompatibility (Heidari et al., 22 Aug 2024).

5. Sample Complexity in High-Dimensional and Structured Problems

Sequential Monte Carlo (SMC): For a family of interpolating distributions and geometrically ergodic kernels, finite-sample error $\epsilon$ can be achieved with

$N \gtrsim \max\left\{9 W^2 Z^2,\; 1/\epsilon^2 \right\}$

particles per stage, and per-stage kernel steps $t \gtrsim (\ln N S) / \rho$ for spectral gap $\rho$ , with overall complexity within logarithmic factors of MCMC (Marion et al., 2018).

Sample average approximation (SAA) in convex programming: Recent "metric entropy-free" analyses show SAA's sample efficiency can match that of stochastic mirror descent up to constants, eliminating an $O(d)$ dependence previously attributed to covering-number bounds (Liu et al., 1 Jan 2024).
Privacy and learning: In differentially private learning, the separation between pure and approximate privacy regimes leads to dramatically different sample complexities, with pure DP scaling as $\Omega(\mathrm{LDim}(C))$ and approximate DP reducing to $O(\log(1/\beta)/\alpha)$ in many cases (Feldman et al., 2014). For quantum differential privacy, the lower and upper bounds for estimation error variance scale as $\Theta(\epsilon^{-2})$ in the small $\epsilon$ regime, identical to the classical setting for scalar parameters (Farokhi, 24 Jan 2025).

6. Implications, Applications, and Open Problems

Sample-complexity bounds serve as a benchmark for algorithmic efficiency and a guide for the design or deployment of learning systems under resource constraints.

In RL, these bounds set explicit limits on the amount of experience (simulator queries, environment rollouts) required to achieve near-optimal performance, with crucial sensitivity to model/constraint structure and planning horizon. For strictly constrained problems, the Slater constant and bias/horizon parameters appear as irreducible statistical bottlenecks (Wei et al., 20 Sep 2025).
In large-scale quantum learning and quantum-private systems, improvements in sample-complexity scaling make sample-efficient protocols feasible, matching or approaching classical learning regimes in some settings (Heidari et al., 22 Aug 2024, Farokhi, 24 Jan 2025).
For high-dimensional generative modeling (e.g., diffusion models), robust score-matching analyses show exponential improvements in sample complexity as a function of the inverse Wasserstein or TV error, deprecating the dominant role of data radius or norm when measuring the error in robust metrics (Gupta et al., 2023).

Open problems include closing gaps between upper and lower bounds in regimes with nontrivial structure (e.g., non-compact infinite model classes in RL, high-dimensional manifold estimation with unknown invariances), extending instance-optimality notions to more complex or correlated interaction models, and developing algorithmic strategies that directly exploit dimensionality reduction (group actions, low intrinsic dimension) for further sample efficiency.

7. Representative Comparison Table

Setting	Sample Complexity Bound	Dominant Parameters
PAC learning (realizable)	$\Theta\left( \frac{d+\log(1/\delta)}{\epsilon}\right)$	VC dim $d$ , accuracy $\epsilon$ , conf. $\delta$
RL (finite class MERL)	$O\left( \frac{N}{\epsilon^2 (1-\gamma)^3} \log^2\frac{N}{\delta \epsilon(1-\gamma)} \right)$	$N$ , $\epsilon$ , $\gamma$ , $\delta$
CMDP (relaxed/strict feasible)	$\tilde{O}\left(\frac{SA}{(1-\gamma)^3\epsilon^2}\right)$ / $\tilde{O}\left(\frac{SA}{(1-\gamma)^5\epsilon^2 \zeta^2}\right)$	$S$ , $A$ , $\gamma$ , $\epsilon$ , $\zeta$
Bandit Top- $k$ selection	$\Omega(H \ln(1/\delta) + \text{gap terms})$	Sum of inverse squared gaps, $k$ , $\delta$
Quantum DP (scalar param est.)	$\Theta(\alpha^{-1} \epsilon^{-2})$	target variance $\alpha$ , privacy $\epsilon$

These sample-complexity bounds provide tight (in many cases, optimal up to logarithmic terms) characterizations across a range of modern learning and inference settings, with ongoing research focused on further reducing dependencies through structural, distributional, or algorithmic advances.