Sample Complexity Bound Overview

Updated 19 October 2025

Sample Complexity Bound is a measure that defines the minimum number of random samples required for an algorithm to achieve a specified error tolerance with high probability.
It quantifies algorithm efficiency, informs design choices, and highlights theoretical limits across domains such as supervised classification, privacy-preserving learning, and reinforcement learning.
The bounds incorporate factors like VC-dimension, noise levels, and privacy constraints, guiding optimal data collection for robust and efficient performance.

A sample complexity bound refers to a rigorous assessment of the number of random samples required to guarantee that a learning or estimation algorithm achieves a desired accuracy and confidence level. In statistical learning, a sample complexity bound is often expressed as the minimal number of samples needed to ensure that an estimator’s output is within ε of the target (in, e.g., probability, norm, error, or risk) with probability at least 1−δ. Sample complexity analysis provides fundamental guidance for algorithm design, quantifies the efficiency of statistical procedures, and illuminates the theoretical limitations of learning and estimation frameworks across domains such as streaming algorithms, supervised classification, compressed sensing, privacy-preserving learning, reinforcement learning, quantum inference, and generative modeling.

1. Definition and General Principles

Classically, the sample complexity is defined as the minimum integer n so that, for all sample sizes at least n, an estimator or learning rule achieves, with probability at least 1−δ, an error no greater than ε (where both ε and δ are user-specified tolerances). In the Probably Approximately Correct (PAC) model, the sample complexity is the smallest n for which all PAC empirical risk minimizers with n data points yield error smaller than ε with at least 1−δ probability for all distributions in a specified class.

Mathematically, for an estimator $\hat{f}_n$ of a target function $f^*$ ,

$\Pr \left( \mathrm{error}(\hat{f}_n, f^*) \leq \varepsilon \right) \geq 1 - \delta$

provided $n \geq N^*(\varepsilon, \delta,\mathcal{P})$ , where $\mathcal{P}$ denotes the underlying problem or hypothesis class.

Sample complexity bounds are often dimension-dependent and depend on the complexity of function classes, regularity conditions, optimization landscapes, privacy/robustness requirements, noise regimes, and algorithmic assumptions.

2. Classical Results and Information-Theoretic Limits

Foundational results in learning theory link sample complexity to combinatorial parameters such as VC-dimension, fat-shattering dimension, or covering numbers.

Supervised Classification (PAC): The optimal sample complexity for realizable PAC learning is

$m(\varepsilon, \delta) = \Theta\left( \frac{1}{\varepsilon}(d + \ln(1/\delta)) \right)$

where $d$ is the VC-dimension of the hypothesis class. This bound is tight up to constants (Hanneke, 2015).

Empirical Risk Minimization and Generalization: For real-valued function classes with bounded fat-shattering dimension $d(\varepsilon)$ , the uniform deviation between empirical and true risk is typically controlled provided

$n \gtrsim \frac{d(\varepsilon)}{\varepsilon^2}$

with explicit dependence on confidence parameters and function class complexity (Musayeva, 2020).

Finite Labelled Data in Semi-Supervised Setting: In nonparametric semi-supervised multiclass learning, the number of labeled examples sufficient for permutation recovery is $\Omega(K \log K)$ , matching a coupon-collector bound for K regions/classes (Dan et al., 2018).
Statistical Estimation and Smoothing: For mean or covariance estimation in $d$ dimensions, classical lower bounds are $\Omega(d/\varepsilon^2)$ and $\Omega(d^2/\varepsilon^2)$ , respectively; improved bounds under robust contamination match these up to constants (Diakonikolas et al., 2020).
Quantum Inference: For quantum measurement learning, the sample complexity for a concept class $\mathcal{C}$ is

$O(V_{\mathcal{C}^*} \log |\mathcal{C}^*|/\varepsilon^2)$

where $V_{\mathcal{C}^*}$ is the shadow norm and $|\mathcal{C}^*|$ counts the extreme points of the convex closure (Heidari et al., 22 Aug 2024).

3. Specialized Sample Complexity Bounds Across Domains

Data Streams and Entropy Estimation

Compressed Counting (CC) for αth frequency moments $F_\alpha$ supplies an illustration of achieving favorable sample complexity via estimator design. Using a sample minimum estimator for maximally-skewed stable random projections, the sample complexity for a $(1\pm \epsilon)$ -approximation as $\alpha \to 1$ (Shannon entropy regime) satisfies

$k \geq \frac{\log(1/\delta)}{ \log(1/\Delta) - \log(\dots) }$

where $\Delta = 1 - \alpha$ (0910.1403). For very small $\epsilon$ and $\Delta$ , the required number of projections $k$ can be a small constant, a sharp improvement over previous $O(1/\epsilon)$ or $O(1/\epsilon^2)$ dependencies.

Privacy-Preserving Learning

For pure differentially private PAC learning, sample complexity is closely linked to the Littlestone dimension ( $LDim(C)$ ), with lower bounds

$SCDP(C) = \Omega(LDim(C))$

and explicit separations showing that $SCDP(C)$ can be arbitrarily larger than the VC-dimension. Further, approximate differential privacy—i.e., $(\alpha, \beta)$ -privacy—provides an exponential gain, reducing sample complexity from $\Omega(t/\alpha)$ (pure) to $O(\log(1/\beta)/\alpha)$ (approximate) (Feldman et al., 2014).

Reinforcement Learning and Markov Games

In robust Q-learning with distributional uncertainty, the worst-case expected sample complexity to estimate the robust $Q$ -function within error $\epsilon$ in the sup norm is

$\tilde{O}\left( |S||A|(1 - \gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6} \delta^{-4} \right)$

where $|S|$ , $|A|$ are the state/action spaces, $p_{\wedge}$ is the minimal support probability, and $\delta$ is the uncertainty radius (Wang et al., 2023).

In multi-agent Markov games with independent linear function classes, an improved bound under the local access model for $\varepsilon$ -CCE is

$\tilde{O}\left( m^2 d^3 H^6 \min\left\{ \frac{\log S}{d},\ A \right\} \epsilon^{-2} \right)$

where $m$ is the number of agents, $d$ the feature dimension, $H$ the time horizon, $S$ the state space cardinality, and $A$ the action space cardinality (Fan et al., 18 Mar 2024).

Sequential and Monte Carlo Methods

For sequential Monte Carlo (SMC) estimators,

$N \geq \tfrac{1}{2}\log(128S) \cdot \max\{ 9W^2 Z^2, 1/\epsilon^2 \}$

ensures error at most $\epsilon$ for bounded test functions, with $S$ the number of stages, and $W$ , $Z$ uniform upper bounds on importance weights (Marion et al., 2018). For MCMC, strong mixing and low density ratios further improve sample complexity.

Diffusion Models and High-Dimensional Generative Models

For continuous-state diffusion models:

$\tilde{O}(\epsilon^{-6})$

sample complexity is sufficient to guarantee TV error $\epsilon$ between the learned and data distributions, achieved without the need for ERM access and without exponential dependence on data dimension or network parameters; the bound leverages a decomposition of score estimation error into statistical, approximation, and optimization terms (Gaur et al., 23 May 2025).

For discrete-state diffusion models, the sample complexity per diffusion step is

$\tilde{O}(\epsilon^{-2})$

where error contributions are controlled by approximation, statistical, optimization, and clipping errors; this matches the classical minimax rate for mean estimation (Srikanth et al., 12 Oct 2025).

Random Function Spaces and Polynomial Recovery

For isotropic Gaussian random fields on the sphere, the sample complexity of $L_\infty$ -recovery is polynomial in $1/\epsilon$ , in contrast to exponential dependence in the worst case. The key is that the $L_\infty/L_2$ ratio for spherical harmonic components is bounded as $O(d \sqrt{\ln k})$ with high probability, mitigating the "spikiness" that otherwise causes exponential sample blow-up (Dong et al., 2023).

4. Sample Complexity in System Identification

In the identification of LTI systems from finitely many candidates using trajectory data, the sample complexity upper bound (for MLE estimates) involves both an explicit “burn-in” threshold and excitation conditions, e.g.,

$\left\lfloor T/k \right\rfloor \geq \frac{320}{3} \log\left( \frac{2 n_x N}{\delta} \right)$

and

$\sqrt{n_x} + n_x \leq \left[\frac{9k \lfloor T/k \rfloor}{3200 T}\right] \cdot (\text{excitation terms})$

with lower bounds established by information-theoretic arguments. Notably, these bounds do not require a stability assumption, and directly depend on the system’s excitation and distinguishability properties (Chatzikiriakos et al., 17 Sep 2024).

For general uncontrolled linear systems, the PAC lower bound is

$\lambda_{\min}\left( \sum_{s=1}^{\tau_A-1} \Gamma_{s-1}(A) \right) \geq \frac{1}{2\epsilon^2} \ln\left( \frac{1}{2.4\delta} \right)$

where $\Gamma_{s}(A)$ is the finite-time controllability gramian (Jedra et al., 2019).

For nonlinear systems via Koopman operator methods, the estimation error satisfies

$\| \hat{K} - K \|_F \leq \frac{\sqrt{\Delta}}{\sqrt{T}} \sqrt{ \mathbb{E}[ \mathrm{Tr}(\hat{\Sigma}_0) ] \mathbb{E}[ \| \hat{\Sigma}_0^{-1} \|_F^2 ] }$

demonstrating sample efficiency on the order $O(1/\sqrt{T})$ (Chen et al., 2018).

5. Factors Influencing Sample Complexity

Factor	Influence on Bound	Typical Manifestation
Function class complexity	Polynomial/exponential scaling	VC/fat-shattering dimension, covering #
Regularity/convexity	Improved rates if present	Strong convexity, PL condition
Privacy/robustness	Often increases complexity	Littlestone dim., explicit separation
Model expressivity	Controls approximation error	Network width, basis size, feature map
Data distribution	Affects statistical error	Sub-Gaussianity, boundedness
Optimization method	Affects convergence error	SGD steps, optimization error component
Problem structure	Order-optimality possible	Discrete setting, Gaussian fields, SMC

6. Algorithmic and Analytical Innovations

Modern sample complexity analysis benefits from estimator design, problem structure exploitation, and expectation-based aggregation to avoid overly pessimistic union bounds:

Sample minimum estimators leverage heavy-tails to sharpen bounds for frequency moments and entropy estimation (0910.1403).
Communication complexity reductions clarify privacy costs and link learning sample complexity to information-theoretic dimensions (Feldman et al., 2014).
Recursive and overlapping majority voting allows for the removal of logarithmic slack in supervised PAC learning (Hanneke, 2015).
Score estimation error decompositions decouple approximation, optimization, and statistical errors—even in absence of ERM access—thus decimating prior exponential dependencies (Gaur et al., 23 May 2025).
Clipping and contraction in discrete models ensure control over bounded function spaces, facilitating tight theoretical sample complexity (Srikanth et al., 12 Oct 2025).
Quantum shadow tomography and reduction to extreme points enable shrinking the effective hypothesis space and recovering classical logarithmic sample scalings in quantum learning (Heidari et al., 22 Aug 2024).
Coupling and block martingale small-ball arguments offer tight control over dependent data in dynamical system identification (Chatzikiriakos et al., 17 Sep 2024).

7. Practical Implications and Limitations

Sample complexity bounds serve as theoretical guarantees and practical design tools, dictating the minimal data requirements for reliable inference, guiding stopping rules, aiding neural architecture selection, and informing experimental design under constraints such as privacy or contamination.

Limitations may arise due to hidden constants, the tightness of approximations, or unmodeled dependencies. For very small ε or challenging regimes (e.g., high noise, small SNR), sample complexity can grow rapidly; even breakthrough algorithms have inherent asymptotic or instance-specific lower bounds dictated by information theory. In high-dimensional or continuous domains, exponential scaling remains a fundamental obstacle unless additional structure or randomness (e.g., Gaussian random fields, symmetry, sparsity) is exploited (Dong et al., 2023).

Contemporary analysis increasingly eschews idealized assumptions (e.g., access to exact empirical risk minimizers, strongly convexity everywhere), moving toward more realistic models that integrate finite-iteration SGD, function class misspecification, and robust error decompositions. This shift aligns theory more closely with practical training regimes in deep and generative modeling (Gaur et al., 23 May 2025, Srikanth et al., 12 Oct 2025).