Minimax Convergence Rates
- Minimax convergence rates quantify the best achievable worst-case speed at which any estimator can converge to the true parameter value over a class of models.
- They provide benchmarks for assessing the efficiency of statistical methods in nonparametric, high-dimensional, and privacy-constrained settings.
- Proof techniques such as Le Cam’s method, Fano’s inequality, and metric entropy are key to establishing tight minimax lower bounds.
A minimax convergence rate quantifies the fundamental speed at which any estimator can approach the true value of a statistical parameter, uniformly over a model class, when the loss is measured in expectation over worst-case data-generating distributions. The notion of minimax optimality is central to statistical decision theory and nonparametric inference, and it provides benchmarks for evaluating the efficiency of statistical procedures in both classical and modern high-dimensional or privacy-constrained regimes.
1. Formal Definition and General Framework
The minimax risk for a parameter class $\Theta$ and loss function $L$ is defined by
\[
\mathcal{R}_n(\Theta, L) \;=\; \inf_{\hat{\theta}_n}\, \sup_{\theta \in \Theta}\, \mathbb{E}_{\theta}\!\left[ L\big(\hat{\theta}_n, \theta\big) \right],
\]
where $\hat{\theta}_n$ ranges over all estimators measurable with respect to the $n$ observed data points. The minimax convergence rate is the sequence $r_n$ such that $\mathcal{R}_n(\Theta, L) \asymp r_n$ (i.e., bounded above and below by constant multiples of $r_n$ for all large $n$). This rate encapsulates both the statistical complexity of the model class $\Theta$ and the analytic properties of the loss function $L$.
In more complex settings, such as those with additional privacy constraints, dependency structures, adversarial perturbations, or partial information, the minimax rate quantifies the exact effect of these features on achievable estimation or prediction accuracy.
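As a concrete illustration of the definition, the following sketch estimates the worst-case risk of the sample mean in a Gaussian location model by Monte Carlo and compares it to the parametric benchmark; this is a didactic example with illustrative constants, not code from any cited paper.

```python
import numpy as np

# Worst-case (over a grid of theta values) squared-error risk of the
# sample mean in the Gaussian location model; the risk is flat in theta
# and close to sigma^2 / n, the parametric rate.
rng = np.random.default_rng(0)
n, sigma, reps = 100, 1.0, 20_000
thetas = np.linspace(-5.0, 5.0, 11)

worst = 0.0
for theta in thetas:
    x = theta + sigma * rng.standard_normal((reps, n))
    risk = np.mean((x.mean(axis=1) - theta) ** 2)  # Monte Carlo E[(xbar - theta)^2]
    worst = max(worst, risk)

print(f"worst-case risk: {worst:.5f}  vs  sigma^2/n = {sigma**2 / n:.5f}")
```

Because the sample mean's risk is constant in $\theta$, the supremum adds nothing in this model; for nonparametric classes, the supremum over the class is exactly what drives the slower rates discussed below.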
2. Canonical Rates and Dependence on Model Complexity
In classical nonparametric estimation and regression, the minimax rate is determined by the interplay between the function class's "smoothness" and the sample size. For example:
- Gaussian mean, parametric models: $r_n \asymp n^{-1/2}$ for $\theta \in \mathbb{R}^d$ with fixed $d$ (root-mean-squared error).
- Hölder class regression, sup-norm loss: $r_n \asymp (\log n / n)^{\beta/(2\beta+d)}$ for $f$ in a Hölder ball on $[0,1]^d$, where $\beta$ is the smoothness parameter (Peng et al., 2024).
- Sobolev class, $L_2$ loss: $r_n \asymp n^{-2\beta/(2\beta+d)}$ in squared $L_2$ risk for $\beta$-smooth functions in $d$ dimensions (Zhao et al., 2023); see the balancing sketch after this list.
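For intuition on where such exponents come from, the classical bias–variance balancing heuristic for a kernel-type estimator with bandwidth $h$ (a textbook calculation, not specific to the cited papers) gives
\[
\underbrace{h^{2\beta}}_{\text{squared bias}} \;+\; \underbrace{\frac{1}{n h^{d}}}_{\text{variance}}
\quad\Longrightarrow\quad
h^{\ast} \asymp n^{-1/(2\beta+d)},
\qquad
\text{risk} \;\asymp\; n^{-2\beta/(2\beta+d)},
\]
which recovers the Sobolev-class rate above; the extra $\log n$ in the sup-norm rate comes from controlling a maximum over the domain rather than an average.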
For estimation of discrete distributions with $d$ categories, under no privacy constraints, the minimax risk is of order $1/n$ in squared $\ell_2$ error; for multinomial estimation under $\alpha$-local differential privacy, the minimax risk is of order $\min\{1, d/(n\alpha^2)\}$, with sharp constants (Duchi et al., 2013).
Minimax rates change in the following settings:
- Privacy constraints: Effective sample size is scaled from $n$ to roughly $n\alpha^2$ (and to $n\alpha^2/d$ in $d$-dimensional problems) when enforcing $\alpha$-local differential privacy (Duchi et al., 2013).
- Spatial inhomogeneity: Global rates depend on how the design density vanishes at isolated points; for a design density behaving like $|x - x_0|^{\beta}$ near a point $x_0$, the minimax $L_2$-risk decays polynomially in $n$ with an exponent determined jointly by $\beta$ and the Besov ball smoothness (Antoniadis et al., 2011).
- Supersmooth deconvolution: Estimation rates become logarithmic in $n$, of the form $(\log n)^{-c}$ with the exponent $c > 0$ set by the target smoothness and the order of the supersmooth noise, for deconvolution under Wasserstein loss (Dedecker et al., 2013).
3. Representative Results in Key Models
Discrete Probability Estimation under Local Privacy
Given $n$ privatized samples from an unknown $d$-dimensional multinomial $\theta$, with privacy parameter $\alpha$, the sharp minimax rate in squared $\ell_2$ error is
\[
\mathcal{R}_n \;\asymp\; \min\!\left\{ 1,\; \frac{d}{n\alpha^2} \right\},
\]
achievable by randomized-response or Laplace-noise schemes with moment-matching debiasing (Duchi et al., 2013). For small $\alpha$, privacy reduces the effective sample size from $n$ to $n\alpha^2/d$.
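The following is a minimal sketch of a $d$-ary randomized-response mechanism with linear debiasing, one standard scheme of the kind referenced; the mechanism and constants here are a common textbook variant, not necessarily the exact construction of Duchi et al. (2013).

```python
import numpy as np

# d-ary randomized response under alpha-LDP: keep the true category with
# probability p, otherwise report a uniform random category; the choice
# p = (e^a - 1)/(e^a + d - 1) keeps every likelihood ratio within e^a.
rng = np.random.default_rng(1)
d, n, alpha = 10, 100_000, 1.0
theta = rng.dirichlet(np.ones(d))                  # true multinomial
x = rng.choice(d, size=n, p=theta)                 # raw samples

p = (np.exp(alpha) - 1) / (np.exp(alpha) + d - 1)
keep = rng.random(n) < p
z = np.where(keep, x, rng.integers(0, d, size=n))  # privatized reports

f = np.bincount(z, minlength=d) / n                # report frequencies
theta_hat = (f - (1 - p) / d) / p                  # unbiased: E[f] = p*theta + (1-p)/d

print("squared l2 error:", float(np.sum((theta_hat - theta) ** 2)))
print("d/(n alpha^2) benchmark:", d / (n * alpha**2))
```

The debiasing step inverts the known privatization channel, and the variance inflation it introduces is what produces the $d/(n\alpha^2)$ rate.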
Smooth Density Estimation under Local Privacy
For $f$ in a Sobolev class of smoothness $\beta$ on $[0,1]$:
- Non-private (classical): $n^{-2\beta/(2\beta+1)}$ in squared $L_2$ risk.
- $\alpha$-locally private: $(n\alpha^2)^{-2\beta/(2\beta+2)}$.

Thus privacy worsens the polynomial exponent, and minimax estimation becomes strictly slower for every finite smoothness $\beta$ (Duchi et al., 2013).
Nonparametric Regression with Inhomogeneous Design
If the design density has a zero of order $\beta$ at a point $x_0$ (i.e., it behaves like $|x - x_0|^{\beta}$ near $x_0$), then for smoothness index $s$ the minimax global $L_2$-risk over a Besov ball decays polynomially in $n$ with an exponent that deteriorates as $\beta$ grows, and only logarithmically for exponential zeros. Adaptive wavelet thresholding attains these rates up to log factors and reveals that spatial inhomogeneity of the design and homogeneity of the function class jointly determine the difficulty (Antoniadis et al., 2011). A minimal thresholding sketch follows.
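The sketch below applies wavelet soft-thresholding with the universal threshold to equispaced noisy observations, as a generic stand-in for the adaptive procedure of Antoniadis et al. (2011); the wavelet family, decomposition level, and threshold are illustrative choices.

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(0)
n = 1024
t = np.linspace(0, 1, n)
f = np.sin(4 * np.pi * t) + (t > 0.5)           # spatially inhomogeneous signal
y = f + 0.3 * rng.standard_normal(n)            # noisy equispaced observations

coeffs = pywt.wavedec(y, 'db4', level=5)        # discrete wavelet transform
sigma = np.median(np.abs(coeffs[-1])) / 0.6745  # MAD noise estimate, finest scale
thr = sigma * np.sqrt(2 * np.log(n))            # universal threshold
den = [coeffs[0]] + [pywt.threshold(c, thr, mode='soft') for c in coeffs[1:]]
fhat = pywt.waverec(den, 'db4')[:n]             # reconstruct (trim any padding)

print(f"RMSE: {np.sqrt(np.mean((fhat - f) ** 2)):.3f}")
```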
Functional Linear Regression in RKHS
In functional linear regression with an RKHS-regularized coefficient function, the minimax prediction risk is governed by the decay of the eigenvalues of the kernel-covariance composition; for polynomial decay of order $k^{-2r}$, the rate is $n^{-2r/(2r+1)}$ (Lian, 2012).
4. Algorithmic Minimax Rates and Statistical-Computational Tradeoffs
For smooth minimax optimization $\min_x \max_y f(x, y)$ with $\ell$-smooth, strongly convex-concave $f$, first-order methods can attain $\tilde{O}(1/k^2)$ convergence in terms of the primal-dual gap, improved from $O(1/k)$, by combining Mirror-Prox and Nesterov's AGD (Thekumparampil et al., 2019). In nonconvex-concave settings, a proximal-point-based algorithm yields $\tilde{O}(1/k^{1/3})$ convergence to first-order stationarity, sharpening the previous best-known rate.
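As a building-block illustration, the sketch below runs plain extragradient (Mirror-Prox with Euclidean geometry) on a toy strongly convex-concave quadratic saddle problem; it shows the baseline primitive, not the accelerated combination of Thekumparampil et al. (2019), and the problem instance is invented for illustration.

```python
import numpy as np

# Extragradient on f(x, y) = 0.5*mu*||x||^2 + x^T A y - 0.5*mu*||y||^2,
# whose unique saddle point is (0, 0).
rng = np.random.default_rng(2)
dim, mu = 5, 0.5
A = rng.standard_normal((dim, dim))
L = mu + np.linalg.norm(A, 2)   # Lipschitz constant of the gradient operator
eta = 1.0 / (2 * L)             # standard safe step size

x, y = np.ones(dim), np.ones(dim)
for _ in range(2000):
    gx, gy = mu * x + A @ y, A.T @ x - mu * y       # grad_x f, grad_y f
    xh, yh = x - eta * gx, y + eta * gy             # extrapolation (look-ahead) step
    gxh, gyh = mu * xh + A @ yh, A.T @ xh - mu * yh
    x, y = x - eta * gxh, y + eta * gyh             # update with look-ahead gradients

print("distance to saddle point:", float(np.linalg.norm(np.concatenate([x, y]))))
```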
5. Minimax Rates in Complex and Structured Models
Crowdsourced Binary Label Inference
In the Dawid–Skene one-coin model with $m$ workers and $k$ items, the minimax error in estimating the true label vector is exponentially small in the effective crowd ability, of the form $\exp(-(1 + o(1))\, m \bar{I})$, where the number of workers $m$ and the average worker informativeness $\bar{I}$ summarize collective ability (Gao et al., 2013). The projected EM algorithm attains these exponents, matching minimax lower bounds. A small simulation illustrating the exponential decay follows.
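The simulation below illustrates the exponential-in-ability behavior using simple majority voting in the one-coin model; majority vote is a baseline, not the projected EM estimator of Gao et al. (2013), and the worker accuracies are invented for illustration.

```python
import numpy as np

# One-coin Dawid-Skene: m workers, each correct independently with
# probability p_i; items labeled by majority vote. The per-item error
# decays exponentially as the crowd grows or improves.
rng = np.random.default_rng(3)
k = 10_000
truth = rng.choice([-1, 1], size=k)

for m in (5, 15, 25):                      # odd m avoids vote ties
    p = rng.uniform(0.6, 0.8, size=m)      # worker accuracies
    correct = rng.random((m, k)) < p[:, None]
    labels = np.where(correct, truth, -truth)
    vote = np.sign(labels.sum(axis=0))
    print(f"m={m:2d}  majority-vote error: {np.mean(vote != truth):.4f}")
```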
Reinforcement Learning/Off-Policy Evaluation
In OPE with function approximation, minimax optimal rates for marginal importance weight and Q-function estimation under completeness and realizability align with the critical radius $\delta_n$ of the local Rademacher complexity of the underlying function class: rates scale as $\tilde{O}(n^{-1/2})$ for finite VC-dimension and as $n^{-1/(2+p)}$ for metric entropy exponent $p$ (Uehara et al., 2021). Doubly robust OPE estimators achieve the semiparametric efficiency bound.
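To see where the entropy-exponent rate comes from, the standard critical-radius computation balances the localized complexity against $\delta^2$ (a generic localization heuristic, not the paper's full argument): for a class with metric entropy $\log N(\epsilon) \asymp \epsilon^{-p}$, Dudley's integral gives localized complexity of order $\delta^{1-p/2}/\sqrt{n}$, and
\[
\frac{\delta^{1 - p/2}}{\sqrt{n}} \;\asymp\; \delta^{2}
\quad\Longrightarrow\quad
\delta_n \;\asymp\; n^{-1/(2+p)} .
\]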
6. Extensions: Robustness, Privacy, and Time-Adaptive Frameworks
Adversarial and Privacy Effects
When regression is exposed to adversarial input perturbations of size $q$, the minimax sup-norm rate becomes the sum of the non-adversarial rate and the maximum variation the perturbation can induce: $r_n \asymp (\log n / n)^{\beta/(2\beta+d)} + \omega(q)$, where $\omega(q)$ is the modulus of continuity of the function class at scale $q$ (Peng et al., 2024). For Hölder-$\beta$ classes, $\omega(q) \asymp q^{\beta \wedge 1}$.
Under sample-size uncertainty ("time-robust" minimaxity), the adversarial minimax risk is inflated by at most a logarithmic or iterated-logarithmic factor in $n$ relative to the classical rate, e.g., $\sqrt{\log\log n / n}$ in place of $n^{-1/2}$ for Gaussian mean estimation (Kirichenko et al., 2020).
Statistical Models with Partial Derivative Observations
Derivative observations in ANOVA/RKHS models can reduce the effective interaction order and accelerate minimax rates. When first partial derivatives are observed for a subset of the covariates in a $d$-way interaction model, the minimax rate for function estimation matches that of a lower-order interaction model without derivatives (Dai et al., 2017). When all first partial derivatives are observed, the rate is parametric, $n^{-1/2}$.
7. Proof Techniques and Key Lower Bound Constructions
The proofs of minimax lower rates in the cited literature predominantly employ:
- Le Cam's two-point and Fano's multiple-hypothesis testing: Reduction from estimation to testing over well-separated parameter packings; the two standard bounds are displayed after this list.
- Information contraction (under privacy): KL divergence contracts under privatization channels (a post-processing bound), which quantifies the reduction in effective sample size in private estimation (Duchi et al., 2013).
- Metric entropy and covering arguments: Complexity of the function/density class is measured under the relevant loss/geometric properties (Hausdorff, $L_2$, sup-norm, Wasserstein), directly determining optimal separation size and rates (Genovese et al., 2010, Dedecker et al., 2013, Zhao et al., 2023).
- Specialized analysis for complex structures: e.g., dyadic dependence leads to pointwise minimax risk scaling with the number of agents $n$ rather than the number of dyads $\binom{n}{2}$, owing to shared-agent dependence (Graham et al., 2020).
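For concreteness, the two workhorse reductions take the following standard forms. Le Cam's two-point bound, for hypotheses with $d(\theta_1, \theta_2) \ge 2\delta$, and Fano's inequality, over an $M$-point $2\delta$-separated packing with $V$ uniform on the packing and $Z$ the observed data, read
\[
\inf_{\hat{\theta}} \max_{i \in \{1,2\}} \mathbb{E}_{\theta_i}\, d(\hat{\theta}, \theta_i)
\;\ge\; \delta \left( 1 - \mathrm{TV}(P_{\theta_1}, P_{\theta_2}) \right),
\qquad
\inf_{\hat{\theta}} \max_{j \le M} \mathbb{P}_{\theta_j}\!\left( d(\hat{\theta}, \theta_j) \ge \delta \right)
\;\ge\; 1 - \frac{I(V; Z) + \log 2}{\log M},
\]
so exhibiting a large, well-separated packing with small information content immediately yields a minimax lower bound at scale $\delta$.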
8. Impact and Implications Across Statistical Domains
Minimax convergence rates serve as fundamental performance limits and guide both the development of estimation algorithms and theoretical hypotheses about statistical hardness under realistic constraints (privacy, adversarial robustness, function complexity, nonstandard designs, partial observations). They also catalyze algorithmic work at the interface with convex optimization, empirical risk minimization in modern machine learning, and information theory.
The rates tabulated above enable practitioners to directly compare the impact of model complexity, regularity, privacy, and robustness constraints, and to benchmark the performance of practical estimators and learning algorithms. In modern domains—privacy-preserving data analysis, crowdsourced inference, time-adaptive decision making, RL/OPE, high-dimensional graphical modeling—the precise identification of minimax exponents and constants continues to play a central role in statistical methodology and the theory of learning.
References
- Local privacy and sharp minimax rates: (Duchi et al., 2013)
- Inhomogeneous regression and adaptivity: (Antoniadis et al., 2011)
- RKHS-based functional linear prediction: (Lian, 2012)
- Time-homogeneous SDE classification rates: (Mintsa, 2025)
- Wasserstein deconvolution rates: (Dedecker et al., 2013)
- Nonparametric location-scale regression: (Zhao et al., 2023)
- Minimax-optimal neural networks: (Ko et al., 2024)
- Estimation from crowdsourced labels: (Gao et al., 2013)
- Manifold estimation: (Genovese et al., 2010)
- Dyadic dependence models in regression: (Graham et al., 2020)
- Minimax convergence under adversarial attacks: (Peng et al., 2024)
- Minimax convergence for functional ANOVA with derivatives: (Dai et al., 2017)
- Minimax bounds for normal mixtures: (Kim, 2011)
- Minimax algorithms for smooth minimax optimization: (Thekumparampil et al., 2019)
- Time-adaptive minimax rates: (Kirichenko et al., 2020)