Minimax Convergence Rates

Updated 4 February 2026
  • Minimax Convergence Rates quantify the best worst-case speed at which any estimator can converge to the true parameter value, uniformly over a class of models.
  • They provide benchmarks for assessing the efficiency of statistical methods in nonparametric, high-dimensional, and privacy-constrained settings.
  • Proof techniques such as Le Cam's method, Fano's inequality, and metric entropy arguments are key to establishing tight minimax lower bounds.

A minimax convergence rate quantifies the fundamental speed at which any estimator can approach the true value of a statistical parameter, uniformly over a model class, when the loss is measured in expectation over worst-case data-generating distributions. The notion of minimax optimality is central to statistical decision theory and nonparametric inference, and it provides benchmarks for evaluating the efficiency of statistical procedures in both classical and modern high-dimensional or privacy-constrained regimes.

1. Formal Definition and General Framework

The minimax risk for a parameter class $\Theta$ and loss function $L(\hat\theta, \theta)$ is defined by

$$R_n^* = \inf_{\hat\theta_n} \sup_{\theta \in \Theta} \mathbb{E}_{\theta}\big[L(\hat\theta_n, \theta)\big],$$

where $\hat\theta_n$ ranges over all estimators measurable with respect to the observed $n$ data points. The minimax convergence rate is the sequence $(r_n)_{n \geq 1}$ such that $R_n^* \asymp r_n$ (i.e., bounded above and below by constant multiples of $r_n$ for all large $n$). This rate encapsulates both the statistical complexity of the model class $\Theta$ and the analytic properties of the loss function $L$.
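
To make the definition concrete, the following is a minimal Monte Carlo sketch that approximates the worst-case risk of one fixed estimator, which upper-bounds $R_n^*$. All modeling choices here ($\Theta = [-1, 1]$ discretized to a grid, $n$ i.i.d. $N(\theta, 1)$ observations, squared-error loss, the sample mean as estimator) are illustrative assumptions.

```python
import numpy as np

# Monte Carlo approximation of the worst-case (sup over Theta) risk of a
# fixed estimator; this upper-bounds the minimax risk R_n^*.

rng = np.random.default_rng(0)

def worst_case_risk(estimator, n, theta_grid, reps=2000):
    """Max over theta_grid of the Monte Carlo risk E_theta[(est - theta)^2]."""
    risks = []
    for theta in theta_grid:
        x = rng.normal(theta, 1.0, size=(reps, n))  # reps datasets of size n
        est = estimator(x)                           # one estimate per dataset
        risks.append(np.mean((est - theta) ** 2))    # Monte Carlo risk at theta
    return max(risks)                                # sup over the grid

theta_grid = np.linspace(-1.0, 1.0, 21)
for n in (50, 200, 800):
    r = worst_case_risk(lambda x: x.mean(axis=1), n, theta_grid)
    print(f"n={n:4d}: sup-risk ~ {r:.5f}   (1/n = {1/n:.5f})")
```

The printed sup-risk tracks $1/n$, the parametric rate discussed in the next section.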

In more complex settings, such as those with additional privacy constraints, dependency structures, adversarial perturbations, or partial information, the minimax rate quantifies the exact effect of these features on achievable estimation or prediction accuracy.

2. Canonical Rates and Dependence on Model Complexity

In classical nonparametric estimation and regression, the minimax rate is determined by the interplay between the smoothness of the function class and the sample size; a numerical reading of these exponents follows the list below. For example:

  • Gaussian mean, parametric models: $R_n^* \asymp n^{-1}$ for $\mathbb{E}\|\hat\mu - \mu\|^2$.
  • Hölder class regression, sup-norm loss: $R_n^* \asymp (n^{-1} \log n)^{\alpha/(2\alpha+d)}$ for $f \in C^\alpha$ on $[0,1]^d$, where $\alpha$ is the smoothness parameter (Peng et al., 2024).
  • Sobolev class, $L_2$ loss: $R_n^* \asymp n^{-2\alpha/(2\alpha+d)}$ for $\alpha$-smooth functions in $d$ dimensions (Zhao et al., 2023).
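
As a purely arithmetic sketch of these exponents (with all constants set to one), the following shows how the dimension $d$ enters the Sobolev-class rate and the sample size needed to reach a fixed accuracy:

```python
# Arithmetic sketch of the exponent 2*alpha/(2*alpha + d); the constants
# in the actual minimax risks are omitted.

def sobolev_rate(n, alpha, d):
    """Sobolev/L2 minimax rate n^{-2 alpha/(2 alpha + d)}, up to constants."""
    return n ** (-2 * alpha / (2 * alpha + d))

def n_needed(eps, alpha, d):
    """Sample size at which the rate above falls to the target accuracy eps."""
    return eps ** (-(2 * alpha + d) / (2 * alpha))

for d in (1, 5, 20):
    print(f"d={d:2d}: rate at n=1e6 is {sobolev_rate(1e6, 2, d):.2e}; "
          f"n needed for eps=0.01 is {n_needed(0.01, 2, d):.1e}")
```

The rapid growth of the required sample size in $d$ is the curse of dimensionality in exponent form.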

For estimation of discrete distributions with $d$ categories and no privacy constraint, the minimax risk is $O(d/n)$ in squared error; for multinomial estimation under $\epsilon$-local differential privacy, the minimax risk is $O(d/(n\epsilon^2))$, with sharp constants (Duchi et al., 2013).

Minimax rates change in the following settings:

  • Privacy constraints: Effective sample size is scaled by $\epsilon^2$ when enforcing $\epsilon$-local differential privacy (Duchi et al., 2013).
  • Spatial inhomogeneity: Global rates depend on the design density vanishing at isolated points; for density $g(x) \sim |x-x_0|^\alpha$ near $x_0$, the minimax $L^2$-risk scales as $n^{-2s'/(2s'+\alpha)}$ for Besov-ball smoothness $s'$ (Antoniadis et al., 2011).
  • Supersmooth deconvolution: Estimation rates become logarithmic in $n$,

$$R_n \asymp (\log n)^{-p/\beta},$$

for deconvolution with $\beta$-supersmooth noise and loss $W_p^p$ (Wasserstein metric) (Dedecker et al., 2013).

3. Representative Results in Key Models

Discrete Probability Estimation under Local Privacy

Given $n$ privatized samples from an unknown $d$-dimensional multinomial $p$, with privacy parameter $\epsilon \leq 1/4$, the sharp minimax rate in squared $\ell_2$ error is

$$R(n, d, \epsilon) \asymp \min\Big\{1, \frac{d}{n\epsilon^2}\Big\},$$

achievable by randomized response or Laplace-noise schemes with moment-matching debiasing (Duchi et al., 2013). For small $\epsilon$, privacy reduces the effective sample size from $n$ to $n\epsilon^2/d$.
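
The following is a minimal sketch of $\epsilon$-LDP multinomial estimation via $d$-ary randomized response with exact debiasing. Randomized response is a valid $\epsilon$-LDP mechanism but not the one with optimal $d$-dependence analyzed by Duchi et al. (2013); it still exhibits the $1/(n\epsilon^2)$ effective-sample-size scaling in $n$ and $\epsilon$.

```python
import numpy as np

# d-ary randomized response under epsilon-LDP, with exact debiasing.
# Illustrates the 1/(n eps^2) scaling; constants and d-dependence are
# not the sharp ones from the paper.

rng = np.random.default_rng(1)

def k_rr(samples, d, eps):
    """Report the true category w.p. e^eps/(e^eps+d-1), else a uniform other one."""
    keep = rng.random(samples.size) < np.exp(eps) / (np.exp(eps) + d - 1)
    shift = rng.integers(1, d, size=samples.size)   # never zero: report != truth
    return np.where(keep, samples, (samples + shift) % d)

def debias(reports, d, eps):
    a = np.exp(eps) / (np.exp(eps) + d - 1)          # P(report = truth)
    b = (1 - a) / (d - 1)                            # P(report = any other value)
    freq = np.bincount(reports, minlength=d) / reports.size
    return (freq - b) / (a - b)                      # unbiased for p

d = 10
p = np.full(d, 1.0 / d)
for n, eps in [(50_000, 0.25), (200_000, 0.25), (50_000, 0.5)]:
    err = np.mean([np.sum((debias(k_rr(rng.choice(d, n, p=p), d, eps), d, eps) - p) ** 2)
                   for _ in range(20)])
    print(f"n={n:>7}, eps={eps}: l2^2 error {err:.2e}, 1/(n eps^2) = {1/(n*eps**2):.2e}")
```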

Smooth Density Estimation under Local Privacy

For $f$ in a Sobolev class of smoothness $\beta > 1/2$ on $[0,1]$:

  • Non-private (classical): $R_n^* \asymp n^{-2\beta/(2\beta+1)}$
  • $\epsilon$-locally private: $R_n^* \asymp (n\epsilon^2)^{-2\beta/(2\beta+2)}$

Thus privacy worsens the polynomial exponent, and minimax estimation becomes strictly slower unless $\epsilon \to \infty$ (Duchi et al., 2013).

Nonparametric Regression with Inhomogeneous Design

If the design density $g$ has a zero of order $\alpha > 0$ at $x_0$ (i.e., $g(x) \sim |x-x_0|^\alpha$ near $x_0$), then for smoothness index $s'$ the minimax global $L^2$-risk over a Besov ball satisfies

$$R_n \geq C\, n^{-2s'/(2s'+\alpha)},$$

with logarithmic rates for exponential zeros. Adaptive wavelet thresholding attains these rates up to log factors and reveals that spatial inhomogeneity and function homogeneity jointly determine difficulty (Antoniadis et al., 2011).
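
The following is a minimal sketch of the thresholding step only, assuming an equispaced design, Haar wavelets, a known noise level, a sample size that is a power of two, and the universal threshold $\sigma\sqrt{2\log n}$; the inhomogeneous-design setting of the paper requires design-adapted wavelets beyond this sketch.

```python
import numpy as np

# Wavelet soft-thresholding for noisy regression on a regular design:
# forward Haar transform, universal threshold, inverse transform.

rng = np.random.default_rng(2)

def haar_fwd(y):
    """Orthonormal Haar transform; returns detail coeffs per level + coarse part."""
    coeffs, approx = [], y.astype(float)
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2))  # detail coefficients
        approx = (even + odd) / np.sqrt(2)        # coarse approximation
    return coeffs, approx

def haar_inv(coeffs, approx):
    for detail in reversed(coeffs):
        even = (approx + detail) / np.sqrt(2)
        odd = (approx - detail) / np.sqrt(2)
        approx = np.empty(2 * len(detail))
        approx[0::2], approx[1::2] = even, odd
    return approx

n, sigma = 1024, 0.3
x = np.linspace(0, 1, n)
f = np.where(x < 0.5, np.sin(4 * np.pi * x), 0.3)  # spatially inhomogeneous signal
y = f + sigma * rng.normal(size=n)

coeffs, approx = haar_fwd(y)
lam = sigma * np.sqrt(2 * np.log(n))               # universal threshold
coeffs = [np.sign(c) * np.maximum(np.abs(c) - lam, 0) for c in coeffs]  # soft
fhat = haar_inv(coeffs, approx)
print(f"empirical L2 risk: {np.mean((fhat - f) ** 2):.4f}")
```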

Functional Linear Regression in RKHS

In functional linear regression with an RKHS-regularized coefficient function and general eigenvalue decay, the minimax prediction risk is

$$R_n^* \asymp \lambda_0, \qquad \text{where } \Phi(\lambda_0)\,\lambda_0 \asymp n^{-1},$$

and, for polynomial eigenvalue decay $s_j \asymp j^{-2r}$, $\lambda_0 \asymp n^{-2r/(2r+1)}$ (Lian, 2012).
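
The rate equation $\Phi(\lambda_0)\lambda_0 \asymp n^{-1}$ can be solved numerically. The exact $\Phi$ is model-dependent; the sketch below assumes the concrete choice $\Phi(\lambda) = 1/\#\{j : s_j \geq \lambda\}$ with $s_j = j^{-2r}$, one choice consistent with the polynomial-decay example, and checks that the solution tracks $n^{-2r/(2r+1)}$.

```python
import numpy as np

# Bisection solve of Phi(lam) * lam = 1/n under an assumed form of Phi.

def phi(lam, r, jmax=10**7):
    n_big = int(min(jmax, np.floor(lam ** (-1.0 / (2 * r)))))  # #{j: j^{-2r} >= lam}
    return 1.0 / max(n_big, 1)

def solve_lambda0(n, r):
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = np.sqrt(lo * hi)            # geometric midpoint spans many scales
        if phi(mid, r) * mid > 1.0 / n:   # phi(lam)*lam is increasing in lam
            hi = mid
        else:
            lo = mid
    return hi

r = 1.0
for n in (10**3, 10**5, 10**7):
    lam0 = solve_lambda0(n, r)
    print(f"n={n:>8d}: lambda0={lam0:.3e}, n^(-2r/(2r+1))={n**(-2*r/(2*r+1)):.3e}")
```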

4. Algorithmic Minimax Rates and Statistical-Computational Tradeoffs

For smooth minimax optimization $\min_x \max_y g(x, y)$ with $L$-smooth, strongly convex-concave $g$, first-order methods can attain $\tilde O(1/k^2)$ convergence in terms of the primal-dual gap, improved from $O(1/k)$, by combining Mirror-Prox and Nesterov's accelerated gradient descent (Thekumparampil et al., 2019). In nonconvex-concave settings, a proximal-point-based algorithm yields $\tilde O(1/k^{1/3})$ convergence to first-order stationarity, sharpening the previous best-known rate of $O(1/k^{1/5})$.
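
The following is a minimal sketch of the extragradient method (Euclidean Mirror-Prox) on a strongly convex-concave quadratic saddle problem with saddle point $(0, 0)$. It illustrates the class of first-order methods discussed here, not the specific accelerated combination of Thekumparampil et al. (2019); the objective and step size are illustrative assumptions.

```python
import numpy as np

# Extragradient on g(x, y) = (mu/2) x^2 + b*x*y - (mu/2) y^2,
# whose unique saddle point is (0, 0).

mu, b = 1.0, 4.0
L = np.hypot(mu, b)            # Lipschitz constant of the gradient operator
eta = 1.0 / (2 * L)            # standard extragradient step size

def grads(x, y):
    return mu * x + b * y, b * x - mu * y   # (dg/dx, dg/dy)

x, y = 5.0, -3.0
for k in range(1, 201):
    gx, gy = grads(x, y)
    xh, yh = x - eta * gx, y + eta * gy     # extrapolation (half) step
    gx, gy = grads(xh, yh)
    x, y = x - eta * gx, y + eta * gy       # correction step at the midpoint
    if k % 50 == 0:
        print(f"iter {k:3d}: distance to saddle {np.hypot(x, y):.2e}")
```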

5. Minimax Rates in Complex and Structured Models

Crowdsourced Binary Label Inference

In the Dawid–Skene one-coin model with $n$ workers and $m$ items, the minimax error in estimating the true label vector is exponentially small in the effective crowd ability:

$$\min\big\{\text{clustering error},\ \text{label error}\big\} \lesssim \exp\Big(-\tfrac{1}{2}\, n \max\big\{\bar\nu,\ D(\bar\mu_\lambda \,\|\, 1-\bar\mu_\lambda)\big\}\Big),$$

where $\bar\nu$ and $\bar\mu$ summarize collective ability (Gao et al., 2013). The projected EM algorithm attains these exponents, matching minimax lower bounds.
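
The following is a minimal sketch of EM for the one-coin model with $\pm 1$ labels and majority-vote initialization; a simple clip of the ability estimates away from $1/2$ stands in for the projection step of the projected EM algorithm, and all sizes and ability ranges are illustrative assumptions.

```python
import numpy as np

# One-coin Dawid-Skene: worker i labels item j correctly with prob p_i.

rng = np.random.default_rng(3)
n_workers, m_items = 30, 200
p_true = rng.uniform(0.55, 0.95, size=n_workers)        # worker abilities
labels = rng.choice([-1, 1], size=m_items)              # true item labels
correct = rng.random((n_workers, m_items)) < p_true[:, None]
Z = np.where(correct, labels, -labels)                  # observed label matrix

mu = np.sign(Z.sum(axis=0) + 1e-9)                      # majority-vote init
p = np.clip(0.5 + (Z @ mu) / (2 * m_items), 0.51, 0.99)
for _ in range(50):
    w = np.log(p / (1 - p))                             # worker log-odds
    post = np.tanh(0.5 * (w @ Z))                       # E-step: E[label_j | Z, p]
    p = 0.5 + (Z @ post) / (2 * m_items)                # M-step: expected frac. correct
    p = np.clip(p, 0.51, 0.99)                          # "projection" away from 1/2

mu_hat = np.sign(np.log(p / (1 - p)) @ Z)
print(f"label error rate: {np.mean(mu_hat != labels):.3f}")
```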

Reinforcement Learning/Off-Policy Evaluation

In OPE with function approximation, minimax optimal rates for marginal importance weight and Q-function estimation under completeness and realizability align with the critical local Rademacher complexity of the underlying function class:

$$\|\hat w - w_{\pi_e}\|_2 \lesssim \eta_{w,n}, \qquad \|\hat q - q_{\pi_e}\|_2 \lesssim \eta_{q,n},$$

where $\eta_{w,n}$ and $\eta_{q,n}$ scale as $n^{-1/2}$ for finite VC dimension and as $n^{-1/(2+\beta)}$ for metric entropy exponent $\beta$ (Uehara et al., 2021). Doubly robust OPE estimators achieve the semiparametric efficiency bound.
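
As a drastically simplified stand-in for marginal importance weighting, the following sketch performs importance-weighted off-policy evaluation in a one-step (contextual bandit) setting; the policies and reward model are illustrative assumptions, not from Uehara et al. (2021).

```python
import numpy as np

# Importance-sampling OPE: estimate the value of pi_e from data logged
# under a uniform behavior policy pi_b.

rng = np.random.default_rng(4)
n, n_actions = 100_000, 4
ctx = rng.random(n)                                   # contexts in [0, 1]

pi_b = np.full(n_actions, 1.0 / n_actions)            # uniform behavior policy
pi_e = np.array([0.1, 0.1, 0.2, 0.6])                 # evaluation policy probs

actions = rng.integers(0, n_actions, size=n)          # logged under pi_b
rewards = ctx * (actions == 3) + 0.1 * rng.normal(size=n)

w = pi_e[actions] / pi_b[actions]                     # importance weights
v_hat = np.mean(w * rewards)                          # IS estimate of V(pi_e)
v_true = 0.6 * 0.5                                    # pi_e(a=3) * E[ctx]
print(f"IS estimate {v_hat:.4f}  vs  true value {v_true:.4f}")
```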

6. Extensions: Robustness, Privacy, and Time-Adaptive Frameworks

Adversarial and Privacy Effects

When regression is exposed to adversarial input perturbations of size $\varepsilon$, the minimax sup-norm rate becomes the sum of the non-adversarial rate and the maximum variation induced by the perturbation:

$$R_n^*(\varepsilon) \asymp R_n^*(0) + w_{\mathcal F}(\varepsilon), \qquad w_{\mathcal F}(\varepsilon) = \sup_{f \in \mathcal F}\ \sup_{x,\, \|\delta\|_\infty \leq \varepsilon} |f(x+\delta) - f(x)|$$

(Peng et al., 2024). For $C^\alpha$ classes, $w_{\mathcal F}(\varepsilon) \asymp L\varepsilon^\alpha$.
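
The modulus $w_{\mathcal F}(\varepsilon)$ can be approximated by a grid search for a single function. The sketch below uses $f(x) = L|x|^\alpha$, a standard worst-case witness on Hölder balls, and checks the $L\varepsilon^\alpha$ scaling:

```python
import numpy as np

# Grid-search approximation of w(eps) = sup_{x, |delta| <= eps} |f(x+delta)-f(x)|.

L_const, alpha = 1.0, 0.5
f = lambda x: L_const * np.abs(x) ** alpha

x = np.linspace(-1, 1, 4001)
for eps in (0.1, 0.01, 0.001):
    delta = np.linspace(-eps, eps, 201)
    w = np.max(np.abs(f(x[:, None] + delta[None, :]) - f(x[:, None])))
    print(f"eps={eps:>6}: w(eps)={w:.4f}, L*eps^alpha={L_const * eps**alpha:.4f}")
```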

Under sample-size uncertainty ("time-robust" minimaxity), the adversarial minimax risk is inflated by at most a logarithmic or iterated-logarithmic factor in $n$ relative to the classical rate, e.g., $n^{-1}\ln\ln n$ for Gaussian mean estimation (Kirichenko et al., 2020).

Statistical Models with Partial Derivative Observations

Derivative observations in ANOVA/RKHS models can reduce the effective interaction order and accelerate minimax rates. For $p$ covariates with observed first partial derivatives in a $d$-way interaction model (order $r = d$), the minimax rate for function estimation matches that of a $(d-p)$-way interaction model without derivatives:

$$R_n \asymp \big[n (\log n)^{1+p-d}\big]^{-2m/(2m+1)}$$

(Dai et al., 2017). For $p = d$ (all partials observed), the rate is parametric, $n^{-1}$.

7. Proof Techniques and Key Lower Bound Constructions

The proofs of minimax lower bounds in the cited literature predominantly employ:

  • Le Cam's two-point method and Fano's multiple-hypothesis testing: Reduction from estimation to testing over well-separated parameter packings; a worked Gaussian-mean instance follows this list.
  • Information contraction (under privacy): KL divergence is controlled under post-processing, significantly reducing the effective sample size in private estimation (Duchi et al., 2013).
  • Metric entropy and covering arguments: The complexity of the function or density class is measured under the relevant loss and geometry (Hausdorff, $L^2$, sup-norm, Wasserstein), directly determining the optimal separation size and rates (Genovese et al., 2010; Dedecker et al., 2013; Zhao et al., 2023).
  • Specialized analysis for complex structures: E.g., dyadic dependence leads to pointwise minimax risk scaling as $N^{-2\beta/(2\beta+d_X)}$ in the number of agents $N$ rather than in the number $N^2$ of dyads, owing to dependence across dyads that share an agent (Graham et al., 2020).
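
As a worked instance of the first technique, the following is a standard two-point sketch for the Gaussian mean with squared-error loss, written as a LaTeX fragment:

```latex
% Le Cam's two-point bound for the Gaussian mean (standard sketch).
% Take P_0 = N(0,1)^{\otimes n} and P_\delta = N(\delta,1)^{\otimes n}.
\[
  \inf_{\hat\theta}\ \max_{\theta \in \{0,\delta\}}
    \mathbb{E}_\theta\bigl(\hat\theta - \theta\bigr)^2
  \;\ge\; \frac{\delta^2}{8}\,\bigl(1 - \mathrm{TV}(P_0, P_\delta)\bigr)
  \;\ge\; \frac{\delta^2}{8}\Bigl(1 - \sqrt{\tfrac12\,\mathrm{KL}(P_0 \,\|\, P_\delta)}\Bigr)
  \;=\; \frac{\delta^2}{8}\Bigl(1 - \frac{\delta\sqrt{n}}{2}\Bigr),
\]
% using Pinsker's inequality and KL(P_0 || P_delta) = n*delta^2/2.
% Choosing delta = n^{-1/2} yields R_n^* >= 1/(16 n), matching the
% parametric n^{-1} rate quoted in Section 2.
```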

8. Impact and Implications Across Statistical Domains

Minimax convergence rates serve as fundamental performance limits and guide both the development of estimation algorithms and theoretical hypotheses about statistical hardness under realistic constraints (privacy, adversarial robustness, function complexity, nonstandard designs, partial observations). They also catalyze algorithmic work at the interface with convex optimization, empirical risk minimization in modern machine learning, and information theory.

The rates summarized above enable practitioners to directly compare the impact of model complexity, regularity, privacy, and robustness constraints, and to benchmark the performance of practical estimators and learning algorithms. In modern domains (privacy-preserving data analysis, crowdsourced inference, time-adaptive decision making, RL/OPE, high-dimensional graphical modeling), the precise identification of minimax exponents and constants continues to play a central role in statistical methodology and the theory of learning.

