Minimax Convergence Rates

Updated 4 February 2026
  • Minimax Convergence Rates quantify the best worst-case speed at which any estimator can converge to the true parameter value, uniformly over a class of models.
  • They provide benchmarks for assessing the efficiency of statistical methods in nonparametric, high-dimensional, and privacy-constrained settings.
  • Proof techniques such as Le Cam's method, Fano's inequality, and metric entropy arguments are key to establishing tight minimax lower bounds.

A minimax convergence rate quantifies the fundamental speed at which any estimator can approach the true value of a statistical parameter, uniformly over a model class, when the loss is measured in expectation over worst-case data-generating distributions. The notion of minimax optimality is central to statistical decision theory and nonparametric inference, and it provides benchmarks for evaluating the efficiency of statistical procedures in both classical and modern high-dimensional or privacy-constrained regimes.

1. Formal Definition and General Framework

The minimax risk for a parameter class $\Theta$ and loss function $L(\hat\theta, \theta)$ is defined by

$$R_n^* = \inf_{\hat\theta_n} \sup_{\theta \in \Theta} \mathbb{E}_{\theta}\big[L(\hat\theta_n, \theta)\big],$$

where $\hat\theta_n$ ranges over all estimators measurable with respect to the observed $n$ data points. The minimax convergence rate is the sequence $(r_n)_{n \geq 1}$ such that $R_n^* \asymp r_n$ (i.e., bounded above and below by constant multiples of $r_n$ for all large $n$). This rate encapsulates both the statistical complexity of the model class $\Theta$ and the analytic properties of the loss function $L$.
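
To make the definition concrete, the following is a minimal Monte Carlo sketch that approximates the worst-case risk of one fixed estimator, which upper-bounds $R_n^*$. All modeling choices here ($\Theta = [-1, 1]$ discretized to a grid, $n$ i.i.d. $N(\theta, 1)$ observations, squared-error loss, the sample mean as estimator) are illustrative assumptions.

```python
import numpy as np

# Monte Carlo approximation of the worst-case (sup over Theta) risk of a
# fixed estimator; this upper-bounds the minimax risk R_n^*.

rng = np.random.default_rng(0)

def worst_case_risk(estimator, n, theta_grid, reps=2000):
    """Max over theta_grid of the Monte Carlo risk E_theta[(est - theta)^2]."""
    risks = []
    for theta in theta_grid:
        x = rng.normal(theta, 1.0, size=(reps, n))  # reps datasets of size n
        est = estimator(x)                           # one estimate per dataset
        risks.append(np.mean((est - theta) ** 2))    # Monte Carlo risk at theta
    return max(risks)                                # sup over the grid

theta_grid = np.linspace(-1.0, 1.0, 21)
for n in (50, 200, 800):
    r = worst_case_risk(lambda x: x.mean(axis=1), n, theta_grid)
    print(f"n={n:4d}: sup-risk ~ {r:.5f}   (1/n = {1/n:.5f})")
```

The printed sup-risk tracks $1/n$, the parametric rate discussed in the next section.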

In more complex settings, such as those with additional privacy constraints, dependency structures, adversarial perturbations, or partial information, the minimax rate quantifies the exact effect of these features on achievable estimation or prediction accuracy.

2. Canonical Rates and Dependence on Model Complexity

In classical nonparametric estimation and regression, the minimax rate is determined by the interplay between the smoothness of the function class and the sample size; a numerical reading of these exponents follows the list below. For example:

  • Gaussian mean, parametric models: $R_n^* \asymp n^{-1}$ for $\mathbb{E}\|\hat\mu - \mu\|^2$.
  • Hölder class regression, sup-norm loss: $R_n^* \asymp (n^{-1} \log n)^{\alpha/(2\alpha+d)}$ for $f \in C^\alpha$ on $[0,1]^d$, where $\alpha$ is the smoothness parameter (Peng et al., 2024).
  • Sobolev class, $L_2$ loss: $R_n^* \asymp n^{-2\alpha/(2\alpha+d)}$ for $\alpha$-smooth functions in $d$ dimensions (Zhao et al., 2023).
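
As a purely arithmetic sketch of these exponents (with all constants set to one), the following shows how the dimension $d$ enters the Sobolev-class rate and the sample size needed to reach a fixed accuracy:

```python
# Arithmetic sketch of the exponent 2*alpha/(2*alpha + d); the constants
# in the actual minimax risks are omitted.

def sobolev_rate(n, alpha, d):
    """Sobolev/L2 minimax rate n^{-2 alpha/(2 alpha + d)}, up to constants."""
    return n ** (-2 * alpha / (2 * alpha + d))

def n_needed(eps, alpha, d):
    """Sample size at which the rate above falls to the target accuracy eps."""
    return eps ** (-(2 * alpha + d) / (2 * alpha))

for d in (1, 5, 20):
    print(f"d={d:2d}: rate at n=1e6 is {sobolev_rate(1e6, 2, d):.2e}; "
          f"n needed for eps=0.01 is {n_needed(0.01, 2, d):.1e}")
```

The rapid growth of the required sample size in $d$ is the curse of dimensionality in exponent form.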

For estimation of discrete distributions with $d$ categories and no privacy constraint, the minimax risk is $O(d/n)$ in squared error; for multinomial estimation under $\epsilon$-local differential privacy, the minimax risk is $O(d/(n\epsilon^2))$, with sharp constants (Duchi et al., 2013).

Minimax rates change in the following settings:

  • Privacy constraints: Effective sample size is scaled by $\epsilon^2$ when enforcing $\epsilon$-local differential privacy (Duchi et al., 2013).
  • Spatial inhomogeneity: Global rates depend on the design density vanishing at isolated points; for density $g(x) \sim |x-x_0|^\alpha$ near $x_0$, the minimax $L^2$-risk scales as $n^{-2s'/(2s'+\alpha)}$ for Besov-ball smoothness $s'$ (Antoniadis et al., 2011).
  • Supersmooth deconvolution: Estimation rates become logarithmic in $n$,

$$R_n \asymp (\log n)^{-p/\beta},$$

for deconvolution with $\beta$-supersmooth noise and loss $W_p^p$ (Wasserstein metric) (Dedecker et al., 2013).

3. Representative Results in Key Models

Discrete Probability Estimation under Local Privacy

Given $n$ privatized samples from an unknown $d$-dimensional multinomial $p$, with privacy parameter $\epsilon \leq 1/4$, the sharp minimax rate in squared $\ell_2$ error is

$$R(n, d, \epsilon) \asymp \min\Big\{1, \frac{d}{n\epsilon^2}\Big\},$$

achievable by randomized response or Laplace-noise schemes with moment-matching debiasing (Duchi et al., 2013). For small $\epsilon$, privacy reduces the effective sample size from $n$ to $n\epsilon^2/d$.
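
The following is a minimal sketch of $\epsilon$-LDP multinomial estimation via $d$-ary randomized response with exact debiasing. Randomized response is a valid $\epsilon$-LDP mechanism but not the one with optimal $d$-dependence analyzed by Duchi et al. (2013); it still exhibits the $1/(n\epsilon^2)$ effective-sample-size scaling in $n$ and $\epsilon$.

```python
import numpy as np

# d-ary randomized response under epsilon-LDP, with exact debiasing.
# Illustrates the 1/(n eps^2) scaling; constants and d-dependence are
# not the sharp ones from the paper.

rng = np.random.default_rng(1)

def k_rr(samples, d, eps):
    """Report the true category w.p. e^eps/(e^eps+d-1), else a uniform other one."""
    keep = rng.random(samples.size) < np.exp(eps) / (np.exp(eps) + d - 1)
    shift = rng.integers(1, d, size=samples.size)   # never zero: report != truth
    return np.where(keep, samples, (samples + shift) % d)

def debias(reports, d, eps):
    a = np.exp(eps) / (np.exp(eps) + d - 1)          # P(report = truth)
    b = (1 - a) / (d - 1)                            # P(report = any other value)
    freq = np.bincount(reports, minlength=d) / reports.size
    return (freq - b) / (a - b)                      # unbiased for p

d = 10
p = np.full(d, 1.0 / d)
for n, eps in [(50_000, 0.25), (200_000, 0.25), (50_000, 0.5)]:
    err = np.mean([np.sum((debias(k_rr(rng.choice(d, n, p=p), d, eps), d, eps) - p) ** 2)
                   for _ in range(20)])
    print(f"n={n:>7}, eps={eps}: l2^2 error {err:.2e}, 1/(n eps^2) = {1/(n*eps**2):.2e}")
```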

Smooth Density Estimation under Local Privacy

For $f$ in a Sobolev class of smoothness $\beta > 1/2$ on $[0,1]$:

  • Non-private (classical): $R_n^* \asymp n^{-2\beta/(2\beta+1)}$
  • $\epsilon$-locally private: $R_n^* \asymp (n\epsilon^2)^{-2\beta/(2\beta+2)}$

Thus privacy worsens the polynomial exponent, and minimax estimation becomes strictly slower unless $\epsilon \to \infty$ (Duchi et al., 2013).

Nonparametric Regression with Inhomogeneous Design

If the design density $g$ has a zero of order $\alpha > 0$ at $x_0$ (i.e., $g(x) \sim |x-x_0|^\alpha$ near $x_0$), then for smoothness index $s'$ the minimax global $L^2$-risk over a Besov ball satisfies

$$R_n \geq C\, n^{-2s'/(2s'+\alpha)},$$

with logarithmic rates for exponential zeros. Adaptive wavelet thresholding attains these rates up to log factors and reveals that spatial inhomogeneity and function homogeneity jointly determine difficulty (Antoniadis et al., 2011).
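
The following is a minimal sketch of the thresholding step only, assuming an equispaced design, Haar wavelets, a known noise level, a sample size that is a power of two, and the universal threshold $\sigma\sqrt{2\log n}$; the inhomogeneous-design setting of the paper requires design-adapted wavelets beyond this sketch.

```python
import numpy as np

# Wavelet soft-thresholding for noisy regression on a regular design:
# forward Haar transform, universal threshold, inverse transform.

rng = np.random.default_rng(2)

def haar_fwd(y):
    """Orthonormal Haar transform; returns detail coeffs per level + coarse part."""
    coeffs, approx = [], y.astype(float)
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2))  # detail coefficients
        approx = (even + odd) / np.sqrt(2)        # coarse approximation
    return coeffs, approx

def haar_inv(coeffs, approx):
    for detail in reversed(coeffs):
        even = (approx + detail) / np.sqrt(2)
        odd = (approx - detail) / np.sqrt(2)
        approx = np.empty(2 * len(detail))
        approx[0::2], approx[1::2] = even, odd
    return approx

n, sigma = 1024, 0.3
x = np.linspace(0, 1, n)
f = np.where(x < 0.5, np.sin(4 * np.pi * x), 0.3)  # spatially inhomogeneous signal
y = f + sigma * rng.normal(size=n)

coeffs, approx = haar_fwd(y)
lam = sigma * np.sqrt(2 * np.log(n))               # universal threshold
coeffs = [np.sign(c) * np.maximum(np.abs(c) - lam, 0) for c in coeffs]  # soft
fhat = haar_inv(coeffs, approx)
print(f"empirical L2 risk: {np.mean((fhat - f) ** 2):.4f}")
```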

Functional Linear Regression in RKHS

In functional linear regression with an RKHS-regularized coefficient function and general eigenvalue decay, the minimax prediction risk is

$$R_n^* \asymp \lambda_0, \qquad \text{where } \Phi(\lambda_0)\,\lambda_0 \asymp n^{-1},$$

and, for polynomial eigenvalue decay $s_j \asymp j^{-2r}$, $\lambda_0 \asymp n^{-2r/(2r+1)}$ (Lian, 2012).
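
The rate equation $\Phi(\lambda_0)\lambda_0 \asymp n^{-1}$ can be solved numerically. The exact $\Phi$ is model-dependent; the sketch below assumes the concrete choice $\Phi(\lambda) = 1/\#\{j : s_j \geq \lambda\}$ with $s_j = j^{-2r}$, one choice consistent with the polynomial-decay example, and checks that the solution tracks $n^{-2r/(2r+1)}$.

```python
import numpy as np

# Bisection solve of Phi(lam) * lam = 1/n under an assumed form of Phi.

def phi(lam, r, jmax=10**7):
    n_big = int(min(jmax, np.floor(lam ** (-1.0 / (2 * r)))))  # #{j: j^{-2r} >= lam}
    return 1.0 / max(n_big, 1)

def solve_lambda0(n, r):
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = np.sqrt(lo * hi)            # geometric midpoint spans many scales
        if phi(mid, r) * mid > 1.0 / n:   # phi(lam)*lam is increasing in lam
            hi = mid
        else:
            lo = mid
    return hi

r = 1.0
for n in (10**3, 10**5, 10**7):
    lam0 = solve_lambda0(n, r)
    print(f"n={n:>8d}: lambda0={lam0:.3e}, n^(-2r/(2r+1))={n**(-2*r/(2*r+1)):.3e}")
```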

4. Algorithmic Minimax Rates and Statistical-Computational Tradeoffs

For smooth minimax optimization $\min_x \max_y g(x, y)$ with $L$-smooth, strongly convex-concave $g$, first-order methods can attain $\tilde O(1/k^2)$ convergence in terms of the primal-dual gap, improved from $O(1/k)$, by combining Mirror-Prox and Nesterov's accelerated gradient descent (Thekumparampil et al., 2019). In nonconvex-concave settings, a proximal-point-based algorithm yields $\tilde O(1/k^{1/3})$ convergence to first-order stationarity, sharpening the previous best-known rate of $O(1/k^{1/5})$.
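
The following is a minimal sketch of the extragradient method (Euclidean Mirror-Prox) on a strongly convex-concave quadratic saddle problem with saddle point $(0, 0)$. It illustrates the class of first-order methods discussed here, not the specific accelerated combination of Thekumparampil et al. (2019); the objective and step size are illustrative assumptions.

```python
import numpy as np

# Extragradient on g(x, y) = (mu/2) x^2 + b*x*y - (mu/2) y^2,
# whose unique saddle point is (0, 0).

mu, b = 1.0, 4.0
L = np.hypot(mu, b)            # Lipschitz constant of the gradient operator
eta = 1.0 / (2 * L)            # standard extragradient step size

def grads(x, y):
    return mu * x + b * y, b * x - mu * y   # (dg/dx, dg/dy)

x, y = 5.0, -3.0
for k in range(1, 201):
    gx, gy = grads(x, y)
    xh, yh = x - eta * gx, y + eta * gy     # extrapolation (half) step
    gx, gy = grads(xh, yh)
    x, y = x - eta * gx, y + eta * gy       # correction step at the midpoint
    if k % 50 == 0:
        print(f"iter {k:3d}: distance to saddle {np.hypot(x, y):.2e}")
```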

5. Minimax Rates in Complex and Structured Models

Crowdsourced Binary Label Inference

In the Dawid–Skene one-coin model with $n$ workers and $m$ items, the minimax error in estimating the true label vector is exponentially small in the effective crowd ability:

$$\min\big\{\text{clustering error},\ \text{label error}\big\} \lesssim \exp\Big(-\tfrac{1}{2}\, n \max\big\{\bar\nu,\ D(\bar\mu_\lambda \,\|\, 1-\bar\mu_\lambda)\big\}\Big),$$

where $\bar\nu$ and $\bar\mu$ summarize collective ability (Gao et al., 2013). The projected EM algorithm attains these exponents, matching minimax lower bounds.
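
The following is a minimal sketch of EM for the one-coin model with $\pm 1$ labels and majority-vote initialization; a simple clip of the ability estimates away from $1/2$ stands in for the projection step of the projected EM algorithm, and all sizes and ability ranges are illustrative assumptions.

```python
import numpy as np

# One-coin Dawid-Skene: worker i labels item j correctly with prob p_i.

rng = np.random.default_rng(3)
n_workers, m_items = 30, 200
p_true = rng.uniform(0.55, 0.95, size=n_workers)        # worker abilities
labels = rng.choice([-1, 1], size=m_items)              # true item labels
correct = rng.random((n_workers, m_items)) < p_true[:, None]
Z = np.where(correct, labels, -labels)                  # observed label matrix

mu = np.sign(Z.sum(axis=0) + 1e-9)                      # majority-vote init
p = np.clip(0.5 + (Z @ mu) / (2 * m_items), 0.51, 0.99)
for _ in range(50):
    w = np.log(p / (1 - p))                             # worker log-odds
    post = np.tanh(0.5 * (w @ Z))                       # E-step: E[label_j | Z, p]
    p = 0.5 + (Z @ post) / (2 * m_items)                # M-step: expected frac. correct
    p = np.clip(p, 0.51, 0.99)                          # "projection" away from 1/2

mu_hat = np.sign(np.log(p / (1 - p)) @ Z)
print(f"label error rate: {np.mean(mu_hat != labels):.3f}")
```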

Reinforcement Learning/Off-Policy Evaluation

In OPE with function approximation, minimax optimal rates for marginal importance weight and Q-function estimation under completeness and realizability align with the critical local Rademacher complexity of the underlying function class:

$$\|\hat w - w_{\pi_e}\|_2 \lesssim \eta_{w,n}, \qquad \|\hat q - q_{\pi_e}\|_2 \lesssim \eta_{q,n},$$

where $\eta_{w,n}$ and $\eta_{q,n}$ scale as $n^{-1/2}$ for finite VC dimension and as $n^{-1/(2+\beta)}$ for metric entropy exponent $\beta$ (Uehara et al., 2021). Doubly robust OPE estimators achieve the semiparametric efficiency bound.
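
As a drastically simplified stand-in for marginal importance weighting, the following sketch performs importance-weighted off-policy evaluation in a one-step (contextual bandit) setting; the policies and reward model are illustrative assumptions, not from Uehara et al. (2021).

```python
import numpy as np

# Importance-sampling OPE: estimate the value of pi_e from data logged
# under a uniform behavior policy pi_b.

rng = np.random.default_rng(4)
n, n_actions = 100_000, 4
ctx = rng.random(n)                                   # contexts in [0, 1]

pi_b = np.full(n_actions, 1.0 / n_actions)            # uniform behavior policy
pi_e = np.array([0.1, 0.1, 0.2, 0.6])                 # evaluation policy probs

actions = rng.integers(0, n_actions, size=n)          # logged under pi_b
rewards = ctx * (actions == 3) + 0.1 * rng.normal(size=n)

w = pi_e[actions] / pi_b[actions]                     # importance weights
v_hat = np.mean(w * rewards)                          # IS estimate of V(pi_e)
v_true = 0.6 * 0.5                                    # pi_e(a=3) * E[ctx]
print(f"IS estimate {v_hat:.4f}  vs  true value {v_true:.4f}")
```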

6. Extensions: Robustness, Privacy, and Time-Adaptive Frameworks

Adversarial and Privacy Effects

When regression is exposed to adversarial input perturbations of size $\varepsilon$, the minimax sup-norm rate becomes the sum of the non-adversarial rate and the maximum variation induced by the perturbation:

$$R_n^*(\varepsilon) \asymp R_n^*(0) + w_{\mathcal F}(\varepsilon), \qquad w_{\mathcal F}(\varepsilon) = \sup_{f \in \mathcal F}\ \sup_{x,\, \|\delta\|_\infty \leq \varepsilon} |f(x+\delta) - f(x)|$$

(Peng et al., 2024). For $C^\alpha$ classes, $w_{\mathcal F}(\varepsilon) \asymp L\varepsilon^\alpha$.
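
The modulus $w_{\mathcal F}(\varepsilon)$ can be approximated by a grid search for a single function. The sketch below uses $f(x) = L|x|^\alpha$, a standard worst-case witness on Hölder balls, and checks the $L\varepsilon^\alpha$ scaling:

```python
import numpy as np

# Grid-search approximation of w(eps) = sup_{x, |delta| <= eps} |f(x+delta)-f(x)|.

L_const, alpha = 1.0, 0.5
f = lambda x: L_const * np.abs(x) ** alpha

x = np.linspace(-1, 1, 4001)
for eps in (0.1, 0.01, 0.001):
    delta = np.linspace(-eps, eps, 201)
    w = np.max(np.abs(f(x[:, None] + delta[None, :]) - f(x[:, None])))
    print(f"eps={eps:>6}: w(eps)={w:.4f}, L*eps^alpha={L_const * eps**alpha:.4f}")
```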

Under sample-size uncertainty ("time-robust" minimaxity), the adversarial minimax risk is inflated by at most a logarithmic or iterated-logarithmic factor in $n$ relative to the classical rate, e.g., $n^{-1}\ln\ln n$ for Gaussian mean estimation (Kirichenko et al., 2020).

Statistical Models with Partial Derivative Observations

Derivative observations in ANOVA/RKHS models can reduce the effective interaction order and accelerate minimax rates. For $p$ covariates with observed first partial derivatives in a $d$-way interaction model (order $r = d$), the minimax rate for function estimation matches that of a $(d-p)$-way interaction model without derivatives:

$$R_n \asymp \big[n (\log n)^{1+p-d}\big]^{-2m/(2m+1)}$$

(Dai et al., 2017). For $p = d$ (all partials observed), the rate is parametric, $n^{-1}$.

7. Proof Techniques and Key Lower Bound Constructions

The proofs of minimax lower bounds in the cited literature predominantly employ:

  • Le Cam's two-point method and Fano's multiple-hypothesis testing: Reduction from estimation to testing over well-separated parameter packings; a worked Gaussian-mean instance follows this list.
  • Information contraction (under privacy): KL divergence is controlled under post-processing, significantly reducing the effective sample size in private estimation (Duchi et al., 2013).
  • Metric entropy and covering arguments: The complexity of the function or density class is measured under the relevant loss and geometry (Hausdorff, $L^2$, sup-norm, Wasserstein), directly determining the optimal separation size and rates (Genovese et al., 2010; Dedecker et al., 2013; Zhao et al., 2023).
  • Specialized analysis for complex structures: E.g., dyadic dependence leads to pointwise minimax risk scaling as $N^{-2\beta/(2\beta+d_X)}$ in the number of agents $N$ rather than in the number $N^2$ of dyads, owing to dependence across dyads that share an agent (Graham et al., 2020).
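
As a worked instance of the first technique, the following is a standard two-point sketch for the Gaussian mean with squared-error loss, written as a LaTeX fragment:

```latex
% Le Cam's two-point bound for the Gaussian mean (standard sketch).
% Take P_0 = N(0,1)^{\otimes n} and P_\delta = N(\delta,1)^{\otimes n}.
\[
  \inf_{\hat\theta}\ \max_{\theta \in \{0,\delta\}}
    \mathbb{E}_\theta\bigl(\hat\theta - \theta\bigr)^2
  \;\ge\; \frac{\delta^2}{8}\,\bigl(1 - \mathrm{TV}(P_0, P_\delta)\bigr)
  \;\ge\; \frac{\delta^2}{8}\Bigl(1 - \sqrt{\tfrac12\,\mathrm{KL}(P_0 \,\|\, P_\delta)}\Bigr)
  \;=\; \frac{\delta^2}{8}\Bigl(1 - \frac{\delta\sqrt{n}}{2}\Bigr),
\]
% using Pinsker's inequality and KL(P_0 || P_delta) = n*delta^2/2.
% Choosing delta = n^{-1/2} yields R_n^* >= 1/(16 n), matching the
% parametric n^{-1} rate quoted in Section 2.
```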

8. Impact and Implications Across Statistical Domains

Minimax convergence rates serve as fundamental performance limits and guide both the development of estimation algorithms and theoretical hypotheses about statistical hardness under realistic constraints (privacy, adversarial robustness, function complexity, nonstandard designs, partial observations). They also catalyze algorithmic work at the interface with convex optimization, empirical risk minimization in modern machine learning, and information theory.

The rates summarized above enable practitioners to directly compare the impact of model complexity, regularity, privacy, and robustness constraints, and to benchmark the performance of practical estimators and learning algorithms. In modern domains (privacy-preserving data analysis, crowdsourced inference, time-adaptive decision making, RL/OPE, high-dimensional graphical modeling), the precise identification of minimax exponents and constants continues to play a central role in statistical methodology and the theory of learning.

