Minimax Convergence Rates

Updated 30 June 2025
  • The minimax rate of convergence is the fundamental measure of the smallest maximal risk achievable uniformly over a model class.
  • It underpins key applications across nonparametric regression, density estimation, and manifold learning by defining optimal trade-offs between risk and data.
  • Methodologies like Fano’s lemma, kernel regression, and adaptive procedures are employed to both derive and attain these optimal convergence rates.

The minimax rate of convergence quantifies the optimal rate—uniform over a prescribed model class—at which estimation error decreases with increasing sample size, and serves as a statistical benchmark describing the fundamental difficulty of inference in a given problem setting. It is a central concept in modern statistics, information theory, and machine learning, providing a precise characterization of the attainable trade-off between risk and data in both classical and nonparametric problems.

1. Foundational Definition and Principle

In statistical estimation, the minimax rate of convergence refers to the smallest possible maximal risk—across all estimators and all elements of a function or model class—that can be achieved as the sample size grows, typically expressed as the leading order of the risk as a function of $n$ (the sample size) for a specified metric or loss.

Formally, for a class of distributions or parameter values $\Theta$, a loss function $L(\cdot, \cdot)$, an estimator $\hat{\theta}_n$, and risk $R(\hat{\theta}_n, \theta) = \mathbb{E}_\theta[L(\hat{\theta}_n, \theta)]$, the minimax rate is defined by

$$\inf_{\hat{\theta}_n} \sup_{\theta \in \Theta} R(\hat{\theta}_n, \theta) \asymp \phi(n),$$

where $\phi(n)$ expresses the order (in $n$) of the smallest achievable maximal risk. An estimator is said to be minimax rate-optimal if it attains this order.
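
To make the definition concrete, here is a minimal Monte Carlo sketch (an illustrative toy example, not taken from the cited references) for the Gaussian location model with $\Theta = [-1, 1]$: the worst-case risk of the sample mean tracks $\sigma^2/n$, i.e. the parametric minimax order $\phi(n) = n^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
thetas = np.linspace(-1.0, 1.0, 11)   # parameter class Theta = [-1, 1]

for n in (100, 400, 1600):
    # Monte Carlo estimate of the risk of the sample mean at each theta
    risks = []
    for theta in thetas:
        x = rng.normal(theta, sigma, size=(5000, n))
        est = x.mean(axis=1)
        risks.append(np.mean((est - theta) ** 2))
    max_risk = max(risks)
    print(f"n={n:5d}  sup-risk≈{max_risk:.5f}  sigma^2/n={sigma**2 / n:.5f}")
```

In this toy model the risk of the sample mean does not depend on $\theta$, so the supremum over $\Theta$ coincides with the pointwise risk and matches $\phi(n) = n^{-1}$ up to Monte Carlo error.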

2. Mathematical Formulation in Various Models

The minimax rate depends on the statistical model, complexity constraints (such as smoothness, sparsity, or dimension), observation mechanism, and on the loss function. Representative settings include:

  • Nonparametric Regression: For estimating a function $f$ in a Hölder or Sobolev class of smoothness $\beta$ on $[0,1]^d$, with i.i.d. observations $(X_i, Y_i)$, the minimax risk under $L_2$ loss is classically

$$R_n^* \asymp n^{-2\beta/(2\beta + d)},$$

which is attained by local polynomial or kernel regression (Pic et al., 2022); a simulation sketch illustrating this rate appears after this list.

  • Density Estimation: For densities in a Sobolev class of smoothness $\beta$, the minimax $L_2$-risk is $n^{-2\beta/(2\beta + d)}$, mirroring nonparametric regression.
  • Manifold Estimation: For estimating a $d$-dimensional smooth manifold $M \subset \mathbb{R}^D$ in Hausdorff distance, the minimax rate is

$$n^{-2/(2 + d)}$$

and is independent of the ambient dimension $D$ under compact, perpendicular noise (Genovese et al., 2010). The risk is measured in Hausdorff or other set distances.

  • Inverse Problems (Kernel/Operator Learning): For estimation in a regression model with a compact linear operator (spectral regularization), the minimax rate depends on the decay of the operator's eigenvalues (polynomial or exponential) and on the smoothness of the function to be recovered; see (Zhang et al., 27 Feb 2025):

$$\text{Polynomial decay:} \quad M^{-2\beta r/(2\beta r + 2r + 1)} \qquad \text{Exponential decay:} \quad M^{-\beta/(\beta + 1)},$$

where $r$ is the decay exponent, $\beta$ the smoothness, and $M$ the sample size.

  • Distributional Regression (CRPS): For conditional distributions $F^*_x$ with Hölder-continuous $x \mapsto F^*_x$, the excess risk in CRPS loss is

$$n^{-2h/(2h + d)},$$

where $h$ denotes the Hölder regularity parameter (Pic et al., 2022).

  • High-Dimensional Sparse Estimation: In linear regression with sparsity constraints, the minimax risk scales (for $r$ nonzero coefficients, $p$ variables, and $n$ observations) as

$$\frac{\sigma^2}{n}\big[ r \log(p/r) \big],$$

and extensions exist for models with interactions (Ye et al., 2018).
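
As a rough numerical check of the nonparametric regression rate above (a simulation sketch under assumed settings, not drawn from the cited papers), the following snippet fits a box-kernel Nadaraya-Watson estimator to a Lipschitz target ($\beta = 1$, $d = 1$) with the rate-optimal bandwidth order $h \asymp n^{-1/(2\beta + d)}$ and estimates the log-log slope of the $L_2$ error, which should be close to $-2\beta/(2\beta + d) = -2/3$.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Lipschitz (Hölder beta=1) target function on [0, 1]
    return np.abs(x - 0.4) + 0.3 * np.sin(4 * x)

def nw_estimate(x_train, y_train, x_eval, h):
    # Nadaraya-Watson estimator with a box kernel of bandwidth h
    w = (np.abs(x_eval[:, None] - x_train[None, :]) <= h).astype(float)
    w_sum = w.sum(axis=1)
    w_sum[w_sum == 0] = 1.0          # guard against empty windows
    return (w @ y_train) / w_sum

ns, mses = [], []
x_eval = np.linspace(0.05, 0.95, 200)
for n in (250, 1000, 4000, 16000):
    h = n ** (-1.0 / 3.0)            # optimal bandwidth order for beta=1, d=1
    mse = 0.0
    for _ in range(30):              # average over replications
        x = rng.uniform(0, 1, n)
        y = f(x) + 0.3 * rng.normal(size=n)
        mse += np.mean((nw_estimate(x, y, x_eval, h) - f(x_eval)) ** 2)
    ns.append(n)
    mses.append(mse / 30)

slope = np.polyfit(np.log(ns), np.log(mses), 1)[0]
print("empirical log-log slope:", round(slope, 2), " theory: -2/3")
```

The constants and the specific target function are arbitrary choices for illustration; only the fitted slope is meant to reflect the minimax order.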

3. Key Examples and Regime Comparisons

Different models and structural or noise assumptions produce different minimax rates, as detailed below:

| Problem Setting | Model Complexity | Minimax Rate | Notable Features |
| --- | --- | --- | --- |
| Manifold estimation | Dim $d$ | $n^{-2/(2+d)}$ | Intrinsic, not ambient, dim |
| Nonparametric regression | Smoothness $\beta$, dim $d$ | $n^{-2\beta/(2\beta + d)}$ | Curse of dimensionality |
| Sparse linear regression | Support $r$, ambient $p$ | $\frac{\sigma^2}{n}[r \log(p/r)]$ | Information-theoretic (Fano) rate |
| Phase retrieval | Set mean width | $\sim \frac{\text{mean width}}{\sqrt{N}}$ | Geometry-adaptive (Lecué et al., 2013) |
| Density estimation | Sobolev $\beta$ | $n^{-2\beta/(2\beta + d)}$ | Parallel to regression |
| Wasserstein barycenter | $n$ units, $p$ per unit | $n^{-1/2} + p^{-1/2}$ | Phase-amplitude separation |
| Normal mixtures | Smooth parameter set | $(\log n)^{1/2}\, n^{-1}$ | Slightly worse than parametric |

The precise rate depends on both global and local aspects, such as data inhomogeneity (Antoniadis et al., 2011), heavy-tailed or vanishingly small noise (Durot et al., 14 Apr 2024), ill-posedness (Zhang et al., 27 Feb 2025), or adversarial conditions (Peng et al., 12 Oct 2024, Peng et al., 2 Jun 2025).
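
The following short sketch (purely illustrative arithmetic on the exponents quoted above, not taken from any cited paper) tabulates the nonparametric exponent $2\beta/(2\beta + d)$ across dimensions and smoothness levels, making the curse of dimensionality and its mitigation by smoothness explicit.

```python
# Illustrative arithmetic: the nonparametric rate n^{-2*beta/(2*beta + d)}
# slows down as the dimension d grows and speeds up as the smoothness
# beta grows (the "curse of dimensionality").
def rate_exponent(beta: float, d: int) -> float:
    return 2 * beta / (2 * beta + d)

for d in (1, 2, 5, 10, 50):
    row = "  ".join(f"beta={b}: n^-{rate_exponent(b, d):.3f}"
                    for b in (0.5, 1.0, 2.0, 4.0))
    print(f"d={d:3d}  {row}")
```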

4. Methodological Principles for Achieving Minimax Rates

Deriving lower bounds on the minimax risk and constructing estimators that attain them (upper bounds) commonly rely on information-theoretic and empirical process techniques:

  • Lower Bound Construction: Approaches such as Fano’s lemma, Le Cam’s method, and Assouad’s lemma provide information-theoretic lower bounds by reducing estimation to multiple hypothesis testing between well-separated models, quantifying indistinguishability via total variation or Hellinger distances (Genovese et al., 2010, Wang et al., 2023, Kim, 2011).
  • Upper Bounds via Rate-Attaining Estimators: Sieve estimators, projection estimators, kernel regression, local polynomial estimators, and tamed least squares (tLSE) estimators are all shown to achieve optimal or nearly optimal rates under the correct model and tuning (Zhang et al., 27 Feb 2025, Wang et al., 2023).
  • Adaptivity and Robustness: Procedures such as Lepski’s method, model selection (ABC criterion), or adaptive nearest neighbor/kNN rules yield estimators that achieve the minimax rate across a range of unknown smoothness or problem classes (Zhao et al., 2019, Ye et al., 2018, Peng et al., 2 Jun 2025).
  • Critical Inequalities and Empirical Process Theory: Fast rates in complex settings such as reinforcement learning or nonparametric estimation under function approximation are characterized using localized Rademacher complexities and critical inequalities (see (Uehara et al., 2021)).

These strategies often require careful control of the empirical covariance (for tamed estimators), of geometric entropy (packing numbers, mean width), or of spectral norms (for inverse problems); a sketch of the two-point testing reduction behind the lower bounds, in the simplest Gaussian setting, follows below.
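
Specifically, the sketch below (an illustrative, textbook-style computation, not taken from the cited works) evaluates Le Cam's two-point lower bound for the Gaussian location model, combining the elementary inequality $\max_j \mathbb{E}_j(\hat\theta - \theta_j)^2 \ge \delta^2 (1 - \mathrm{TV}(P_0, P_1))$ (for hypotheses separated by $2\delta$) with Pinsker's inequality; the resulting bound scales as $\sigma^2/n$, matching the parametric minimax rate up to constants.

```python
import numpy as np

def lecam_two_point_bound(n: int, sigma: float = 1.0) -> float:
    """Lower bound on the minimax risk for a Gaussian mean from n samples,
    via Le Cam's two-point method with a Pinsker bound on total variation."""
    delta = sigma / (2 * np.sqrt(n))      # half-separation of the two hypotheses
    kl = 2 * n * delta**2 / sigma**2      # KL divergence between the n-sample laws
    tv = min(1.0, np.sqrt(kl / 2))        # Pinsker's inequality
    return delta**2 * (1 - tv)            # max_j E_j (theta_hat - theta_j)^2 >= this

for n in (100, 1000, 10000):
    print(f"n={n:6d}  lower bound≈{lecam_two_point_bound(n):.2e}"
          f"   sigma^2/n={1.0 / n:.2e}")
```

The separation $2\delta \asymp \sigma/\sqrt{n}$ is chosen so that the two hypotheses remain statistically indistinguishable while being as far apart as possible, which is the core of the testing reduction.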

5. Impact of Problem Structure and Adversarial Setting

The minimax rate depends fundamentally on the interplay between the structure of the statistical model and the loss being considered:

  • Intrinsic Dimension and Geometry: Estimation on a $d$-dimensional manifold embedded in high-dimensional spaces achieves a rate determined by $d$, not by the ambient dimension (Genovese et al., 2010).
  • Ill-posedness and Operator Spectrum: Inverse problems with polynomial or exponentially decaying operator spectra have qualitatively different rates; exponential decay leads to strictly slower rates controlled solely by smoothness (Zhang et al., 27 Feb 2025).
  • Adversarial Risk and Robustness: With adversarial (worst-case) input perturbations, the minimax risk equals the sum of the standard risk and a modulus term dictated by function-class smoothness and perturbation magnitude (Peng et al., 12 Oct 2024, Peng et al., 2 Jun 2025); a numerical illustration follows this list.
  • Data Inhomogeneity and Missingness: Minimax global convergence rates for regression are sensitive to spatial inhomogeneities in the design distribution, with sharp transitions (“elbow effects”) depending on both the magnitude of sparsity/data loss and function homogeneity (Antoniadis et al., 2011).
  • Heavy-Tailed or Vanishing Noise: Estimation under small (vanishing) noise or heavy-tailed errors can induce phase transitions in minimax risk, and, in some settings (e.g., shuffled/unlinked regression and deconvolution), the minimax rate changes nontrivially as the error variance crosses a threshold (Durot et al., 14 Apr 2024).
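
To see how the two adversarial terms trade off, the snippet below (illustrative arithmetic only; the loss exponent $q = 2$ is an assumption corresponding to squared loss) evaluates the adversarial regression rate $r^{q(1 \wedge \beta)} + n^{-q\beta/(2\beta + d)}$ listed in the summary table below: the perturbation term takes over once the radius $r$ exceeds the order of the standard estimation error.

```python
# Illustrative evaluation of the adversarial regression rate
# r^{q(1 ∧ beta)} + n^{-q*beta/(2*beta + d)} for a few perturbation radii.
def adversarial_rate(n: int, r: float, beta: float, d: int, q: float = 2.0):
    standard = n ** (-q * beta / (2 * beta + d))       # usual minimax term
    perturbation = r ** (q * min(1.0, beta))           # modulus-of-continuity term
    return standard, perturbation

n, beta, d = 10_000, 1.0, 2
for r in (0.0, 0.01, 0.05, 0.2):
    s, p = adversarial_rate(n, r, beta, d)
    print(f"r={r:4.2f}  standard term≈{s:.4f}  perturbation term≈{p:.4f}")
```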

6. Practical Implications and Applications

Knowledge of minimax rates has broad implications for statistical practice and methodological development:

  • Algorithm Benchmarking: Claims of improved performance beyond the minimax rate (for a specified model and loss) require justification based on additional structural or distributional assumptions.
  • Experimental Design: Minimax rates inform the sample sizes required to achieve a prespecified estimation accuracy, which is especially relevant when data acquisition is expensive (a back-of-the-envelope sizing sketch follows this list).
  • Model Selection and Adaptive Procedures: The minimax rate justifies the search for adaptive estimators capable of adjusting to unknown regularity or complexity, with many recent advances providing data-driven optimality guarantees.
  • Adversarial Robustness Evaluation: In robust machine learning and adversarial statistics, the minimax rate quantifies the fundamental price of robustness and guides principled construction of defense methods.
  • Computational Considerations: Achieving the minimax rate may not always be computationally feasible. A gap (statistical-computational gap) may exist between statistically optimal and computationally tractable estimators, motivating ongoing research.
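
As a back-of-the-envelope example of such sample-size planning (illustrative only; constants are ignored and the nonparametric rate $n^{-2\beta/(2\beta + d)}$ is assumed), the sketch below inverts the rate to obtain the order of $n$ needed for a target accuracy.

```python
import math

def required_n(epsilon: float, beta: float, d: int) -> int:
    """Sample size of the order needed so that the nonparametric minimax risk
    n^{-2*beta/(2*beta + d)} falls below a target accuracy epsilon
    (constants ignored; order-of-magnitude planning only)."""
    return math.ceil(epsilon ** (-(2 * beta + d) / (2 * beta)))

for d in (1, 3, 10):
    print(f"d={d:2d}, beta=2, target risk 1e-2  ->  n ≈ {required_n(1e-2, 2.0, d):,}")
```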

7. Open Problems and Recent Directions

Despite substantial progress, several areas remain active:

  • Broader Noise and Data Models: Extending minimax results to non-compact, heavy-tailed, or dependent error models; assessing minimax rates under weaker or more realistic sampling regimes.
  • Adaptive and Computationally Efficient Estimation: Construction of practical algorithms achieving minimax rates, especially in high dimensions or complex models (e.g., manifold learning under general noise).
  • Dynamic and Sequential Settings: Robustness of minimax rates under streaming, sequential, or always-valid data acquisition (see (Kirichenko et al., 2020)).
  • Interplay with Statistical-Computational Trade-offs: Characterization of when the minimax rate can be attained by efficient algorithms, and conditions under which approximation or relaxations are necessary.

Summary Table: Minimax Rates in Key Problems

| Problem | Minimax Rate | Main Controlling Quantity |
| --- | --- | --- |
| Nonparametric regression | $n^{-2\beta/(2\beta + d)}$ | Smoothness $\beta$, dimension $d$ |
| Manifold estimation (Hausdorff) | $n^{-2/(2+d)}$ | Intrinsic dimension $d$ |
| Density estimation (Sobolev) | $n^{-2\beta/(2\beta + d)}$ | Smoothness, dimension |
| Phase retrieval | $\ell(T)/\sqrt{N}$ | Mean width $\ell(T)$, complexity |
| Operator learning (poly spectrum) | $M^{-2\beta r/(2\beta r + 2r + 1)}$ | Smoothness $\beta$, spectral decay $r$ |
| Operator learning (exp spectrum) | $M^{-\beta/(\beta+1)}$ | Smoothness $\beta$ |
| Wasserstein barycenter estimation | $n^{-1/2}$ | Number of "populations" $n$ |
| Adversarial regression | $r^{q(1 \wedge \beta)} + n^{-q\beta/(2\beta + d)}$ | Perturbation $r$, smoothness $\beta$ |
| kNN (classification, regression) | Model-dependent (see Zhao et al., 2019) | Distribution tail, local density |

The minimax rate of convergence provides a unifying principle and analytical toolset for identifying the statistical limits of estimation and learning. It simultaneously guides the development of adaptive, robust, and computationally sound estimators, and demarcates the boundary between possible and impossible performance under specified modeling assumptions.
