
Minimax Convergence Rates

Updated 30 June 2025
  • The minimax rate of convergence is the fundamental measure of how the smallest maximal risk achievable uniformly over a model class decreases with sample size.
  • It underpins key applications across nonparametric regression, density estimation, and manifold learning by defining optimal trade-offs between risk and data.
  • Methodologies like Fano’s lemma, kernel regression, and adaptive procedures are employed to both derive and attain these optimal convergence rates.

The minimax rate of convergence quantifies the optimal rate—uniform over a prescribed model class—at which estimation error decreases with increasing sample size, and serves as a statistical benchmark describing the fundamental difficulty of inference in a given problem setting. It is a central concept in modern statistics, information theory, and machine learning, providing a precise characterization of the attainable trade-off between risk and data in both classical and nonparametric problems.

1. Foundational Definition and Principle

In statistical estimation, the minimax rate of convergence refers to the smallest worst-case risk achievable by any estimator over a function or model class as the sample size grows, typically expressed as the leading order of the risk as a function of the sample size $n$ for a specified metric or loss.

Formally, for a class of distributions or parameter values $\Theta$, a loss function $L(\cdot, \cdot)$, an estimator $\hat{\theta}_n$, and risk $R(\hat{\theta}_n, \theta) = \mathbb{E}_\theta[L(\hat{\theta}_n, \theta)]$, the minimax rate is defined through

$$\inf_{\hat{\theta}_n} \sup_{\theta \in \Theta} R(\hat{\theta}_n, \theta) \asymp \phi(n),$$

where $\phi(n)$ expresses the order (in $n$) of the smallest achievable maximal risk. An estimator is said to be minimax rate-optimal if it attains this order.
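
As a calibration point, the classical parametric Gaussian location model already fits this definition. The worked example below is standard textbook material rather than a result from the papers cited in this article.

```latex
% Gaussian location model: X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\theta, \sigma^2)
% with \Theta = \mathbb{R} and squared-error loss L(\hat\theta, \theta) = (\hat\theta - \theta)^2.
\inf_{\hat\theta_n} \sup_{\theta \in \mathbb{R}}
  \mathbb{E}_\theta\!\big[(\hat\theta_n - \theta)^2\big]
  \;=\; \frac{\sigma^2}{n}
  \qquad \text{(attained by the sample mean } \hat\theta_n = \bar{X}_n\text{)}
```

so $\phi(n) \asymp n^{-1}$, the familiar parametric rate; the nonparametric classes below replace $n^{-1}$ with strictly slower orders.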

2. Mathematical Formulation in Various Models

The minimax rate depends on the statistical model, complexity constraints (such as smoothness, sparsity, or dimension), the observation mechanism, and the loss function. Representative settings include:

  • Nonparametric Regression: For estimating a function $f$ in a Hölder or Sobolev class of smoothness $\beta$ on $[0,1]^d$, with i.i.d. observations $(X_i, Y_i)$, the minimax risk under $L_2$ loss is classically

$$R_n^* \asymp n^{-2\beta/(2\beta + d)},$$

which is attained by local polynomial or kernel regression (Distributional regression and its evaluation with the CRPS: Bounds and convergence of the minimax risk, 2022); a numerical sketch of this rate is given after this list.

  • Density Estimation: For densities in a Sobolev class of smoothness $\beta$, the minimax $L_2$-risk is $n^{-2\beta/(2\beta + d)}$, mirroring nonparametric regression.
  • Manifold Estimation: For estimating a $d$-dimensional smooth manifold $M \subset \mathbb{R}^D$ in Hausdorff distance, the minimax rate is

$$n^{-2/(2 + d)}$$

and is independent of the ambient dimension $D$ under compact, perpendicular noise (Minimax Manifold Estimation, 2010). The risk is measured in Hausdorff or other set distances.

  • Inverse Problems (Kernel/Operator Learning): For estimation in a regression model with a compact linear operator (spectral regularization), the minimax rate depends on the decay of the operator's eigenvalues (polynomial or exponential) and on the smoothness of the function to be recovered; see (Minimax rates for learning kernels in operators, 27 Feb 2025):

$$\text{Polynomial decay:} \quad M^{-2\beta r/(2\beta r + 2r + 1)}, \qquad \text{Exponential decay:} \quad M^{-\beta/(\beta + 1)},$$

where $r$ is the decay exponent, $\beta$ the smoothness, and $M$ the sample size.

  • Distributional Regression (CRPS): For conditional distributions $F^*_x$ with Hölder-continuous $x \mapsto F^*_x$, the excess risk in CRPS loss is

$$n^{-2h/(2h + d)},$$

where $h$ denotes the Hölder regularity parameter (Distributional regression and its evaluation with the CRPS: Bounds and convergence of the minimax risk, 2022).

  • High-Dimensional Sparse Estimation: In linear regression with sparsity constraints, the minimax risk scales (for $r$ nonzero coefficients, $p$ variables, $n$ observations) as

$$\frac{\sigma^2}{n}\big[\, r \log(p/r) \big],$$

and extensions exist for models with interactions (High-dimensional Adaptive Minimax Sparse Estimation with Interactions, 2018).
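
The regression rate in the first bullet can be probed numerically. The following is a minimal Monte Carlo sketch, with all function choices and constants being illustrative assumptions rather than anything taken from the cited papers: Nadaraya–Watson kernel regression with the bandwidth scaling $h \asymp n^{-1/(2\beta + d)}$ for a Lipschitz ($\beta = 1$) target in $d = 1$, where the empirical $L_2$ risk should shrink roughly like $n^{-2\beta/(2\beta + d)} = n^{-2/3}$.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, h):
    """Gaussian-kernel Nadaraya-Watson estimate of E[Y | X = x] at the points x_eval."""
    # Kernel weights between every evaluation point and every training point.
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / np.maximum(w.sum(axis=1), 1e-12)

def empirical_l2_risk(n, beta=1.0, d=1, sigma=0.3, n_rep=20, seed=0):
    """Monte Carlo estimate of the L2 risk of the kernel estimator at sample size n."""
    rng = np.random.default_rng(seed)
    f = lambda x: np.abs(x - 0.5)            # Lipschitz (beta = 1) target; an illustrative choice
    h = n ** (-1.0 / (2 * beta + d))         # minimax-optimal bandwidth order for this class
    x_grid = np.linspace(0.0, 1.0, 200)
    risks = []
    for _ in range(n_rep):
        x = rng.uniform(0.0, 1.0, size=n)
        y = f(x) + sigma * rng.standard_normal(n)
        f_hat = nadaraya_watson(x, y, x_grid, h)
        risks.append(np.mean((f_hat - f(x_grid)) ** 2))
    return float(np.mean(risks))

if __name__ == "__main__":
    # With beta = 1 and d = 1, quadrupling n should cut the risk by roughly 4^(2/3) ≈ 2.5.
    r1, r2 = empirical_l2_risk(500), empirical_l2_risk(2000, seed=1)
    print(f"risk(n=500) = {r1:.4f}   risk(n=2000) = {r2:.4f}   ratio = {r1 / r2:.2f}")
```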

3. Key Examples and Regime Comparisons

Different models and structural or noise assumptions produce different minimax rates, as detailed below:

| Problem Setting | Model Complexity | Minimax Rate | Notable Features |
|---|---|---|---|
| Manifold estimation | Dimension $d$ | $n^{-2/(2+d)}$ | Intrinsic, not ambient, dimension |
| Nonparametric regression | Smoothness $\beta$, dimension $d$ | $n^{-2\beta/(2\beta + d)}$ | Curse of dimensionality |
| Sparse linear regression | Support $r$, ambient $p$ | $\frac{\sigma^2}{n}[r \log(p/r)]$ | Information-theoretic (Fano) rate |
| Phase retrieval | Set mean width | $\sim \text{mean width}/\sqrt{N}$ | Geometry-adaptive (Minimax rate of convergence and the performance of ERM in phase recovery, 2013) |
| Density estimation | Sobolev $\beta$ | $n^{-2\beta/(2\beta + d)}$ | Parallel to regression |
| Wasserstein barycenter | $n$ units, $p$ per unit | $n^{-1/2} + p^{-1/2}$ | Phase-amplitude separation |
| Normal mixtures | Smooth parameter set | $(\log n)^{1/2}\, n^{-1}$ | Slightly worse than parametric |
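
The dimension dependence in the table can be made concrete with a few numbers. The snippet below (constants set to 1, parameter values purely illustrative) evaluates the nonparametric rate $n^{-2\beta/(2\beta + d)}$ against the dimension-free parametric benchmark $n^{-1}$ at a fixed sample size, showing how quickly the attainable accuracy degrades as $d$ grows.

```python
# Evaluate the nonparametric minimax rate n^(-2*beta/(2*beta + d)) at fixed n,
# next to the dimension-free parametric benchmark n^(-1); constants are set to 1.
n, beta = 10_000, 2.0
print(f"parametric benchmark n^-1 : {n ** -1.0:.2e}")
for d in (1, 2, 5, 10):
    rate = n ** (-2 * beta / (2 * beta + d))
    print(f"nonparametric rate, d = {d:>2}: {rate:.2e}")
```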

The precise rate depends on both global and local aspects, such as data inhomogeneity (Nonparametric Regression Estimation Based on Spatially Inhomogeneous Data: Minimax Global Convergence Rates and Adaptivity, 2011), heavy-tailed or vanishingly small noise (Minimax Optimal rates of convergence in the shuffled regression, unlinked regression, and deconvolution under vanishing noise, 14 Apr 2024), ill-posedness (Minimax rates for learning kernels in operators, 27 Feb 2025), or adversarial conditions (Minimax rates of convergence for nonparametric regression under adversarial attacks, 12 Oct 2024, Adversarial learning for nonparametric regression: Minimax rate and adaptive estimation, 2 Jun 2025).

4. Methodological Principles for Achieving Minimax Rates

Deriving lower bounds on the minimax risk and constructing estimators that attain them (upper bounds) commonly rely on information-theoretic and empirical-process techniques. Lower bounds typically proceed by reducing estimation to a multiple-hypothesis testing problem over a well-separated finite subset of the model class and applying Fano-type information inequalities; upper bounds are obtained by analyzing explicit procedures such as kernel or local polynomial regression, spectral regularization, or adaptive selection rules.

These strategies often require challenging control over empirical covariance structures, geometric entropy (packing numbers, mean width), or spectral norms (for inverse problems).
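
One standard form of the information-theoretic reduction used for lower bounds is a generalized Fano inequality; the statement below is textbook material and not specific to any of the papers cited here.

```latex
% Generalized Fano inequality: given M >= 2 hypotheses \theta_1, \dots, \theta_M \in \Theta
% with pairwise separation d(\theta_j, \theta_k) \ge 2\delta for j \ne k and pairwise
% Kullback-Leibler divergences KL(P_{\theta_j} \,\|\, P_{\theta_k}) \le \gamma, every estimator obeys
\inf_{\hat\theta} \max_{1 \le j \le M}
  \mathbb{E}_{\theta_j}\!\big[ d(\hat\theta, \theta_j) \big]
  \;\ge\; \delta \left( 1 - \frac{\gamma + \log 2}{\log M} \right).
% A lower bound of order \phi(n) follows from a packing of the class with separation
% 2\delta \asymp \phi(n) whose pairwise divergences stay a small fraction of \log M.
```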

5. Impact of Problem Structure and Adversarial Setting

The minimax rate depends fundamentally on the interplay between the structure of the statistical model, the loss under consideration, and any adversarial perturbation of the data: changing any of these ingredients can change the attainable rate, as the adversarial regression example below illustrates.
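
As one concrete instance, the adversarial regression entry in the summary table at the end of this article decomposes into two terms; the annotation below is an interpretive gloss on that cited rate, not a quotation from the underlying paper.

```latex
% Adversarial nonparametric regression rate as listed in the summary table:
% a term governed by the perturbation budget r plus the usual statistical term.
\underbrace{r^{\,q(1 \wedge \beta)}}_{\text{price of perturbations of size } r}
\;+\;
\underbrace{n^{-q\beta/(2\beta + d)}}_{\text{statistical estimation error}}
```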

6. Practical Implications and Applications

Knowledge of minimax rates has broad implications for statistical practice and methodological development:

  • Algorithm Benchmarking: Claims of improved performance beyond the minimax rate (for a specified model and loss) require justification based on additional structural or distributional assumptions.
  • Experimental Design: Minimax rates inform the required sample sizes to achieve a prespecified estimation accuracy, especially relevant for expensive data acquisition (a back-of-envelope calculation is sketched after this list).
  • Model Selection and Adaptive Procedures: The minimax rate justifies the search for adaptive estimators capable of adjusting to unknown regularity or complexity, with many recent advances providing data-driven optimality guarantees.
  • Adversarial Robustness Evaluation: In robust machine learning and adversarial statistics, the minimax rate quantifies the fundamental price of robustness and guides principled construction of defense methods.
  • Computational Considerations: Achieving the minimax rate may not always be computationally feasible. A gap (statistical-computational gap) may exist between statistically optimal and computationally tractable estimators, motivating ongoing research.
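
For the experimental-design point above, the minimax rate can be inverted to give a back-of-envelope sample-size requirement. The sketch below is illustrative: it uses the nonparametric regression rate from Section 2 with the leading constant simply set to 1, which in practice is unknown and problem-dependent.

```python
import math

def required_sample_size(eps, beta, d, const=1.0):
    """Smallest n with const * n^(-2*beta/(2*beta + d)) <= eps (leading constant assumed known)."""
    exponent = 2 * beta / (2 * beta + d)
    return math.ceil((const / eps) ** (1.0 / exponent))

if __name__ == "__main__":
    # Target L2 risk 0.01 for a twice-differentiable (beta = 2) regression function:
    for d in (1, 5):
        print(f"d = {d}: n >= {required_sample_size(eps=0.01, beta=2, d=d)}")
    # Roughly a few hundred samples suffice for d = 1, but over thirty thousand for d = 5.
```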

7. Open Problems and Recent Directions

Despite substantial progress, several areas remain active:

  • Broader Noise and Data Models: Extending minimax results to non-compact, heavy-tailed, or dependent error models; assessing minimax rates under weaker or more realistic sampling regimes.
  • Adaptive and Computationally Efficient Estimation: Construction of practical algorithms achieving minimax rates, especially in high dimensions or complex models (e.g., manifold learning under general noise).
  • Dynamic and Sequential Settings: Robustness of minimax rates under streaming, sequential, or always-valid data acquisition (see (Minimax rates without the fixed sample size assumption, 2020)).
  • Interplay with Statistical-Computational Trade-offs: Characterization of when the minimax rate can be attained by efficient algorithms, and conditions under which approximation or relaxations are necessary.

Summary Table: Minimax Rates in Key Problems

| Problem | Minimax Rate | Main Controlling Quantity |
|---|---|---|
| Nonparametric regression | $n^{-2\beta/(2\beta + d)}$ | Smoothness $\beta$, dimension $d$ |
| Manifold estimation (Hausdorff) | $n^{-2/(2+d)}$ | Intrinsic dimension $d$ |
| Density estimation (Sobolev) | $n^{-2\beta/(2\beta + d)}$ | Smoothness, dimension |
| Phase retrieval | $\ell(T)/\sqrt{N}$ | Mean width $\ell(T)$, complexity |
| Operator learning (polynomial spectrum) | $M^{-2\beta r/(2\beta r + 2r + 1)}$ | Smoothness $\beta$, spectral decay $r$ |
| Operator learning (exponential spectrum) | $M^{-\beta/(\beta + 1)}$ | Smoothness $\beta$ |
| Wasserstein barycenter estimation | $n^{-1/2}$ | Number of "populations" $n$ |
| Adversarial regression | $r^{q(1 \wedge \beta)} + n^{-q\beta/(2\beta + d)}$ | Perturbation $r$, smoothness $\beta$ |
| kNN (classification, regression) | Model-dependent (see Minimax Rate Optimal Adaptive Nearest Neighbor Classification and Regression, 2019) | Distribution tail, local density |

The minimax rate of convergence provides a unifying principle and analytical toolset for identifying the statistical limits of estimation and learning. It simultaneously guides the development of adaptive, robust, and computationally sound estimators, and demarcates the boundary between possible and impossible performance under specified modeling assumptions.