
Minimax-Optimal Rate of Convergence

Updated 22 January 2026
  • The minimax-optimal rate of convergence is the sharpest rate at which any estimator or algorithm can approximate the true underlying object in the worst case over a specified problem class.
  • This rate serves as a gold standard: lower bounds obtained via techniques such as Le Cam's method, Assouad's lemma, and Fano's inequality are matched by constructive upper-bound procedures such as kernel estimators.
  • Its determination drives practical algorithm design and performance benchmarking in nonparametric statistics, machine learning, and optimization.

A minimax-optimal rate of convergence is the sharpest possible rate (in terms of sample size or computational steps) at which a statistical estimator or algorithm can approximate an underlying object (such as a function, parameter, set, or operator), in the worst-case scenario over a specified class of problems. This fundamental concept plays a central role in nonparametric statistics, machine learning, optimization, and information theory. It quantifies the inherent statistical or numerical complexity of a problem and serves as the gold standard for both lower bounds (impossibility results) and for benchmarking estimators, procedures, or algorithms.

1. Formal Definition and General Principles

Given a statistical or computational problem class $\mathcal{P}$, sample size $n$ (or computational budget $T$), and a risk/loss metric $d(\hat\theta, \theta)$, the minimax risk is defined as

$$R_n^* = \inf_{\hat\theta} \sup_{P\in\mathcal{P}} \mathbb{E}_P\big[d(\hat\theta, \theta(P))\big].$$

The minimax-optimal rate of convergence is a sequence $a_n \to 0$ such that

$$c_1 a_n \leq R_n^* \leq c_2 a_n$$

(up to constants and sometimes log factors) for suitably large $n$, where $a_n$ is tight in the sense that for any $b_n \ll a_n$, $R_n^*/b_n \to \infty$.

This rate is attained, or nearly attained, by a specific estimator or algorithm, which is then said to be minimax-rate-optimal.
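As a toy illustration of the $\inf_{\hat\theta}\sup_P$ structure, the following sketch (a Monte Carlo approximation under illustrative assumptions, not a formal computation) compares the worst-case squared-error risk of two simple estimators of a Gaussian mean over a bounded parameter class; the parameter grid, sample size, and shrinkage factor are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def worst_case_risk(estimator, thetas, n, reps=2000):
    """Monte Carlo sup over a finite grid of the squared-error risk
    E_theta[(estimator(X) - theta)^2], with n i.i.d. N(theta, 1) samples."""
    risks = []
    for theta in thetas:
        x = rng.normal(theta, 1.0, size=(reps, n))
        est = estimator(x)
        risks.append(np.mean((est - theta) ** 2))
    return max(risks)

thetas = np.linspace(-2, 2, 9)             # bounded parameter class [-2, 2]
n = 50
sample_mean = lambda x: x.mean(axis=1)
shrunk = lambda x: 0.5 * x.mean(axis=1)    # aggressively biased toward 0

r_mean = worst_case_risk(sample_mean, thetas, n)
r_shrunk = worst_case_risk(shrunk, thetas, n)
print(r_mean, r_shrunk)
```

The sample mean's sup-risk tracks the parametric rate $1/n$, while the shrunk estimator, although better near zero, has a far larger worst-case risk; this is exactly the distinction the minimax criterion is built to capture.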

2. Techniques for Establishing Lower and Upper Minimax Rates

Lower Bounds

A lower bound is proved via

  • Le Cam's method: constructing two or more "hard-to-distinguish" hypotheses such that their risks are separated, yet the total variation or Hellinger divergence between the associated distributions is small.
  • Assouad's lemma: embedding a hypercube of well-separated hypotheses in the parameter space and lower-bounding the overall risk by a sum of binary testing errors over the coordinates.
  • Fano's inequality or Fano–Tsybakov method: relating the minimax risk to the mutual information or average Kullback–Leibler divergence among codewords/hypotheses.
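As a concrete instance of Le Cam's two-point method, the following sketch (assuming a Gaussian location model and absolute-error loss; the sample size and separation grid are illustrative) evaluates the bound $\inf_{\hat\theta}\sup_\theta \mathbb{E}|\hat\theta - \theta| \ge (\delta/2)(1 - \mathrm{TV})$ and optimizes the separation $\delta$:

```python
import math

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def le_cam_bound(delta, n):
    """Two-point Le Cam lower bound for the mean of N(theta, 1) from n
    i.i.d. samples, hypotheses theta in {0, delta}, absolute-error loss:
        inf_est sup_theta E|est - theta| >= (delta / 2) * (1 - TV),
    using TV(N(0,1)^n, N(delta,1)^n) = 2 * Phi(sqrt(n) * delta / 2) - 1."""
    tv = 2.0 * phi(math.sqrt(n) * delta / 2.0) - 1.0
    return (delta / 2.0) * (1.0 - tv)

n = 400
# Optimize the separation delta over a grid; the maximizer scales like
# 1/sqrt(n), so the optimized bound itself scales like n^{-1/2}.
best = max(le_cam_bound(d / 1000.0, n) for d in range(1, 1001))
scaled = best * math.sqrt(n)   # roughly constant in n
print(best, scaled)
```

The optimized separation balances the two forces in the method: a larger $\delta$ increases the loss when the estimator errs, but also makes the two hypotheses easier to distinguish (larger TV).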

Upper Bounds

Match the lower rate with a constructive procedure:

  • For nonparametric/functional estimation: regularized local polynomial estimators, kNN, sieve-MLE, penalized splines, kernel methods, or tamed-LS estimators.
  • For minimax optimization: carefully tuned first/second order methods with restarts or adaptive step sizes.
  • For functional ANOVA and shape constraints: spline-ANOVA or convex spline estimators with adaptivity to constraints.

Matching upper and lower bounds demonstrates sharp minimax optimality.
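For a minimal upper-bound illustration, the sketch below (a bare-bones Nadaraya–Watson kernel estimator; the target function, noise level, and evaluation grid are illustrative assumptions) uses the theory-driven bandwidth $h \asymp n^{-1/(2s+d)}$ that balances bias and variance for a Lipschitz ($s = 1$) regression function in dimension $d = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)

def nadaraya_watson(x_train, y_train, x_eval, h):
    """Gaussian-kernel Nadaraya-Watson regression estimate at x_eval."""
    w = np.exp(-0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

f = lambda x: np.abs(x - 0.5)          # Lipschitz (Holder s = 1) target
n = 2000
x = rng.uniform(0, 1, n)
y = f(x) + 0.3 * rng.normal(size=n)

h = n ** (-1.0 / 3.0)                  # h ~ n^{-1/(2s+d)} with s = 1, d = 1
grid = np.linspace(0.05, 0.95, 200)    # interior grid to limit boundary effects
mse = np.mean((nadaraya_watson(x, y, grid, h) - f(grid)) ** 2)
print(mse)  # decays like n^{-2/3} up to constants
```

The bandwidth exponent is the point of the exercise: squared bias scales like $h^{2s}$ and variance like $1/(nh^d)$, and equating the two recovers the $n^{-2s/(2s+d)}$ rate.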

3. Representative Examples Across Modern Statistics and Machine Learning

The minimax-optimal rate depends on the geometry and smoothness of the parameter class and on the noise model. Below is a selection of precise rates from canonical statistical and algorithmic settings:

| Problem | Model/Assumptions | Minimax Rate | Reference |
| --- | --- | --- | --- |
| Manifold estimation | $d$-dim $C^2$ submanifold in $\mathbb{R}^D$, Hausdorff distance | $n^{-2/(d+2)}$ | (Genovese et al., 2010) |
| KL divergence estimation (continuous) | $s$-Hölder densities, $d$-dim | $n^{-2s/(2s+d)}$ | (Zhao et al., 2020) |
| KL divergence estimation (discrete) | Alphabet size $k$, density-ratio bound $f(k)$ | $\left(\frac{k}{n\log k}+\frac{kf(k)}{n\log k}\right)^2 + \frac{\log^2 f(k)}{n} + \frac{f(k)}{n}$ | (Bu et al., 2016) |
| Flow matching (generative models) | Besov class, $W_p$-distance | $n^{-(s+1)/(2s+d)}$ | (Fukumizu et al., 2024) |
| Distributional regression (CRPS) | $h$-Hölder class | $n^{-2h/(2h+d)}$ | (Pic et al., 2022) |
| Adaptive kNN classification/regression | Bounded support/density, margin | $N^{-4/(d+4)}$ (regression) | (Zhao et al., 2019) |
| Functional ANOVA with derivatives | Sobolev order $m$, interactions, $p$ partials | $n^{-2m/(2m+1)}$ or $n^{-1}$ | (Dai et al., 2017) |
| Convex-concave minimax optimization | Second-order, smooth convex-concave | $O(\varepsilon^{-2/3})$ iterations | (Jiang et al., 2024) |
| Nonlocal kernel learning | Sobolev/Hölder $\beta$ | $M^{-2\beta/(2\beta+1)}$ | (Wang et al., 2023) |
| Multivariate deviated models | MLE, distinguishability condition | $n^{-1}$ in deviation, $n^{-1/2}/\lambda^*$ in parameters | (Do et al., 2023) |
| Nonparametric regression under adversarial $L_q$-risk | Hölder $s$, adversarial radius $\epsilon$ | $n^{-2s/(2s+d)}+\epsilon^{2(1\wedge s)}n^{-2(s-(1\wedge s))/(2s+d)}$ | (Peng et al., 2 Jun 2025) |
| Shape-constrained convex estimation | Convex, Hölder $1<\alpha\leq 2$ | $n^{-\alpha/(2\alpha+1)}(\log n)^{\alpha/(2\alpha+1)}$ | (Lebair et al., 2013) |
| Mixture of normals density estimation | Location mixtures, ISB loss | $n^{-1}\sqrt{\log n}$ | (Kim, 2011) |
| High-dim covariance estimation w/ missing data | Bandable/sparse, MCR model | $(n_{\min}^*)^{-2\alpha/(2\alpha+1)} + \ln p/n_{\min}^*$ | (Cai et al., 2016) |

These rates reflect the geometric and functional structure of the problem—the key determinant of statistical complexity.
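One practical reading of rates of the form $n^{-2s/(2s+d)}$ is as a sample-size requirement. The sketch below inverts the canonical nonparametric rate to show how the sample size needed for a fixed target error grows with the (intrinsic) dimension; the function name and the parameter values are illustrative:

```python
def samples_needed(eps, s, d):
    """Sample size n solving n^{-2s/(2s+d)} = eps for the canonical
    nonparametric rate over an s-Holder class in dimension d
    (constants ignored)."""
    return eps ** (-(2 * s + d) / (2 * s))

# Curse of dimensionality: fixing smoothness s = 2 and target error 0.1,
# the required n grows exponentially in d.
for d in (1, 2, 5, 10):
    print(d, round(samples_needed(0.1, 2, d)))
```

This also makes the role of intrinsic dimension concrete: when the data live on a $d$-dimensional manifold inside $\mathbb{R}^D$, it is $d$, not $D$, that enters the exponent.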

4. Structural Determinants and Universality of Minimax Rates

  • The intrinsic dimension $d$, rather than the ambient dimension, typically controls the exponent (e.g., manifold estimation (Genovese et al., 2010), nonparametric regression).
  • Smoothness/regularity (Sobolev, Hölder, Besov indices) manifests in the rate exponent denominator.
  • Noise structure (e.g., adversarial, deconvolution, mixture) appears in additional rate terms, sometimes leading to phase transitions (e.g., shuffled regression (Durot et al., 2024)).
  • Constraints (e.g., convexity, monotonicity) may slow the rate only by log-factors (e.g., convex estimation (Lebair et al., 2013)).
  • In high-dimensional sparse models, minimax rates incorporate both sparsity level and ambient dimension (covariance (Cai et al., 2016), lₚ-mixed-norm MKL (Suzuki, 2011)).

The minimax rate is universal in the sense that it cannot be improved by any estimator or algorithm under the stated assumptions, up to possible polylogarithmic terms.

5. Impact on Algorithm Design and Evaluation

Minimax theory serves as both a lower bound for impossibility results and a constructive goal for algorithmic innovation.

  • In adaptive and robust statistics, establishing minimax-optimality of data-driven procedures (e.g., adaptive kNN, Lepski-type bandwidth selection) is nontrivial and ensures practical competitiveness (Peng et al., 2 Jun 2025; Zhao et al., 2019).
  • In minimax optimization, the rate prescribes the optimal convergence versus iteration complexity for first- and second-order methods, and drives the development of restarts, acceleration, and optimism (e.g., AG-OG (Li et al., 2022), adaptive second-order methods (Jiang et al., 2024)).
  • For estimation under adversarial uncertainty, minimax-optimal plug-in estimators are modular and guarantee the best possible trade-offs between estimation error and robustness (Peng et al., 2024; Peng et al., 2 Jun 2025).
  • In high-dimensional settings, the minimax rate informs both estimator tuning (e.g., block size, threshold) and the necessary empirical process control for handling incomplete or structured data.
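As a small illustration of rate-informed tuning, the sketch below (plain, non-adaptive kNN regression; the choice $k \asymp n^{4/(d+4)}$ follows the regression rate quoted in the table above, and the target function and noise level are illustrative assumptions) wires the minimax exponent directly into the neighbourhood size:

```python
import numpy as np

rng = np.random.default_rng(2)

def knn_regress(x_train, y_train, x_eval, k):
    """Plain k-nearest-neighbour regression in one dimension."""
    dist = np.abs(x_eval[:, None] - x_train[None, :])
    idx = np.argsort(dist, axis=1)[:, :k]     # k nearest training points
    return y_train[idx].mean(axis=1)

f = lambda x: np.sin(2 * np.pi * x)           # smooth (C^2) target
n, d_dim = 3000, 1
x = rng.uniform(0, 1, n)
y = f(x) + 0.2 * rng.normal(size=n)

k = max(1, round(n ** (4 / (d_dim + 4))))     # rate-optimal scaling k ~ n^{4/(d+4)}
grid = np.linspace(0.05, 0.95, 100)
mse = np.mean((knn_regress(x, y, grid, k) - f(grid)) ** 2)
print(k, mse)  # mse decays like n^{-4/(d+4)} up to constants
```

Here the theory supplies the scaling of the tuning parameter; a fully adaptive procedure would additionally select $k$ from the data when the smoothness is unknown.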

6. Extensions, Phase Transitions, and Open Problems

  • Phase transitions: Statistical rates frequently change regime as a function of key problem parameters, e.g., adversarial-attack magnitude (Peng et al., 2 Jun 2025), signal-to-noise ratio in shuffled regression (Durot et al., 2024).
  • Boundary Case Behavior: When the model regularity is at threshold (e.g., smoothness at 1/2), log-factors or even subpolynomial convergence terms arise.
  • Non-Euclidean, Structured, or Nonparametric Models: Generalizing to graphs, manifolds, shapes, or dependent data requires sophisticated entropy and hypothesis-packing analyses.
  • Adaptivity and Robustness: Simultaneously achieving minimax-optimal rates over collections of models (unknown smoothness, unknown attack size) often requires multi-scale or adaptive procedures (Peng et al., 2 Jun 2025).

Open problems include sharp minimax rates under composite or nonstandard noise, rates for high-dimensional inference under non-classical missingness, and the complexity of computing minimax-optimal estimators algorithmically.

The minimax-optimal rate of convergence thus captures and crystallizes the fundamental limitations of statistical and algorithmic procedures for rich, high-complexity models.
