
Minimax Rates in Statistical Estimation

Updated 6 January 2026
  • Minimax rates are optimal risk decay bounds in estimation, quantifying the fastest achievable error rates uniformly over model classes.
  • They balance bias and variance through intricate tradeoffs influenced by sample size, function smoothness, and feature dimensionality.
  • Applications span nonparametric regression, random forests, inverse problems, and community detection, guiding practical algorithm design.

Minimax rates characterize the fundamental limits for statistical estimation and learning within specified problem classes, quantifying the optimal decay of risk or error as a function of sample size, model parameters, and function class complexity. The minimax rate is the fastest (typically order-optimal) rate achievable by any estimator (or procedure), uniformly over a function or model class, making it central to statistical theory and the design of learning algorithms.

1. Formal Definition and General Principle

The minimax risk for a statistical estimation problem is defined as the infimum over all estimators of the maximum expected loss across the target class. For a parameter space $\Theta$ and loss $\ell$, this is formalized as
$$R_n^* = \inf_{\hat{\theta}} \sup_{\theta \in \Theta} \mathbb{E}_\theta[\ell(\hat{\theta}, \theta)].$$
For function estimation (e.g., regression or density estimation), the risk may be expressed in squared error, the $L_2$ norm, the $L_1$ norm, or another metric depending on the application. The minimax rate is the asymptotic order of $R_n^*$ as $n \to \infty$. The goal is to characterize $R_n^*$, often in terms of intrinsic complexity measures such as metric entropy, covering numbers, smoothness parameters, and dimensionality.
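As a concrete toy illustration of this definition, the sketch below approximates the worst-case risk of two estimators of a bounded normal mean $\theta \in [-1, 1]$ by Monte Carlo; the model, grid, and estimators are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 4000
thetas = np.linspace(-1.0, 1.0, 21)      # parameter class Theta = [-1, 1]

worst_mean, worst_clip = 0.0, 0.0
for theta in thetas:
    x = rng.normal(theta, 1.0, size=(reps, n))   # reps datasets of n points each
    xbar = x.mean(axis=1)                        # plain sample mean
    xclip = np.clip(xbar, -1.0, 1.0)             # sample mean projected onto Theta
    worst_mean = max(worst_mean, float(np.mean((xbar - theta) ** 2)))
    worst_clip = max(worst_clip, float(np.mean((xclip - theta) ** 2)))

# Projecting onto Theta shrinks the error for every sample when theta lies in
# [-1, 1], so the projected estimator's worst-case risk cannot be larger.
print(worst_clip <= worst_mean)   # True
```

The grid over $\theta$ approximates the supremum; the example shows why the minimax criterion rewards estimators that exploit the known parameter class.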

2. Metric Entropy and the "Le Cam Equation"

Minimax rates in nonparametric estimation are fundamentally determined by the metric entropy structure of the function or model class. The prototypical characterization is via the localized metric entropy $H(\varepsilon)$ (often in $L_2$ or $L_1$), leading to the balance equation
$$H(\varepsilon_n) \approx n \varepsilon_n^2.$$
The minimax risk (in squared loss) then scales as $\varepsilon_n^2$, where $\varepsilon_n$ solves this balance between sample size and local function class complexity (Shrotriya et al., 2022). This principle applies broadly to regression, density estimation, nonparametric location-scale models, and convex density classes.

Examples of entropy-driven rates:

| Class | Covering entropy $H(\varepsilon)$ | Minimax rate $R_n^*$ |
| --- | --- | --- |
| Hölder-$\beta$ densities | $\varepsilon^{-1/\beta}$ | $n^{-\beta/(2\beta+1)}$ |
| TV-bounded densities | $V/\varepsilon$ | $(V/n)^{1/3}$ |
| Convex $k$-mixture simplex | $k \log(1/\varepsilon)$ | $\sqrt{(k \log n)/n}$ |
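The balance equation can be solved numerically and checked against the closed-form rates in the table; the bisection routine and parameter values below are an illustrative sketch (the entropy functions are the ones tabulated above).

```python
import math

def solve_le_cam(H, n, lo=1e-12, hi=1.0):
    """Solve the balance H(eps) = n * eps^2 for a decreasing entropy H.

    g(eps) = n*eps^2 - H(eps) is increasing in eps, so bisection applies;
    geometric midpoints give scale-free accuracy over many decades."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if n * mid ** 2 > H(mid):
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)

n, beta, V = 10**6, 2.0, 1.0
eps_holder = solve_le_cam(lambda e: e ** (-1.0 / beta), n)  # Holder-beta entropy
eps_tv = solve_le_cam(lambda e: V / e, n)                   # TV-bounded entropy

print(abs(eps_holder - n ** (-beta / (2 * beta + 1))) < 1e-9)  # n^{-beta/(2beta+1)}
print(abs(eps_tv - (V / n) ** (1.0 / 3.0)) < 1e-9)             # (V/n)^{1/3}
```

Both checks print `True`: the numeric solution of $H(\varepsilon_n) = n\varepsilon_n^2$ reproduces the tabulated rates exactly.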

3. Classical Minimax Rates in Nonparametric and High-Dimensional Problems

For nonparametric regression on Hölder or Sobolev classes, the optimal rate for mean-squared error is
$$R_n^* \asymp n^{-2\beta/(2\beta+d)},$$
where $\beta$ is the smoothness parameter and $d$ is the dimensionality (O'Reilly et al., 2021, Mourtada et al., 2018, Zhao et al., 2023). In sup-norm, a logarithmic factor appears:
$$R_n^* \asymp \left( \frac{\log n}{n} \right)^{\beta/(2\beta + d)}.$$
In inverse problems, rates reflect both smoothness and operator ill-posedness; for Sobolev-type ellipsoid source sets with singular value decay $s_j \asymp j^{-\alpha}$,

$$R_n^* \asymp n^{-2\beta/(2\alpha+2\beta+1)}$$

(Ding et al., 2017). In cost-sensitive and margin-sensitive classification on manifolds,

$$R_n^* \asymp n^{-\beta(1+\alpha)/(2\beta + d)}$$

where $\alpha$ is the margin exponent and $d$ is the intrinsic dimension (Reeve et al., 2018).
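To make the curse of dimensionality concrete, one can tabulate the mean-squared-error exponent $2\beta/(2\beta+d)$ from the Hölder rate above; the smoothness and dimension values are chosen purely for illustration.

```python
# MSE exponent in R_n* ~ n^{-2*beta/(2*beta+d)} for Holder-beta regression in dim d.
def mse_exponent(beta, d):
    return 2.0 * beta / (2.0 * beta + d)

for d in (1, 2, 10, 100):
    print(d, mse_exponent(2.0, d))
# The exponent drops from 4/5 at d = 1 toward 0 as d grows: with smoothness
# fixed, far more samples are needed for the same accuracy in high dimension.
```

For $\beta = 2$ the exponent falls from $0.8$ at $d=1$ to below $0.04$ at $d=100$, quantifying why fixed-smoothness nonparametric estimation degrades sharply with dimension.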

For sparse high-dimensional models (e.g., estimation in the Gaussian sequence model),

$$R_n^*(\text{linear functional}) \asymp \sigma^2 s^2 \log(1 + d/s^2)$$

with a phase transition ("elbow") at $s \sim \sqrt{d}$ separating sparse and dense regimes (Collier et al., 2015).
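Plugging numbers into the rate above exhibits the elbow directly; the dimension, noise level, and sparsity grid below are illustrative assumptions.

```python
import math

def linear_functional_rate(sigma, s, d):
    # sigma^2 * s^2 * log(1 + d/s^2), as quoted above for the s-sparse class.
    return sigma ** 2 * s ** 2 * math.log(1.0 + d / s ** 2)

d = 10**6                      # ambient dimension; elbow at s ~ sqrt(d) = 1000
for s in (10, 1000, 10**4):
    print(s, linear_functional_rate(1.0, s, d))
# Below the elbow the s^2 log factor dominates; above it, log(1 + d/s^2) ~ d/s^2,
# so the rate saturates near d and extra sparsity structure no longer helps.
```

The printed values grow with $s$ but level off near $d$ once $s \gg \sqrt{d}$, which is the dense regime where sparsity gives no further gain.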

4. Structural Bias–Variance Decomposition and Geometry

Optimal estimation procedures balance geometric bias and variance, as formalized via random geometric partition statistics (e.g., diameters of tessellation cells, number of partition elements) (O'Reilly et al., 2021, Mourtada et al., 2018). For piecewise-constant estimators (histograms, forests), bias scales with average cell diameter (controlled by partition complexity), while variance is driven by sample allocation among cells.

In random forests built via stochastic tessellations (STIT, Poisson–hyperplane, Mondrian), optimal rates arise from balancing bias $O(\lambda^{-2\beta})$ and variance $O(\lambda^d/n)$, yielding the tradeoff $\lambda \asymp n^{1/(d+2\beta)}$ (O'Reilly et al., 2021). Self-consistency and stationarity ensure that tessellation statistics scale appropriately under geometric homogeneity.
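The stated tradeoff can be verified numerically by minimizing the risk proxy $\lambda^{-2\beta} + \lambda^d/n$ over a grid of $\lambda$; the proxy ignores constants, and the parameter values are assumptions for the sketch.

```python
import numpy as np

def best_lambda(n, d, beta):
    """Grid-minimize the risk proxy bias + variance = lam^(-2*beta) + lam^d / n."""
    lams = np.logspace(0, 6, 20001)
    risk = lams ** (-2.0 * beta) + lams ** d / n
    return float(lams[np.argmin(risk)])

n, d, beta = 10**8, 2, 1.0
lam_star = best_lambda(n, d, beta)
# Predicted balance: lam ~ n^{1/(d + 2*beta)} = (10^8)^{1/4} = 100.
print(round(lam_star))   # 100
```

The grid minimizer lands at the predicted $n^{1/(d+2\beta)}$ scaling: bias and variance terms are equal, up to constants, exactly at the optimal partition complexity.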

5. Minimax Rates in Random Forests: Axis-Aligned vs. Oblique Splits

Originally, minimax rates for forests were established only for axis-aligned Mondrian forests. Recent advances (O'Reilly et al., 2021) prove that fully oblique random tessellation forests (with arbitrary directional distributions $\varphi$) achieve identical minimax rates in arbitrary dimension, due to the invariance of typical cell geometry and the critical bias-variance balancing. Specifically,

  • For $f \in C^{0,\beta}$ (Hölder smoothness), $R_n^* = O(n^{-2\beta/(d+2\beta)})$.
  • For $f \in C^{1,\beta}$ (one extra derivative), the rate improves to $O(n^{-2(1+\beta)/(d+2+2\beta)})$ given sufficient averaging.

These results demonstrate that oblique splits, favored empirically, retain the full minimax optimality of axis-aligned variants.

6. Extensions: Robustness, Adversarial Regimes, and Functional Estimation

Minimax theory extends to robust estimation under adversarial perturbations. For nonparametric regression subject to adversarial input attacks, the minimax rate is the sum of the standard estimation rate and the adversarial function deviation under the perturbation set:
$$R_n^* \asymp R_n^{\mathrm{std}} + \sup_{f \in \mathcal{F}} \sup_{x,\, \delta \in \Delta_n} |f(x + \delta) - f(x)|,$$
with procedures such as the adversarial plug-in attaining this bound (Peng et al., 2024).
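For intuition about the second term: for an $L$-Lipschitz class with perturbations of radius $r$, the deviation supremum equals $L r$, attained by the steepest member of the class. The tiny numeric check below uses an assumed setup (linear extremal function, scalar inputs) purely for illustration.

```python
# For an L-Lipschitz class with perturbations |delta| <= r, the deviation term
# sup_f sup_{x, delta} |f(x+delta) - f(x)| equals L*r, attained by f(x) = L*x.
L, r = 3.0, 0.05
f = lambda x: L * x                        # extremal (steepest) member of the class
deltas = [i * r / 100.0 for i in range(-100, 101)]
dev = max(abs(f(0.5 + t) - f(0.5)) for t in deltas)
print(abs(dev - L * r) < 1e-9)   # True: the bound L*r is attained
```

The adversarial term is therefore an irreducible floor: no estimator can beat it, since the attacker can always move inputs to where $f$ changes the most.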

In regression under heavy-tailed, heteroskedastic, or non-Gaussian errors, minimax rates are determined by packing entropy of the regression function class, independent of the error law (subject to mild Hellinger differentiability conditions) (Zhao et al., 2023).

For functional estimation, minimax rates may display elbows or interpolation phenomena. In heterogeneous causal effect estimation, the optimal rate is dictated by the combined smoothness of nuisance and target functions, leading to a split between regression-like and functional-like rates: $R_n^* \asymp n^{-1/(2 + d/\gamma)}$ for sufficient smoothness, with a slower rate $n^{-1/(1 + d/(2\gamma) + d/(4s))}$ otherwise (Kennedy et al., 2022).

7. Network Analysis, Community Detection, and Testing

In network estimation problems (community detection, graphon estimation), minimax rates may be exponential rather than polynomial. In the Stochastic Block Model, the minimax misclassification error decays as
$$R_n^* \asymp \exp\left(- \frac{n I}{2} \right), \qquad I = -2\log\left(\sqrt{pq} + \sqrt{(1-p)(1-q)}\right),$$
highlighting a threshold phenomenon for strong vs. weak consistency (Zhang et al., 2015, Gao et al., 2018). Robust recovery under adversarial node corruptions preserves these rates up to additive error terms (Liu et al., 2022).
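The exponent $I$ is straightforward to evaluate; the sketch below computes it for assumed within- and between-community edge probabilities $p$ and $q$.

```python
import math

def sbm_error_exponent(p, q):
    """I = -2*log(sqrt(p*q) + sqrt((1-p)*(1-q))), the exponent quoted above."""
    return -2.0 * math.log(math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q)))

p, q, n = 0.5, 0.1, 200      # within- vs between-community edge probabilities
I = sbm_error_exponent(p, q)
print(I > 0, math.exp(-n * I / 2))   # I > 0 whenever p != q
# As q -> p the exponent vanishes and recovery becomes impossible.
```

Even at moderate $n$, a well-separated pair $(p, q)$ yields an exponentially small misclassification bound, while $I \to 0$ as the communities become indistinguishable.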

Similarly, for high-dimensional changepoint detection, minimax testing rates exhibit phase transitions between sparse and dense regimes, with explicit dependence on dimensionality, sparsity, and sample size, and unusual triple-logarithmic factors in certain regimes (Liu et al., 2019).

8. Time-Robust Minimax Rates and Sample Size Adaptivity

Classical minimax rates assume a fixed sample size. Time-robust minimax rates generalize to settings with uncertain or data-dependent sample size (anytime-valid estimation). In most problems, the time-robust rate differs from the classical rate by at most a logarithmic (or iterated-logarithmic) factor, e.g.,

$$R_n^* \preceq f(n) \log n,$$

where $f(n)$ denotes the classical fixed-sample-size rate,

or, for regular exponential families, $R_n^* \asymp (\log\log n)/n$ (Kirichenko et al., 2020). In model selection, time-robust rates enable simultaneous consistency and rate optimality, circumventing classical tradeoffs (the AIC–BIC dilemma).

9. Practical Algorithmic Attainment and Adaptive Procedures

Rate-optimal estimators are often constructed by balancing geometric or combinatorial complexities:

  • Sieve MLEs and multistage aggregation schemes adaptively achieve minimax rates across a range of function classes (Shrotriya et al., 2022).
  • Random forests (Mondrian, STIT, Poisson–hyperplane) attain minimax rates via proper tuning of partition complexity and ensemble size (O'Reilly et al., 2021, Mourtada et al., 2018).
  • Adaptive procedures (e.g., model aggregation, convex hulls) select near-oracle complexity in practice without prior smoothness knowledge.

10. Summary Table: Prototypical Minimax Rates

| Setting | Rate $R_n^*$ | Reference |
| --- | --- | --- |
| Hölder regression | $n^{-2\beta/(d+2\beta)}$ | (O'Reilly et al., 2021) |
| Location-scale regression | $n^{-2\alpha/(2\alpha + d)}$ | (Zhao et al., 2023) |
| Convex density class | $\varepsilon_n$ from the Le Cam equation | (Shrotriya et al., 2022) |
| Sparse vector estimation | $\sigma^2 s^2 \log(1+d/s^2)$ | (Collier et al., 2015) |
| Graphon estimation | $n^{-2\alpha/(\alpha+1)}$ | (Gao et al., 2018) |
| SBM community detection | $\exp(-nI/2)$ | (Zhang et al., 2015) |
| Adversarial regression | std. rate $+$ max deviation | (Peng et al., 2024) |
| Inverse problems | $n^{-2\beta/(2\alpha+2\beta+1)}$ | (Ding et al., 2017) |
| Causal effect estimation | $n^{-r_1}$, $r_1$ elbow; see text | (Kennedy et al., 2022) |
