Berry–Esseen Theorem Overview

Updated 11 March 2026

The Berry–Esseen theorem is a foundational result in probability that quantifies the convergence rate of the central limit theorem by comparing normalized sums to the Gaussian distribution.
It employs Fourier-analytic techniques and smoothing inequalities with explicit moment conditions to derive optimal bounds and sharp constants in various settings.
Extensions include multivariate, dependent, and functional cases, with recent research achieving faster convergence rates under enhanced moment and density assumptions.

The Berry–Esseen theorem is a fundamental result in probability theory providing explicit, non-asymptotic quantitative bounds for the convergence rate in the central limit theorem (CLT). Specifically, it quantifies, in terms of explicit moment conditions, the uniform difference between the distribution function of a normalized sum of independent random variables and the limiting Gaussian distribution. The theorem and its variants underpin precise assessments of normal approximation for sums of independent and, in many generalizations, dependent random variables and statistics. This article presents rigorous statements, methods of proof, optimality issues, and connections to higher-dimensional, functional, and specialized probabilistic regimes as established in recent research literature.

1. Classical Statement and Sharp Constants

Let $X_1, X_2, \dots, X_n$ be independent, identically distributed real random variables with zero mean, variance $\sigma^2 > 0$, and finite third absolute moment $\rho^3 = \mathbb{E}[|X_i|^3]$. Define the sum $S_n = \sum_{i=1}^n X_i$ and the distribution function of its normalization, $F_n(x) = \mathbb{P}(S_n / (\sigma\sqrt{n}) \le x)$. Let $\Phi(x)$ denote the standard normal distribution function. The classical Berry–Esseen theorem asserts the existence of a universal constant $C$ such that

$\sup_{x \in \mathbb{R}} |F_n(x) - \Phi(x)| \le C\,\frac{\mathbb{E}[|X_1|^3]}{\sigma^3 \sqrt{n}}.$

Refined analysis gives the best known upper bound $C \le 0.4748$ in the i.i.d. case (Shevtsova, 2011), while Esseen's lower bound shows $C \ge (\sqrt{10}+3)/(6\sqrt{2\pi}) \approx 0.4097$, so the optimal constant lies between these two values (Vershynin, 5 Feb 2026). For independent, non-identically distributed variables, the best known upper bound on the constant is approximately $0.5600$.
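The classical bound can be checked numerically in a case where the distribution function of the sum is known exactly. The following is a minimal sketch, assuming Bernoulli($p$) summands (centered inside the computation) and the constant $C = 0.4748$ quoted above; the function names are illustrative and not taken from any cited work.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_distance_bernoulli(n, p):
    """Exact sup_x |F_n(x) - Phi(x)| for the standardized sum of n Bernoulli(p) variables.

    F_n is a step function, so the supremum is attained at a jump point,
    either at the left limit or at the value taken at the jump.
    """
    sigma = math.sqrt(p * (1.0 - p))
    cdf = 0.0
    worst = 0.0
    for k in range(n + 1):
        x = (k - n * p) / (sigma * math.sqrt(n))   # standardized location of the k-th jump
        phi = normal_cdf(x)
        worst = max(worst, abs(cdf - phi))          # left limit F_n(x-)
        cdf += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        worst = max(worst, abs(cdf - phi))          # value F_n(x)
    return worst

def berry_esseen_bound(n, p, C=0.4748):
    """Right-hand side C * E|X|^3 / (sigma^3 sqrt(n)) for X = Bernoulli(p) - p."""
    sigma = math.sqrt(p * (1.0 - p))
    third_abs_moment = p * (1.0 - p) * ((1.0 - p) ** 2 + p**2)
    return C * third_abs_moment / (sigma**3 * math.sqrt(n))

if __name__ == "__main__":
    for n in (10, 100, 400):
        d = kolmogorov_distance_bernoulli(n, 0.5)
        print(n, d, berry_esseen_bound(n, 0.5), d * math.sqrt(n))
```

For $p = 1/2$ the printed ratio $d\sqrt{n}$ stays close to $1/\sqrt{2\pi} \approx 0.399$, which illustrates that the $n^{-1/2}$ rate cannot be improved for lattice laws.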

2. Fourier-Analytic Techniques and Esseen's Smoothing Inequality

Central to most proofs is Esseen’s smoothing inequality, which bounds the Kolmogorov distance between two distribution functions by a weighted integral of the difference of their characteristic functions $\phi_X, \phi_Y$. For real random variables $X$ and $Y$, where $Y$ has a density bounded by $M$, for any $T > 0$ and an absolute constant $C$,

$\sup_{a \in \mathbb{R}} |F_X(a) - F_Y(a)| \le \frac{2}{\pi} \int_{-T}^T \left| \frac{\phi_X(t) - \phi_Y(t)}{t} \right|\,dt + C M T^{-1}.$

The proof proceeds via smoothing the indicator function with Schwartz-class mollifiers and careful control of the Fourier-analytic tail (Vershynin, 5 Feb 2026). Taylor expansion of the log-characteristic function facilitates tight local approximations, with large- $t$ errors addressed via uniform characteristic function bounds.
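To make the integral term of the smoothing inequality concrete, the sketch below evaluates it numerically for the standardized sum of centered Bernoulli($p$) variables against the standard normal law. The choice $T = \sqrt{n}$, the midpoint rule, and the function names are assumptions made for illustration; the remainder term $C M T^{-1}$ is not evaluated because the text leaves $C$ unspecified.

```python
import cmath
import math

def centered_bernoulli_cf(s, p):
    """Characteristic function of B - p, where B ~ Bernoulli(p)."""
    return (1.0 - p) * cmath.exp(-1j * p * s) + p * cmath.exp(1j * (1.0 - p) * s)

def smoothing_integral(n, p, T, steps=20000):
    """Midpoint-rule estimate of (2/pi) * int_{-T}^{T} |phi_n(t) - exp(-t^2/2)| / |t| dt.

    phi_n is the characteristic function of the standardized sum of n
    centered Bernoulli(p) variables. With an even number of steps the
    midpoints never coincide with t = 0.
    """
    sigma = math.sqrt(p * (1.0 - p))
    h = 2.0 * T / steps
    total = 0.0
    for j in range(steps):
        t = -T + (j + 0.5) * h
        phi_n = centered_bernoulli_cf(t / (sigma * math.sqrt(n)), p) ** n
        total += abs(phi_n - math.exp(-0.5 * t * t)) / abs(t) * h
    return (2.0 / math.pi) * total

if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, smoothing_integral(n, 0.2, T=math.sqrt(n)))
```

For a skewed law such as $p = 0.2$, the printed values should shrink roughly in proportion to $n^{-1/2}$, consistent with the third-order term in the Taylor expansion of the log-characteristic function.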

3. Multivariate Extensions and Explicit Constants

For independent mean-zero $\mathbb{R}^d$-valued variables $X_1,\ldots,X_n$ normalized so that $\sum_{i=1}^n \operatorname{Cov}(X_i) = I_d$, let $S_n = \sum_{i=1}^n X_i$, $Z \sim N(0, I_d)$, and define the convex-set Kolmogorov distance

$\Delta_n = \sup_{A \subset \mathbb{R}^d,\,\mathrm{convex}} \left| \mathbb{P}\{S_n \in A\} - \mathbb{P}\{Z \in A\} \right|.$

Raič (2018) established the explicit bound

$\Delta_n \le (42\,d^{1/4} + 16) \sum_{i=1}^n \mathbb{E}[\|X_i\|^3],$

and, in a sharper form stated via the Gaussian perimeter constant $Y_d$,

$\Delta_n \le \max\{27,\,1 + 50 Y_d\} \sum_{i=1}^n \mathbb{E}[\|X_i\|^3], \quad Y_d < 0.59 d^{1/4} + 0.21.$

The proof employs a variant of Stein's method, smoothing convex set indicators and tightly bounding the Gaussian perimeter using explicit constants. The $d^{1/4}$ scaling in the dimension is optimal (see Nazarov's asymptotics), but whether the numerical coefficient $42$ is improvable remains open (Raič, 2018).
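The size of the explicit multivariate bound is easy to tabulate. The sketch below assumes summands of the form $\varepsilon_i/\sqrt{n}$ with $\varepsilon_i$ uniform on $\{-1,+1\}^d$, so that the covariances sum to $I_d$ and each summand has norm $\sqrt{d/n}$; this example law and the function names are illustrative assumptions, not taken from the cited paper.

```python
import math

def raic_bound(d, third_moment_sum):
    """Evaluate (42 d^{1/4} + 16) * sum_i E||X_i||^3 for dimension d."""
    return (42.0 * d ** 0.25 + 16.0) * third_moment_sum

def rademacher_third_moment_sum(d, n):
    """sum_i E||X_i||^3 when X_i = eps_i / sqrt(n), eps_i uniform on {-1,+1}^d.

    Each summand has deterministic norm sqrt(d / n), so the sum equals
    n * (d / n)^{3/2} = d^{3/2} / sqrt(n).
    """
    return n * (d / n) ** 1.5

if __name__ == "__main__":
    for d in (1, 4, 16):
        for n in (10**4, 10**6, 10**8):
            print(d, n, raic_bound(d, rademacher_third_moment_sum(d, n)))
```

In this example the bound scales as $d^{7/4} n^{-1/2}$, so for $d = 4$ it only drops below $1$ once $n$ is on the order of $10^5$; the value of the result lies in its explicit dimension dependence rather than in small-sample sharpness.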

4. Fast Rates Under Regularity and Minimal Density Assumptions

Substantial recent progress demonstrates that the canonical Berry–Esseen $O(n^{-1/2})$ rate can be improved under additional assumptions. If the moments of the summands match those of the standard normal law up to order $k \geq 3$ and the distribution possesses a small “rectangle” where its density is bounded below by $h$ over a width $w$, then

$\sup_{s \in \mathbb{R}} \left| \mathbb{P}\left( \frac{X_1 + \dotsb + X_N}{\sqrt{N}} \leq s \right) - \Phi(s) \right| \leq C(k)\,\frac{\mathbb{E}[|X|^{k+1}]}{N^{(k-1)/2}} + 3\,\exp\left(-c h w^3 \frac{N}{\mathbb{E}[|X|^{k+1}]}\right)$

with universal $c > 0$ (Johnston, 2023). For symmetric laws with finite fourth moment, the rate improves to $O(1/N)$, sharply accelerating convergence compared to the classical regime. The density assumption is necessary: without it, lattice-type distributions (e.g., Bernoulli) saturate the $O(N^{-1/2})$ barrier.
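As a worked instance of the displayed bound, take $k = 3$. This is the case of a symmetric law with unit variance, since its first three moments $0, 1, 0$ agree with those of the standard normal law. Substituting $k = 3$ gives

$\sup_{s \in \mathbb{R}} \left| \mathbb{P}\left( \frac{X_1 + \dotsb + X_N}{\sqrt{N}} \leq s \right) - \Phi(s) \right| \leq C(3)\,\frac{\mathbb{E}[|X|^{4}]}{N} + 3\,\exp\left(-c h w^3 \frac{N}{\mathbb{E}[|X|^{4}]}\right).$

For fixed $h$, $w$, and fourth moment, the exponential term decays faster than any power of $N$, so the polynomial term of order $1/N$ dominates for large $N$.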

5. Dependent Data, U-Statistics, and Generalizations

The Berry–Esseen theorem has robust extensions to a wide range of dependent structures and complex statistics:

For locally dependent sequences and dependency graphs, the Kolmogorov error for sample quantiles is $O( (\log n) / \sqrt{n} )$ with explicit constants depending on local neighborhood sizes (Dey et al., 2022).
Uniform bounds for M-estimators of geometrically ergodic Markov chains, under regularity and moment-dominance conditions, are established at $O(n^{-1/2})$, uniformly over parameter families (Hervé et al., 2012).
Generalized U-statistics and subgraph count statistics can admit Berry–Esseen rates of $O(n^{-1})$ in regimes of combinatorial cancellation or strong connectivity, with the exchangeable-pair technique providing systematic control (Zhang, 2021).

6. Berry–Esseen in Stronger Metrics and Functional Settings

Recent advances provide quantitative Berry–Esseen rates in metrics stronger than Kolmogorov:

Sharp $O(n^{-1/2})$ bounds hold in total variation distance for normalized sums of absolutely continuous independent variables with finite third absolute moment and finite relative entropy to Gaussian, and the rate can be improved to $O(n^{-1})$ in relative entropy under a fourth-moment requirement (Bobkov et al., 2011).
In the uniform norm for local limit theorems, for sums of independent random vectors in $\mathbb{R}^d$ with bounded densities (maximal summand density $M$) and finite Lyapunov ratio, one obtains

$\sup_{x} |p_n(x) - \varphi(x)| \leq C M^2 \sigma B_3,$

with precise dependencies on third-moment and density maxima (Bobkov et al., 2024).

For typical weighted sums with weak correlation, Kolmogorov distances decaying as $(\log n)/\sqrt{n}$ or $1/\sqrt{n}$ are available under suitable moment and small-ball constraints without the necessity of independence (Bobkov et al., 2017).

7. Specialized Models and Optimality: Random Matrix and Functional Cases

The Berry–Esseen theorem has been established for sophisticated models:

For the circular $\beta$-ensemble and related random matrix models, the Kolmogorov distance for arc counting functions is bounded by $O((\log N)^{-1/2})$, matching the scale of the logarithmic variance of linear statistics (Feng et al., 2019).
In the case of complex Wiener–Itô multiple integrals, optimal Berry–Esseen rates are available in the Wasserstein metric in terms of cumulant and contraction norms, extending the Fourth Moment Theorem and enabling sharp rates for statistics of the complex Ornstein–Uhlenbeck process (Chen et al., 2024).

8. Open Problems and Current Directions

Despite the extensive literature, several core questions remain open:

Determination of the optimal constants in various high- and infinite-dimensional Berry–Esseen inequalities remains a focus (Vershynin, 5 Feb 2026, Raič, 2018).
Sharp thresholds for the transition between $n^{-1/2}$ and $n^{-1}$ convergence rates under minimal density and smoothness conditions have been established in some settings and remain under investigation in others (Johnston, 2023, Bobkov et al., 2024).
Extensions to functional CLTs, multivariate convergence in non-Euclidean metrics, and sharp rates in dependent or non-classical probabilistic structures are active areas of research (Leppänen, 2024, Hervé et al., 2012).
The role of entropy and information-theoretic distances as quantitative proxies for CLT convergence, beyond the total variation and Kolmogorov metrics, is still developing (Bobkov et al., 2011).

Summary Table: Core Berry–Esseen Bounds and Regimes

Setting	Bound	Metric	Rate	Paper
Classical i.i.d., 3rd moment	$C \frac{\mathbb{E}|X|^3}{\sigma^3 \sqrt{n}}$	Kolmogorov	$O(n^{-1/2})$	(Vershynin, 5 Feb 2026)
Multivariate, convex sets	$C_d \sum \mathbb{E}\|X_i\|^3$	Convex Kolmogorov	$O(d^{1/4} n^{-1/2})$	(Raič, 2018)
4th moment match, local density	$C M^2 \mathbb{E}\|X\|^3/\sqrt{n}$	Uniform density	$O(n^{-1/2})$	(Bobkov et al., 2024)
3-moment match, minimal density	$O(n^{-1})$ (plus exp. small)	Kolmogorov	$O(n^{-1})$	(Johnston, 2023)
Dependent (Markov, U-statistics, etc.)	$C n^{-1/2}$	Kolmogorov	$O(n^{-1/2})$	(Hervé et al., 2012; Zhang, 2021)
Entropic (TV, KL)	$O(n^{-1/2})$ or $O(n^{-1})$	TV, KL, $W_2$	$O(n^{-1/2}), O(n^{-1})$	(Bobkov et al., 2011)
Circular $\beta$-ensemble	$C_\beta(\log N)^{-1/2}$	Kolmogorov	$O((\log N)^{-1/2})$	(Feng et al., 2019)
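To give a feel for how different the tabulated rates are in practice, the sketch below computes the order of magnitude of the sample size at which each rate first reaches a target accuracy $\varepsilon$. It assumes every implicit constant equals $1$, which is a deliberate simplification for illustration only, and the function name is hypothetical.

```python
import math

def log10_samples_needed(eps):
    """Base-10 logarithm of the sample size at which each rate equals eps.

    All constants are taken to be 1 (an illustrative assumption).
    Logarithms are returned to avoid overflow for the (log N)^(-1/2) rate,
    where N = exp(eps^-2).
    """
    return {
        "n^(-1/2)": 2.0 * math.log10(1.0 / eps),
        "n^(-1)": math.log10(1.0 / eps),
        "(log N)^(-1/2)": eps ** -2 / math.log(10.0),
    }

if __name__ == "__main__":
    for eps in (0.1, 0.01):
        print(eps, log10_samples_needed(eps))
```

Under this simplification, accuracy $0.1$ needs about $10^2$ samples at rate $n^{-1/2}$ and $10$ at rate $n^{-1}$, whereas the $(\log N)^{-1/2}$ rate needs $N$ of order $e^{100}$, which shows how slow logarithmic rates are in absolute terms.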

This compilation reflects the rigorous progression from the classical Berry–Esseen theorem for sums of independent random variables, through higher-dimensional, dependent, and structural generalizations, to advanced probabilistic, functional, and information-theoretic regimes. Each variant leverages precise smoothing and coupling methodologies, yielding explicit and, in the best cases, optimal constants. The theorem's role in probability theory remains central due to its quantitative precision and the wide scope of its generalizations.