Central Limit Theorem for Linear Statistics

Updated 10 November 2025

The paper demonstrates that linear statistics of eigenvalues converge to a Gaussian distribution after appropriate centering and normalization.
It provides explicit variance formulas across models—including Hermitian, non-Hermitian, and band matrices—highlighting the role of function smoothness.
Analytical techniques like resolvent methods, Stein’s method, and cumulant expansion are key to deriving these CLT results in diverse spectral scenarios.

The Central Limit Theorem (CLT) for linear statistics describes the Gaussian fluctuation regime of sums of smooth test functions evaluated at the eigenvalues, points, or zeros arising from an underlying random or deterministic spectral/point process. In the context of random matrices, statistical physics, number theory, and geometric models, the linear statistic takes the form $\sum_{i=1}^n f(\lambda_i)$ , where $\{\lambda_i\}$ are eigenvalues or analogous spectral quantities, and $f$ ranges over a suitable class of smooth functions. The regime of validity, nature and scaling of fluctuations, explicit formulas for the limiting variance, and necessary regularity on $f$ vary widely across models.

1. General Formulation and Significance

The linear statistic $L_n[f]=\sum_{i=1}^n f(\lambda_i)$ aggregates (globally or locally) the spectrum of a random matrix or point process by projecting onto the test function $f$ . Many of the most robust universality phenomena in random matrix theory and related fields center on the statement that, after appropriate centering and normalization, $L_n[f]$ converges in distribution to a Gaussian random variable as $n\to\infty$ : $\frac{L_n[f] - \mathbb{E}L_n[f]}{\sqrt{\operatorname{Var} L_n[f]}} \xrightarrow{d} N(0,1).$ The variance $\operatorname{Var} L_n[f]$ depends on the model (Wigner ensemble, $\beta$ -ensemble, non-Hermitian case, etc.), the smoothness and support of $f$ , and any non-Gaussian features of the ensemble (such as higher-order cumulants).
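As a minimal sketch of this normalization (not taken from any of the cited works), the following Python snippet samples $L_n[f]$ for $f(x)=x^2$ in a GUE-type matrix with $\mathbb{E}|M_{ij}|^2 = 1/n$. It uses the identity $\sum_i \lambda_i^2 = \sum_{i,j}|M_{ij}|^2$ so that no eigenvalue solver is needed. The function names, the matrix size, and the number of trials are illustrative assumptions.

```python
import random
import statistics


def trace_m_squared(n, rng):
    """L_n[f] for f(x) = x^2, computed as Tr M^2 = sum of |M_ij|^2 for one GUE-type sample."""
    sd_diag = (1.0 / n) ** 0.5        # real diagonal entries, variance 1/n
    sd_part = (1.0 / (2 * n)) ** 0.5  # real and imaginary parts above the diagonal
    total = sum(rng.gauss(0.0, sd_diag) ** 2 for _ in range(n))
    for _ in range(n * (n - 1) // 2):
        re = rng.gauss(0.0, sd_part)
        im = rng.gauss(0.0, sd_part)
        total += 2.0 * (re * re + im * im)  # M_ij and M_ji contribute equally
    return total


def standardized_samples(n=20, trials=2000, seed=0):
    """Return the centered, normalized samples and the empirical variance of L_n[f]."""
    rng = random.Random(seed)
    draws = [trace_m_squared(n, rng) for _ in range(trials)]
    mean = statistics.mean(draws)
    sd = statistics.stdev(draws)
    return [(d - mean) / sd for d in draws], sd * sd


if __name__ == "__main__":
    z, var = standardized_samples()
    inside = sum(1 for v in z if abs(v) <= 1.0) / len(z)
    print("empirical variance:", var)
    print("fraction within one standard deviation:", inside)
```

Under these assumptions the empirical variance should stay of order one as $n$ grows, so no $1/\sqrt{n}$ rescaling is needed, in contrast to sums of independent variables. Roughly 68% of the standardized samples should fall in $[-1,1]$, as for a standard normal.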

Linear statistics encode macroscopic observables, such as traces, log-determinants, and counts of eigenvalues in intervals, and serve as the primary object for both theoretical universality results and practical testing procedures in statistics, signal processing, high-dimensional inference, and mathematical physics.

2. Classical Wigner and $\beta$ -Ensembles

In the classical Hermitian Wigner case (GOE/GUE), the empirical spectral measure of the eigenvalues converges to Wigner's semicircle law $\rho_{sc}$ on $[-2,2]$. For $f$ continuous and sufficiently smooth, the central result states that

$L_n[f] - n \int f(x)\rho_{sc}(x)dx$

admits a CLT if and only if the limiting variance

$V[f] = \frac{1}{4\pi^2} \int_{-2}^{2}\int_{-2}^{2} \left(\frac{f(x)-f(y)}{x-y}\right)^2 \frac{4 - xy}{\sqrt{4-x^2}\sqrt{4-y^2}} dx\,dy$

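is finite, as established in (Kopel, 2015). Necessity and sufficiency, supported by sharp counterexamples, make $V[f] < \infty$ the optimal criterion: test functions with jump discontinuities or with insufficient Hölder regularity are excluded. When the entries are normalized so that the spectrum fills $[-2,2]$, the prefactor $1/(4\pi^2)$ corresponds to the GUE (for instance, $V[x] = 1$); for the GOE the variance is twice as large.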
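The double integral for $V[f]$ can be evaluated numerically. The sketch below is one possible approach, assuming the substitution $x = 2\cos\theta$, $y = 2\cos\phi$, which absorbs the weights $1/\sqrt{4-x^2}$ and $1/\sqrt{4-y^2}$ and leaves a smooth integrand suited to a Gauss-Chebyshev rule. The function name `wigner_variance`, the node count `m`, and the finite-difference step `h` are hypothetical choices, not part of the cited result.

```python
import math


def wigner_variance(f, m=200, h=1e-5):
    """Approximate V[f] with an m-by-m Gauss-Chebyshev rule after x = 2 cos(theta)."""
    nodes = [2.0 * math.cos((j + 0.5) * math.pi / m) for j in range(m)]
    values = [f(x) for x in nodes]
    total = 0.0
    for j, x in enumerate(nodes):
        for k, y in enumerate(nodes):
            if j == k:
                # On the diagonal the difference quotient equals f'(x);
                # approximate it with a central difference.
                q = (f(x + h) - f(x - h)) / (2.0 * h)
            else:
                q = (values[j] - values[k]) / (x - y)
            total += q * q * (4.0 - x * y)
    # Each node carries weight pi/m in theta and in phi; together with the
    # prefactor 1/(4 pi^2) this leaves an overall factor 1/(4 m^2).
    return total / (4.0 * m * m)


if __name__ == "__main__":
    print(wigner_variance(lambda x: x))      # close to 1
    print(wigner_variance(lambda x: x * x))  # close to 2
```

For polynomial $f$ the difference quotient is itself a polynomial, so the rule is exact up to rounding and the finite-difference step on the diagonal. The values 1 and 2 for $f(x)=x$ and $f(x)=x^2$ follow from $\int_0^\pi 4\cos^2\theta\,d\theta = 2\pi$ and the vanishing of the odd moments.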

For general $\beta$-ensembles in the one-cut regime, the Gaussian fluctuation persists, with the variance and the drift carrying the factors $2/\beta$ and $(2-\beta)$ respectively: $L_N(f) \xrightarrow{d} N\bigl((2-\beta)m(f),\, (2/\beta)E(f)\bigr)$, where $m(f)$ is a weighted mean shift and $E(f)$ is a double integral with a kernel involving the Hilbert transform of the equilibrium measure (Lambert et al., 2017). Explicit convergence rates of order $N^{-\alpha}$ in Wasserstein distance, often with $\alpha = 1$, are available for polynomials and sufficiently smooth $f$, and the rate and the explicit norm are sharp for the Gaussian Unitary Ensemble.
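The $2/\beta$ scaling of the variance can be checked by simulation in the Gaussian case. The sketch below uses the Dumitriu-Edelman tridiagonal model of the Gaussian $\beta$-ensemble, which is not mentioned in the text above and is swapped in here as an assumption: diagonal entries $N(0,1)$ and off-diagonal entries $\chi_{k\beta}/\sqrt{2}$ for $k=1,\dots,n-1$, rescaled by $\sqrt{\beta n/2}$ so that the spectrum fills $[-2,2]$. For $f(x)=x^2$ the linear statistic is the sum of squared matrix entries, so no eigenvalue solver is needed. Function names and parameter values are illustrative.

```python
import random
import statistics


def trace_of_square(n, beta, rng):
    """One sample of Tr(H^2) for the rescaled tridiagonal beta-Hermite matrix."""
    scale = beta * n / 2.0  # H is divided by sqrt(beta * n / 2)
    # Diagonal entries a_i ~ N(0, 1) contribute a_i^2 each.
    total = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
    # Each off-diagonal b_k = chi_{k*beta} / sqrt(2) appears twice in Tr(H^2),
    # contributing 2 * b_k^2, a chi-square variate with k*beta degrees of
    # freedom, i.e. a Gamma(k*beta/2, scale 2) variate.
    total += sum(rng.gammavariate(k * beta / 2.0, 2.0) for k in range(1, n))
    return total / scale


def sample_variance(n, beta, trials, seed=0):
    """Empirical variance of the linear statistic with f(x) = x^2."""
    rng = random.Random(seed)
    draws = [trace_of_square(n, beta, rng) for _ in range(trials)]
    return statistics.variance(draws)


if __name__ == "__main__":
    for beta in (1.0, 2.0, 4.0):
        print(beta, sample_variance(100, beta, 2000), "compare with", 4.0 / beta)
```

In this model the variance of $\operatorname{Tr} H^2$ tends to $4/\beta$, which is $2/\beta$ times the value 2 obtained at $\beta = 2$; the comparison is only approximate at finite $n$ and finite sample size.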

3. Extensions Beyond Classical Hermitian Ensembles

a) Non-Hermitian Ensembles

For non-Hermitian random matrices $X$ with i.i.d. entries whose spectral measure converges to the circular law $\mu_\circ$, the CLT for $S_N(f) = \sum_{i=1}^N f(\lambda_i) - N \int f\, d\mu_\circ$ holds for $f \in H_0^{2+\epsilon}$, with variance

$V(f) = \frac{1}{4\pi}\int_{D} |\nabla f|^2\,dz + \frac12 \langle f, f \rangle_{\dot{H}^{1/2}(\partial D)} + \kappa_4 \left(\frac{1}{\pi} \int_D f - \frac{1}{2\pi} \int_{\partial D} f \right)^2$

where $\kappa_4$ is the entrywise fourth cumulant (Cipolloni et al., 2019). This structure, with explicit boundary and cumulant-sensitive corrections, resolves several open questions in non-Gaussian non-Hermitian universality.
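To see how the three terms of $V(f)$ interact, the sketch below evaluates them for a radial test function $f(z) = g(|z|)$ on the unit disk. It assumes that $f$ is cut off smoothly outside the disk (all three terms depend only on $f$ on the closed disk) and that the homogeneous $\dot{H}^{1/2}(\partial D)$ seminorm vanishes for functions that are constant on the circle. The function name `radial_variance` and its arguments are hypothetical.

```python
def radial_variance(g, dg, kappa4=0.0, steps=20000):
    """V(f) for f(z) = g(|z|) on the unit disk, using the midpoint rule in r."""
    dr = 1.0 / steps
    grad_term = 0.0  # (1/(4 pi)) * int_D |grad f|^2 = (1/2) * int_0^1 g'(r)^2 r dr
    disk_mean = 0.0  # (1/pi) * int_D f = 2 * int_0^1 g(r) r dr
    for i in range(steps):
        r = (i + 0.5) * dr
        grad_term += 0.5 * dg(r) ** 2 * r * dr
        disk_mean += 2.0 * g(r) * r * dr
    circle_mean = g(1.0)  # (1/(2 pi)) * integral of a radial f over the unit circle
    boundary_term = 0.0   # radial f is constant on the circle (assumption stated above)
    return grad_term + boundary_term + kappa4 * (disk_mean - circle_mean) ** 2


if __name__ == "__main__":
    g = lambda r: r * r
    dg = lambda r: 2.0 * r
    print(radial_variance(g, dg))               # Gaussian entries, kappa4 = 0
    print(radial_variance(g, dg, kappa4=-1.0))  # entries with negative fourth cumulant
```

For $g(r) = r^2$ the gradient term equals $1/2$ and the two averages are $1/2$ and $1$, so the formula gives $V(f) = 1/2 + \kappa_4/4$: a negative fourth cumulant reduces the fluctuation of $\sum_i |\lambda_i|^2$.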

b) Band Random Matrices

For Wigner-type band matrices $M$ of bandwidth $b_n$, the CLT for $L_n[\varphi]$ holds for test functions $\varphi$ with a bounded continuous derivative, provided $\sqrt n \ll b_n \ll n$. The normalized fluctuation is

$\mathcal{N}_n[\varphi] = (b_n/n)^{1/2}(L_n[\varphi] - \mathbb{E} L_n[\varphi]) \xrightarrow{d} N(0,V_{band}[\varphi]),$

and $V_{band}[\varphi]$ is given by a double integral involving a kernel $F_\sigma(x,y)$ reflecting the limited spatial range of the band (Li et al., 2013). As $b_n \rightarrow n$ , the Wigner (GOE/GUE) kernel is recovered. For $b_n = O(\sqrt n)$ , Poisson-type statistics emerge.

c) Covariance, Separable, and Spiked Models

In large-dimensional sample covariance models ( $p,n\to\infty$ , $p/n \to c$ ), the CLT holds for analytic test functions, with explicit formulas for mean and variance via complex contour integrals involving the Stieltjes transform of the limiting spectral density (Bai et al., 2010, Zhidong et al., 2016, Liu et al., 5 Oct 2025). When "spikes" are present (e.g., a finite number of eigenvalues of the population covariance separated from the bulk), the limiting distribution includes deterministic spike terms and variance corrections, with supercritical spikes yielding independent additive Gaussian contributions (Liu et al., 5 Oct 2025).

For rank-based or nonparametric models, such as Kendall's rank correlation matrices, the CLT holds for analytic $f$ in the asymptotic Marchenko–Pastur regime without any moment conditions, providing robustness to heavy tails and skewness (Li et al., 2019).

d) Diluted Graphs and Simplicial Complexes

Erdős–Rényi graphs with diverging average degree ( $p_n\rightarrow \infty$ , $p_n/n\rightarrow 0$ ) display fluctuations governed by the "non-Gaussian" part of the Wigner variance. For test functions $\varphi$ in the Sobolev class $H^s$, $s>3/2$, with centering against the semicircle law, the statistic $(p_n/n)^{1/2}[N_n[\varphi] - \mathbb{E}N_n[\varphi]]$ is asymptotically Gaussian, with a double-integral variance matching the non-Gaussian kernel from the full Wigner case (Shcherbina et al., 2011).

For higher-dimensional Linial–Meshulam complexes, the (d–1)st adjacency eigenvalue linear statistics admit a CLT with a variance explicitly computable as a sum over combinatorial types of tree and bracelet subcomplexes, with large deviation (Talagrand-type) concentration arguments extending the result to smooth $C^2$ test functions (Kanazawa et al., 2023).

4. Linear Statistics in Other Point Processes and Fields

a) Sine-β and Microscopic Processes

In the Sine-β process, the infinite volume bulk limit of $\beta$ -ensembles, linear statistics for scaled, compactly supported $C^4$ test functions converge to a normal distribution with variance proportional to the $H^{1/2}$ -Sobolev norm: $\mathrm{Var} = \frac{2}{\beta} \|\bar\varphi\|_{H^{1/2}}^2$ (Leblé, 2018). This universality reflects the rigidity and logarithmic repulsion of bulk eigenvalues.

b) Random Geometric and Arithmetic Settings

On random covers of compact hyperbolic surfaces, the CLT for smooth local statistics of Laplacian eigenvalues holds; the centered and normalized counting function in small spectral windows converges to a normal variable whose variance matches the GOE/GUE prediction: $\lim_{L\to\infty}\lim_{n\to\infty} \mathrm{Var} N_n(L) = \Sigma_{\mathrm{GOE}/\mathrm{GUE}}^2(\psi)$ depending on the symmetry type (Maoz, 2023).

In number theory, Selberg's CLT for the value distribution of $\log \zeta(1/2 + iT)$ can be generalized to statistics weighted by local statistics of the zeta zeros; the limiting law is still Gaussian for test functions whose Fourier transform has sufficiently restricted support, contingent upon the Riemann Hypothesis (RH) in certain cases (Fazzari et al., 5 Jul 2025).

5. Methodological Pillars

Analysis of fluctuations of linear statistics relies on several recurring methodologies:

Resolvent and Stieltjes Transform Methods: Expressing linear statistics as contour or real integrals of resolvent traces, leading to martingale and cumulant expansions (Bai et al., 2010, Cipolloni et al., 2019, Zhidong et al., 2016).
Stein’s Method and Normal Approximation: Used to derive explicit convergence rates in Wasserstein distance and to handle correlated traces via exchangeable pairs (Lambert et al., 2017, Döbler et al., 2012).
Martingale CLT and Moment Methods: Martingale differences arising from sequential conditioning on columns or rows, enabling verification of Lindeberg conditions and computation of variance (Bai et al., 2010, Li et al., 2019).
Diagrammatic and Combinatorial Enumeration: For band/random combinatorial models, enumeration of contributing graph structures (such as Dyck paths, fat-trees, high-dimensional trees, and bracelets) yields explicit variance expressions (Li et al., 2013, Kanazawa et al., 2023).
Cumulant Expansion and Gaussian Comparison: Matching moments to cancel non-universal contributions, allowing extension from the Gaussian to general entry distributions (Lindeberg replacement, cumulant decoupling) (Zhidong et al., 2016, Benaych-Georges et al., 2013).
Functional Analysis: Optimal regularity results delineating when the CLT holds, especially via Sobolev or Hölder spaces and explicit spectral kernel analysis (Kopel, 2015, Leblé, 2018).
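The resolvent representation in the first item can be illustrated directly. For analytic $f$ and a contour enclosing the spectrum, Cauchy's formula gives $L_n[f] = -\frac{1}{2\pi i}\oint f(z)\operatorname{Tr}(M - z)^{-1}\,dz$. The sketch below checks this identity numerically on a circle, taking a fixed list of eigenvalues as a stand-in for the spectrum of $M$. The function name, the eigenvalue list, and the contour radius are illustrative assumptions.

```python
import cmath
import math


def linear_statistic_via_resolvent(f, eigenvalues, radius, m=400):
    """Approximate sum_i f(lambda_i) as -(1/(2 pi i)) * contour integral of f(z) Tr (M - z)^{-1}."""
    total = 0.0 + 0.0j
    for k in range(m):
        z = radius * cmath.exp(2j * math.pi * k / m)
        trace_resolvent = sum(1.0 / (lam - z) for lam in eigenvalues)
        # On the circle dz = i z dt, so the integrand -(1/(2 pi i)) f(z) Tr G(z) dz
        # becomes -(1/(2 pi)) f(z) Tr G(z) z dt; the trapezoid rule averages over t.
        total += -f(z) * trace_resolvent * z
    return (total / m).real


if __name__ == "__main__":
    spectrum = [-1.5, -0.3, 0.2, 1.1]  # hypothetical eigenvalues inside the contour
    approx = linear_statistic_via_resolvent(cmath.exp, spectrum, radius=3.0)
    exact = sum(math.exp(lam) for lam in spectrum)
    print(approx, exact)
```

The trapezoid rule converges geometrically for analytic integrands on a circle, so a few hundred nodes suffice here. In the random matrix setting the same representation reduces fluctuations of $L_n[f]$ to fluctuations of the resolvent trace along the contour.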

6. Applications, Limitations, and Open Questions

Linear statistics CLTs underpin hypothesis testing in high-dimensional statistics (e.g., identity tests for covariance, detection of spikes, community detection in block models), the understanding of universality classes in random matrix theory, and the modeling of counting functions in algebraic geometry and number theory.

Extending these CLTs to less regular test functions, non-smooth or discontinuous statistics, or into regimes beyond global linear statistics (e.g., for mesoscopic or edge scaling, or in the presence of strong localization or heavy tails) remains an area of active research. The explicit dependency of variance formulas on higher cumulants or combinatorial/geometric structures continues to be a subject of investigation in advanced random matrix and geometric probability theory.

7. Illustrative Table: Variance Formulas in Key Models

Model / Class	Test Function Class	Limiting Variance Formula
Hermitian Wigner (GOE/GUE)	Bounded + $V[f]<\infty$	$\frac{1}{4\pi^2} \iint (\frac{f(x)-f(y)}{x-y})^2 K(x,y)$ (Kopel, 2015)
General $\beta$ -ensemble	$C^{k+4}$ , $k\ge5$	$(2/\beta) \iint (\frac{f(x)-f(y)}{x-y})^2 m_0(x,y)$ (Lambert et al., 2017)
Non-Hermitian i.i.d. ( $\chi$ )	$H_0^{2+\epsilon}$	$\frac{1}{4\pi}\!\int_D|\nabla f|^2 + \cdots$ (Cipolloni et al., 2019)
Band Wigner ( $b_n$ )	$C^1$ , Sobolev	$\iint (\frac{f(x)-f(y)}{x-y})^2 F_\sigma(x,y)$ (Li et al., 2013)
Sine- $\beta$ process	$C^4$ compact	$(2/\beta) \|\varphi\|_{H^{1/2}}^2$ (Leblé, 2018)
Random Simplicial Complex	Polynomials, $C^2$	$\sum_{k,l} a_k a_l (k,l)$ (see combinatorial formulas) (Kanazawa et al., 2023)
Separable Sample Covariance	Analytic	$-\frac{1}{4\pi^2} \iint f(z_1)g(z_2) \partial^2_{z_1 z_2} \log[1-K(z_1,z_2)] dz_1dz_2$ (Zhidong et al., 2016)

The explicit forms of the variance kernels, combinatorial coefficients, and dependence on cumulant/correlation structure are model-specific and derived via detailed combinatorial, analytic, and probabilistic arguments in each cited work.

References: