
Central Limit Theorem for Linear Statistics

Updated 10 November 2025
  • After appropriate centering and normalization, linear statistics of eigenvalues converge to a Gaussian distribution.
  • Explicit variance formulas are available across models, including Hermitian, non-Hermitian, and band matrices, and they highlight the role of the smoothness of the test function.
  • Resolvent methods, Stein’s method, and cumulant expansions are the key analytical techniques for deriving these CLT results in diverse spectral settings.

The Central Limit Theorem (CLT) for linear statistics describes the Gaussian fluctuation regime of sums of smooth test functions evaluated at the eigenvalues, points, or zeros arising from an underlying random or deterministic spectral/point process. In the context of random matrices, statistical physics, number theory, and geometric models, the linear statistic takes the form $\sum_{i=1}^n f(\lambda_i)$, where $\{\lambda_i\}$ are eigenvalues or analogous spectral quantities, and $f$ ranges over a suitable class of smooth functions. The regime of validity, nature and scaling of fluctuations, explicit formulas for the limiting variance, and necessary regularity on $f$ vary widely across models.

1. General Formulation and Significance

The linear statistic $L_n[f]=\sum_{i=1}^n f(\lambda_i)$ aggregates (globally or locally) the spectrum of a random matrix or point process by projecting onto the test function $f$. Many of the most robust universality phenomena in random matrix theory and related fields center on the statement that, after appropriate centering and normalization, $L_n[f]$ converges in distribution to a Gaussian random variable as $n\to\infty$:

$$\frac{L_n[f] - \mathbb{E}L_n[f]}{\sqrt{\operatorname{Var} L_n[f]}} \xrightarrow{d} N(0,1).$$

The variance $\operatorname{Var} L_n[f]$ depends on the model (Wigner ensemble, $\beta$-ensemble, non-Hermitian case, etc.), the smoothness and support of $f$, and any non-Gaussian features of the ensemble (such as higher-order cumulants).
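The convergence statement above can be explored numerically. The following is a minimal Monte Carlo sketch, not taken from any cited work: it assumes a complex Hermitian Gaussian matrix normalized so that $\mathbb{E}|H_{ij}|^2 = 1/n$ and the test function $f(x)=x^2$. The helper names `sample_gue`, `linear_statistic`, and `simulate_statistics` are hypothetical. For this particular choice, a direct moment computation on the entries gives $\mathbb{E}L_n[f] = n$ and $\operatorname{Var} L_n[f] = 2$ for every $n$, so the variance stays of order one instead of growing with $n$.

```python
import numpy as np

def sample_gue(n, rng):
    """Complex Hermitian Gaussian matrix with E|H_ij|^2 = 1/n (spectrum roughly [-2, 2])."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / (2.0 * np.sqrt(n))

def linear_statistic(h, f):
    """L_n[f] = sum_i f(lambda_i) over the eigenvalues of the Hermitian matrix h."""
    return float(np.sum(f(np.linalg.eigvalsh(h))))

def simulate_statistics(n=30, trials=2000, seed=0):
    """Draw `trials` independent copies of L_n[f] for f(x) = x^2 (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    f = lambda x: x**2
    return np.array([linear_statistic(sample_gue(n, rng), f) for _ in range(trials)])

stats = simulate_statistics()
# Standardize as in the CLT statement: subtract the mean, divide by the standard deviation.
standardized = (stats - stats.mean()) / stats.std()
print("mean:", stats.mean(), "variance:", stats.var())
print("fraction within one std:", np.mean(np.abs(standardized) < 1.0))
```

For a Gaussian limit, roughly 68% of the standardized samples should fall within one standard deviation; a histogram of `standardized` gives a more direct visual check.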

Linear statistics encode macroscopic observables (such as traces, log-determinants, and counts of eigenvalues in intervals) and serve as the primary object for both theoretical universality results and practical testing procedures in statistics, signal processing, high-dimensional inference, and mathematical physics.

2. Classical Wigner and $\beta$-Ensembles

In the classical Hermitian Wigner case (GOE/GUE), the empirical spectral measure of the eigenvalues converges to Wigner's semicircle law. For $f$ continuous and sufficiently smooth, the central result states that

$$L_n[f] - n \int f(x)\,\rho_{sc}(x)\,dx$$

admits a CLT if and only if the limiting variance

$$V[f] = \frac{1}{4\pi^2} \int_{-2}^{2}\int_{-2}^{2} \left(\frac{f(x)-f(y)}{x-y}\right)^2 \frac{4 - xy}{\sqrt{4-x^2}\sqrt{4-y^2}} \,dx\,dy$$

is finite, as established in (Kopel, 2015). The necessity and sufficiency, with sharp counterexamples, establish $V[f] < \infty$ as the optimal criterion: discontinuities or functions with too rough Hölder regularity are excluded.
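The double integral for $V[f]$ can be evaluated numerically. The sketch below is an illustration under stated assumptions, not the method of the cited paper: it substitutes $x = 2\cos\theta$, $y = 2\cos\varphi$, which absorbs the singular weights $1/\sqrt{4-x^2}$ and $1/\sqrt{4-y^2}$, and then applies a midpoint rule on two staggered grids so that the difference quotient is never evaluated at $x=y$. The function name `wigner_variance` and the grid size `m` are hypothetical. With the $1/(4\pi^2)$ prefactor, the values $V[x]=1$ and $V[x^2]=2$ agree with the exact variances of $\operatorname{Tr} H$ and $\operatorname{Tr} H^2$ for a complex Hermitian Gaussian matrix with $\mathbb{E}|H_{ij}|^2 = 1/n$.

```python
import numpy as np

def wigner_variance(f, m=200):
    """Approximate V[f] with a midpoint rule after x = 2cos(theta), y = 2cos(phi)."""
    if m % 2:
        m += 1  # keep m even so the two staggered grids share no point
    theta = (np.arange(m) + 0.5) * np.pi / m
    phi = (np.arange(m + 1) + 0.5) * np.pi / (m + 1)
    x = 2.0 * np.cos(theta)[:, None]
    y = 2.0 * np.cos(phi)[None, :]
    # Difference quotient (f(x) - f(y)) / (x - y); x != y on the staggered grids.
    quotient = (f(x) - f(y)) / (x - y)
    # dx / sqrt(4 - x^2) becomes d(theta), so only the factor (4 - xy) remains.
    integrand = quotient**2 * (4.0 - x * y)
    cell_area = (np.pi / m) * (np.pi / (m + 1))
    return float(integrand.sum() * cell_area / (4.0 * np.pi**2))

v_linear = wigner_variance(lambda t: t)
v_square = wigner_variance(lambda t: t**2)
print(v_linear, v_square)
```

For polynomial test functions the integrand is a trigonometric polynomial in each angle, so the midpoint rule is exact up to rounding. For rougher $f$ the grid size would need to grow, and a divergent result as `m` increases is a numerical symptom of $V[f] = \infty$.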

For general $\beta$-ensembles in the one-cut regime, the Gaussian fluctuation remains, but the variance and drift carry the factors $2/\beta$ and $(2-\beta)$ respectively: $L_N(f) \xrightarrow{d} N\bigl((2-\beta)m(f),\, (2/\beta)E(f)\bigr)$, where $m(f)$ is a weighted mean shift and $E(f)$ is a double integral kernel involving the Hilbert transform of the equilibrium measure (Lambert et al., 2017). Explicit convergence rates of order $N^{-\alpha}$ in Wasserstein distance, often with $\alpha = 1$, are available for polynomials and sufficiently smooth $f$, and the rate and explicit norm are sharp in the Gaussian Unitary Ensemble.
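The $2/\beta$ scaling of the variance can be checked by simulation. The sketch below swaps in the Dumitriu–Edelman tridiagonal model for the Gaussian (Hermite) $\beta$-ensemble, which is not mentioned in the text above and is used here only as a convenient sampler. The rescaling by $\sqrt{\beta n/2}$, the test function $f(x)=x^2$, and the helper name `sample_beta_hermite` are assumptions of this example. For this choice the variance of $\sum_i \lambda_i^2$ is $8/(\beta^2 n) + 4(n-1)/(\beta n)$, which tends to $4/\beta$, that is, $2/\beta$ times the $\beta=2$ value of 2.

```python
import numpy as np

def sample_beta_hermite(n, beta, rng):
    """Eigenvalues of the tridiagonal beta-Hermite model, rescaled to roughly [-2, 2]."""
    diag = rng.normal(size=n)                    # N(0, 1) diagonal entries
    dof = beta * np.arange(n - 1, 0, -1)         # degrees of freedom beta*(n-1), ..., beta
    off = np.sqrt(rng.chisquare(dof) / 2.0)      # chi_{dof} / sqrt(2) off-diagonal entries
    h = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(h) / np.sqrt(beta * n / 2.0)

def variance_of_square_statistic(n, beta, trials=2000, seed=1):
    """Sample variance of sum_i lambda_i^2 over independent draws."""
    rng = np.random.default_rng(seed)
    vals = [np.sum(sample_beta_hermite(n, beta, rng) ** 2) for _ in range(trials)]
    return float(np.var(vals))

beta_value = 4.0
var_beta = variance_of_square_statistic(n=40, beta=beta_value)
print("sample variance:", var_beta, "limit 4/beta:", 4.0 / beta_value)
```

Repeating the run for several values of `beta_value` should show the sample variance tracking $4/\beta$, up to Monte Carlo error and the finite-$n$ correction.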

3. Extensions Beyond Classical Hermitian Ensembles

a) Non-Hermitian Ensembles

For non-Hermitian random matrices $X$ with i.i.d. entries and spectral measure converging to the circular law $\mu_\circ$, the CLT for $S_N(f) = \sum_i f(\lambda_i) - N \int f\, d\mu_\circ$ holds for $f \in H_0^{2+\epsilon}$, with variance

$$V(f) = \frac{1}{4\pi}\int_{D} |\nabla f|^2\,dz + \frac12 \langle f, f \rangle_{\dot{H}^{1/2}(\partial D)} + \kappa_4 \left(\frac{1}{\pi} \int_D f - \frac{1}{2\pi} \int_{\partial D} f \right)^2$$

where $\kappa_4$ is the entrywise fourth cumulant (Cipolloni et al., 2019). This structure, with explicit boundary and cumulant-sensitive corrections, resolves several open questions in non-Gaussian non-Hermitian universality.
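As a sanity check of the bulk term, consider the radial test function $f(z) = |z|^2$ for a complex Ginibre matrix. This is an illustrative choice that ignores the compact-support requirement on $f$. Here $|\nabla f|^2 = 4|z|^2$, so the first term equals $\frac{1}{4\pi}\cdot 2\pi = \frac12$; $f$ is constant on $\partial D$, so the boundary term vanishes; and $\kappa_4 = 0$ for complex Gaussian entries, giving $V(f) = 1/2$. The simulation below is a sketch under these assumptions, with the hypothetical helper names `sample_ginibre_eigs` and `ginibre_variance`. Kostlan's theorem on the moduli of Ginibre eigenvalues gives the exact finite-$n$ value $(n+1)/(2n)$ for comparison.

```python
import numpy as np

def sample_ginibre_eigs(n, rng):
    """Eigenvalues of an n x n complex Ginibre matrix with E|X_ij|^2 = 1/n."""
    x = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2.0 * n)
    return np.linalg.eigvals(x)

def ginibre_variance(n=40, trials=1000, seed=2):
    """Sample variance of sum_i |lambda_i|^2 over independent Ginibre draws."""
    rng = np.random.default_rng(seed)
    vals = [np.sum(np.abs(sample_ginibre_eigs(n, rng)) ** 2) for _ in range(trials)]
    return float(np.var(vals))

var_ginibre = ginibre_variance()
print("sample variance:", var_ginibre, "formula value: 0.5")
```

Replacing the Gaussian entries with another distribution of the same variance would change $\kappa_4$ and, for this $f$, shift the variance by $\kappa_4/4$ according to the last term of the formula.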

b) Band Random Matrices

For Wigner-type band matrices $M$ of bandwidth $b_n$, the CLT for $L_n[\varphi]$ holds provided $\sqrt n \ll b_n \ll n$, and for $\varphi$ with a bounded continuous derivative. The fluctuation is

$$\mathcal{N}_n[\varphi] = (b_n/n)^{1/2}\bigl(L_n[\varphi] - \mathbb{E} L_n[\varphi]\bigr) \xrightarrow{d} N(0,V_{band}[\varphi]),$$

and $V_{band}[\varphi]$ is given by a double integral involving a kernel $F_\sigma(x,y)$ reflecting the limited spatial range of the band (Li et al., 2013). As $b_n \rightarrow n$, the Wigner (GOE/GUE) kernel is recovered. For $b_n = O(\sqrt n)$, Poisson-type statistics emerge.
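The role of the $(b_n/n)^{1/2}$ normalization can be illustrated with a toy model. The sketch below assumes a real symmetric Gaussian band matrix with a periodic band of $2b+1$ nonzero entries per row, each of variance $1/(2b+1)$. This profile, the bandwidth convention, and the helper names `sample_band_matrix` and `scaled_band_variance` are assumptions of this example and may differ from the normalization in the cited paper. For $\varphi(x) = x^2$ a direct computation gives $\operatorname{Var} L_n[\varphi] = 2n(1+4b)/(2b+1)^2$, which grows like $2n/b$, so multiplying the variance by $b/n$ leaves a quantity of order one.

```python
import numpy as np

def sample_band_matrix(n, b, rng):
    """Real symmetric Gaussian matrix with a periodic band of half-width b (needs 2b+1 <= n)."""
    g = rng.normal(size=(n, n))
    sym = np.triu(g) + np.triu(g, 1).T              # symmetric, all entries N(0, 1)
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(dist, n - dist)               # periodic distance between indices
    return sym * (dist <= b) / np.sqrt(2 * b + 1)   # row variances sum to 1

def scaled_band_variance(n=60, b=10, trials=1500, seed=3):
    """(b/n) * Var(L_n[phi]) for phi(x) = x^2, estimated by Monte Carlo."""
    rng = np.random.default_rng(seed)
    vals = [np.sum(np.linalg.eigvalsh(sample_band_matrix(n, b, rng)) ** 2)
            for _ in range(trials)]
    return float((b / n) * np.var(vals))

band_n, band_b = 60, 10
scaled_var = scaled_band_variance(band_n, band_b)
exact_scaled = 2.0 * band_b * (1 + 4 * band_b) / (2 * band_b + 1) ** 2
print("scaled sample variance:", scaled_var, "exact for this toy model:", exact_scaled)
```

The sizes used here are far too small to sit in the asymptotic regime $\sqrt n \ll b_n \ll n$; the example only shows why the unnormalized variance must be rescaled by $b_n/n$.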

c) Covariance, Separable, and Spiked Models

In large-dimensional sample covariance models ($p,n\to\infty$, $p/n \to c$), the CLT holds for analytic test functions, with explicit formulas for mean and variance via complex contour integrals involving the Stieltjes transform of the limiting spectral density (Bai et al., 2010, Zhidong et al., 2016, Liu et al., 5 Oct 2025). When "spikes" are present (e.g., a finite number of eigenvalues of the population covariance separated from the bulk), the limiting distribution includes deterministic spike terms and variance corrections, with supercritical spikes yielding independent additive Gaussian contributions (Liu et al., 5 Oct 2025).
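A concrete instance is the log-determinant, $f(x) = \log x$, for a sample covariance matrix with identity population covariance. The sketch below assumes real standard Gaussian data and $c = p/n < 1$; in that setting the limiting variance of $\sum_i \log \lambda_i$ is $-2\log(1-c)$, a standard value that is stated here as an assumption since it does not appear in the text above. The helper names `logdet_statistic` and `logdet_variance` are hypothetical.

```python
import numpy as np

def logdet_statistic(p, n, rng):
    """sum_i log(lambda_i) for S = X X^T / n with X a p x n standard Gaussian matrix."""
    x = rng.normal(size=(p, n))
    s = x @ x.T / n
    return float(np.sum(np.log(np.linalg.eigvalsh(s))))

def logdet_variance(p=30, n=60, trials=2000, seed=4):
    """Sample variance of the log-determinant statistic over independent draws."""
    rng = np.random.default_rng(seed)
    vals = [logdet_statistic(p, n, rng) for _ in range(trials)]
    return float(np.var(vals))

ratio_c = 30 / 60
var_logdet = logdet_variance()
print("sample variance:", var_logdet, "limit -2 log(1 - c):", -2.0 * np.log(1.0 - ratio_c))
```

The variance stays bounded as $p$ and $n$ grow proportionally, which is the hallmark of the CLT regime for analytic test functions.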

For rank-based or nonparametric models, such as Kendall's rank correlation matrices, the CLT holds for analytic $f$ under the asymptotic Marchenko–Pastur regime without any moment conditions, providing robustness to heavy tails and skewness (Li et al., 2019).

d) Diluted Graphs and Simplicial Complexes

Erdős–Rényi graphs with diverging average degree ($p_n\rightarrow \infty$, $p_n/n\rightarrow 0$) display fluctuations governed by the "non-Gaussian" part of the Wigner variance. For test functions in the Sobolev class $H^s$, $s>3/2$, with centering against the semicircle, the statistic $(p_n/n)^{1/2}\bigl[N_n[\varphi] - \mathbb{E}N_n[\varphi]\bigr]$ is asymptotically Gaussian with double-integral variance matching the non-Gaussian kernel from the full Wigner case (Shcherbina et al., 2011).

For higher-dimensional Linial–Meshulam complexes, the linear statistics of the $(d-1)$st adjacency eigenvalues admit a CLT with a variance explicitly computable as a sum over combinatorial types of tree and bracelet subcomplexes, with large deviation (Talagrand-type) concentration arguments extending the result to smooth $C^2$ test functions (Kanazawa et al., 2023).

4. Linear Statistics in Other Point Processes and Fields

Sine-β and Microscopic Processes

In the Sine-β process, the infinite volume bulk limit of $\beta$-ensembles, linear statistics for scaled, compactly supported $C^4$ test functions converge to a normal distribution with variance proportional to the squared $H^{1/2}$-Sobolev norm: $\mathrm{Var} = \frac{2}{\beta} \|\bar\varphi\|_{H^{1/2}}^2$ (Leblé, 2018). This universality reflects the rigidity and logarithmic repulsion of bulk eigenvalues.

Random Geometric and Arithmetic Settings

On random covers of compact hyperbolic surfaces, the CLT for smooth local statistics of Laplacian eigenvalues holds; the centered and normalized counting function in small spectral windows converges to a normal variable whose variance matches the GOE/GUE prediction, $\lim_{L\to\infty}\lim_{n\to\infty} \mathrm{Var}\, N_n(L) = \Sigma_{\mathrm{GOE}/\mathrm{GUE}}^2(\psi)$, depending on the symmetry type (Maoz, 2023).

In number theory, Selberg's CLT for the value distribution of $\log \zeta(1/2 + iT)$ can be generalized to statistics weighted by local zero statistics of zeta zeros; the limiting law is still Gaussian for test functions with Fourier transform of sufficiently restricted support, contingent upon RH in certain cases (Fazzari et al., 5 Jul 2025).

5. Methodological Pillars

Analysis of fluctuations of linear statistics relies on several recurring methodologies:

  • Resolvent and Stieltjes Transform Methods: Expressing linear statistics as contour or real integrals of resolvent traces, leading to martingale and cumulant expansions (Bai et al., 2010, Cipolloni et al., 2019, Zhidong et al., 2016).
  • Stein’s Method and Normal Approximation: Used to derive explicit convergence rates in Wasserstein distance and to handle correlated traces via exchangeable pairs (Lambert et al., 2017, Döbler et al., 2012).
  • Martingale CLT and Moment Methods: Martingale differences arising from sequential conditioning on columns or rows, enabling verification of Lindeberg conditions and computation of variance (Bai et al., 2010, Li et al., 2019).
  • Diagrammatic and Combinatorial Enumeration: For band/random combinatorial models, enumeration of contributing graph structures (such as Dyck paths, fat-trees, high-dimensional trees, and bracelets) yields explicit variance expressions (Li et al., 2013, Kanazawa et al., 2023).
  • Cumulant Expansion and Gaussian Comparison: Matching moments to cancel non-universal contributions, allowing extension from the Gaussian to general entry distributions (Lindeberg replacement, cumulant decoupling) (Zhidong et al., 2016, Benaych-Georges et al., 2013).
  • Functional Analysis: Optimal regularity results delineating when the CLT holds, especially via Sobolev or Hölder spaces and explicit spectral kernel analysis (Kopel, 2015, Leblé, 2018).
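The resolvent approach in the first item rests on Cauchy's formula: for $f$ analytic inside a contour $\Gamma$ enclosing the spectrum, $\sum_i f(\lambda_i) = -\frac{1}{2\pi i}\oint_\Gamma f(z)\,\operatorname{Tr} G(z)\,dz$ with $G(z) = (H - z)^{-1}$. The sketch below checks this identity numerically on a single matrix. The circular contour, the trapezoid rule, the choice $f = \exp$, and the helper name `contour_linear_statistic` are assumptions made for illustration, not a reproduction of any cited proof.

```python
import numpy as np

def contour_linear_statistic(h, f, radius, m=256):
    """Evaluate sum_i f(lambda_i) as a contour integral of f(z) * Tr (h - z)^{-1}."""
    n = h.shape[0]
    eye = np.eye(n)
    theta = 2.0 * np.pi * np.arange(m) / m
    total = 0.0 + 0.0j
    for zk in radius * np.exp(1j * theta):
        trace_g = np.trace(np.linalg.inv(h - zk * eye))   # Tr G(z_k)
        total += f(zk) * trace_g * zk                     # dz = i z d(theta)
    # -(1 / (2 pi i)) * integral, with trapezoid weights 2 pi / m on the circle.
    return float((-total / m).real)

rng = np.random.default_rng(6)
g = rng.normal(size=(30, 30))
h_sym = (g + g.T) / np.sqrt(2.0 * 30)            # real symmetric, spectrum roughly [-2, 2]
eigenvalues = np.linalg.eigvalsh(h_sym)
contour_radius = float(np.max(np.abs(eigenvalues))) + 1.0   # circle encloses the spectrum
direct_value = float(np.sum(np.exp(eigenvalues)))
contour_value = contour_linear_statistic(h_sym, np.exp, contour_radius)
print(direct_value, contour_value)
```

In proofs the same representation is used in the other direction: fluctuations of $\operatorname{Tr} G(z)$ are controlled first, and the CLT for $L_n[f]$ follows by integrating against $f$.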

6. Applications, Limitations, and Open Questions

Linear statistics CLTs underpin hypothesis testing in high-dimensional statistics (e.g., identity tests for covariance, detection of spikes, community detection in block models), the understanding of universality classes in random matrix theory, and the modeling of counting functions in algebraic geometry and number theory.

Extending these CLTs to less regular test functions, non-smooth or discontinuous statistics, or into regimes beyond global linear statistics (e.g., for mesoscopic or edge scaling, or in the presence of strong localization or heavy tails) remains an area of active research. The explicit dependency of variance formulas on higher cumulants or combinatorial/geometric structures continues to be a subject of investigation in advanced random matrix and geometric probability theory.

7. Illustrative Table: Variance Formulas in Key Models

| Model / Class | Test Function Class | Limiting Variance Formula | Reference |
|---|---|---|---|
| Hermitian Wigner (GOE/GUE) | Bounded, $V[f]<\infty$ | $\frac{1}{4\pi^2} \iint \left(\frac{f(x)-f(y)}{x-y}\right)^2 K(x,y)$ | (Kopel, 2015) |
| General $\beta$-ensemble | $C^{k+4}$, $k\ge5$ | $(2/\beta) \iint \left(\frac{f(x)-f(y)}{x-y}\right)^2 m_0(x,y)$ | (Lambert et al., 2017) |
| Non-Hermitian i.i.d. ($\chi$) | $H_0^{2+\epsilon}$ | $\frac{1}{4\pi}\int_D \lvert\nabla f\rvert^2 + \cdots$ | (Cipolloni et al., 2019) |
| Band Wigner ($b_n$) | $C^1$, Sobolev | $\iint \left(\frac{f(x)-f(y)}{x-y}\right)^2 F_\sigma(x,y)$ | (Li et al., 2013) |
| Sine-$\beta$ process | $C^4$ compact | $(2/\beta) \lVert\varphi\rVert_{H^{1/2}}^2$ | (Leblé, 2018) |
| Random Simplicial Complex | Polynomials, $C^2$ | $\sum_{k,l} a_k a_l (k,l)$ (see combinatorial formulas) | (Kanazawa et al., 2023) |
| Separable Sample Covariance | Analytic | $-\frac{1}{4\pi^2} \iint f(z_1)g(z_2)\, \partial^2_{z_1 z_2} \log[1-K(z_1,z_2)]\, dz_1\,dz_2$ | (Zhidong et al., 2016) |

The explicit forms of the variance kernels, combinatorial coefficients, and dependence on cumulant/correlation structure are model-specific and derived via detailed combinatorial, analytic, and probabilistic arguments in each cited work.

