Central Limit Theorem for Linear Statistics
- Linear statistics of eigenvalues converge to a Gaussian distribution after appropriate centering and normalization.
- Explicit variance formulas are available across models, including Hermitian, non-Hermitian, and band matrices, and they highlight the role of test-function smoothness.
- Analytical techniques like resolvent methods, Stein’s method, and cumulant expansion are key to deriving these CLT results in diverse spectral scenarios.
The Central Limit Theorem (CLT) for linear statistics describes the Gaussian fluctuation regime of sums of smooth test functions evaluated at the eigenvalues, points, or zeros of an underlying random or deterministic spectral/point process. In random matrices, statistical physics, number theory, and geometric models, the linear statistic takes the form $\mathcal{L}_N(f) = \sum_{i=1}^{N} f(\lambda_i)$. Here $\lambda_1, \dots, \lambda_N$ are eigenvalues or analogous spectral quantities, and $f$ ranges over a suitable class of smooth functions. Several features vary widely across models: the regime of validity, the nature and scaling of the fluctuations, explicit formulas for the limiting variance, and the regularity required of $f$.
1. General Formulation and Significance
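The linear statistic $\mathcal{L}_N(f)$ aggregates the spectrum of a random matrix or point process, either globally or locally, by projecting it onto the test function $f$. Many of the most robust universality phenomena in random matrix theory and related fields rest on one statement: after appropriate centering and normalization, $\mathcal{L}_N(f)$ converges in distribution to a Gaussian random variable as $N \to \infty$,
$$\frac{\mathcal{L}_N(f) - \mathbb{E}\,\mathcal{L}_N(f)}{c_N} \xrightarrow{d} \mathcal{N}\big(0, \sigma^2(f)\big).$$
The normalization $c_N$ is model-dependent. For global statistics of Wigner matrices and $\beta$-ensembles no normalization is needed ($c_N = 1$), because eigenvalue rigidity keeps the fluctuations of order one. The variance $\sigma^2(f)$ depends on three things: the model (Wigner ensemble, $\beta$-ensemble, non-Hermitian case, etc.), the smoothness and support of $f$, and any non-Gaussian features of the ensemble, such as higher-order cumulants.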
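For Wigner matrices the centered statistic needs no $\sqrt{N}$ factor, and a short simulation makes this visible. The Monte Carlo sketch below samples the linear statistic $\sum_i \lambda_i^2 = \operatorname{Tr} H^2$ for a GUE matrix $H$ at two sizes and estimates its variance. It assumes the scaling $\mathbb{E}|H_{ij}|^2 = 1/N$, under which the spectrum fills $[-2, 2]$; the helper names, sample sizes, and seed are illustrative choices. Since $\operatorname{Tr} H^2$ depends only on the entries, no eigenvalue solver is needed.

```python
import random
import statistics


def sample_gue_trace_sq(n: int, rng: random.Random) -> float:
    """One draw of sum_i lambda_i^2 = Tr(H^2) for an n x n GUE matrix H
    scaled so that E|H_ij|^2 = 1/n (spectrum filling [-2, 2])."""
    total = 0.0
    for i in range(n):
        # Diagonal entry: real N(0, 1/n).
        d = rng.gauss(0.0, 1.0) / n ** 0.5
        total += d * d
        for _ in range(i + 1, n):
            # Off-diagonal entry: complex, real and imaginary parts N(0, 1/(2n)).
            re = rng.gauss(0.0, 1.0) / (2 * n) ** 0.5
            im = rng.gauss(0.0, 1.0) / (2 * n) ** 0.5
            # H_ij and H_ji = conj(H_ij) both contribute |H_ij|^2 to Tr(H^2).
            total += 2.0 * (re * re + im * im)
    return total


def centered_variance(n: int, samples: int = 1500, seed: int = 0) -> float:
    """Sample variance of Tr(H^2) over independent GUE draws of size n."""
    rng = random.Random(seed)
    draws = [sample_gue_trace_sq(n, rng) for _ in range(samples)]
    # statistics.variance centers at the sample mean, estimating Var[L_N(f)].
    return statistics.variance(draws)


results = {n: centered_variance(n) for n in (10, 30)}
print(results)  # both estimates should sit near 2, with no growth in n
```

In this scaling $\operatorname{Tr} H^2 = \chi^2_{N^2}/N$ exactly, so its variance is $2$ for every $N$. The two estimates therefore stay near $2$ instead of growing with the matrix size.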
Linear statistics encode macroscopic observables, such as traces, log-determinants, and counts of eigenvalues in intervals. They are the primary object both for theoretical universality results and for practical testing procedures in statistics, signal processing, high-dimensional inference, and mathematical physics.
2. Classical Wigner and β-Ensembles
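In the classical Hermitian Wigner case (GOE/GUE), the empirical spectral measure converges to Wigner's semicircle law $\mu_{\mathrm{sc}}(dx) = \frac{1}{2\pi}\sqrt{4-x^2}\,dx$ on $[-2, 2]$. For continuous $f$, write the Chebyshev expansion $f(2\cos\theta) = \sum_{k\ge0} a_k\cos(k\theta)$. For the GUE, the central result states that
$$\sum_{i=1}^{N} f(\lambda_i) - \mathbb{E}\sum_{i=1}^{N} f(\lambda_i)$$
admits a CLT if and only if the limiting variance
$$\sigma^2(f) = \frac{1}{4}\sum_{k\ge1} k\,a_k^2 = \frac{1}{4\pi^2}\int_{-2}^{2}\!\int_{-2}^{2}\left(\frac{f(x)-f(y)}{x-y}\right)^2\frac{4-xy}{\sqrt{4-x^2}\,\sqrt{4-y^2}}\,dx\,dy$$
is finite, as established in (Kopel, 2015). For the GOE the variance is doubled. Necessity and sufficiency, together with sharp counterexamples, make finiteness of $\sigma^2(f)$ the optimal criterion. This is an $H^{1/2}$-type (half-derivative) condition: functions with jump discontinuities, or with Hölder regularity too rough for $\sigma^2(f)$ to converge, are excluded.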
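The Chebyshev form of the Wigner variance is easy to evaluate numerically. Take $\sigma^2(f) = \frac{1}{2\beta}\sum_{k\ge1} k\,a_k^2$, where $f(2\cos\theta) = \sum_k a_k\cos(k\theta)$, the semicircle lives on $[-2, 2]$, and $\beta = 2$ gives the GUE and $\beta = 1$ the GOE. The sketch below computes the coefficients $a_k$ by a discrete cosine sum at Chebyshev nodes. The function names and the truncation `kmax` are illustrative choices, not taken from the cited work.

```python
import math


def chebyshev_coeffs(f, kmax: int, nodes: int = 256) -> list[float]:
    """Coefficients a_0..a_kmax of f(2 cos t) = sum_k a_k cos(k t), from the
    discrete cosine sum at Chebyshev nodes (exact for polynomials of degree < nodes)."""
    thetas = [math.pi * (j + 0.5) / nodes for j in range(nodes)]
    values = [f(2.0 * math.cos(t)) for t in thetas]
    coeffs = []
    for k in range(kmax + 1):
        s = sum(v * math.cos(k * t) for v, t in zip(values, thetas))
        # a_0 is the plain mean; a_k for k >= 1 carries a factor 2.
        coeffs.append((1.0 if k == 0 else 2.0) * s / nodes)
    return coeffs


def wigner_variance(f, beta: float = 2.0, kmax: int = 100) -> float:
    """Limiting variance (1 / (2 beta)) * sum_{k>=1} k a_k^2 (beta=2: GUE, beta=1: GOE)."""
    a = chebyshev_coeffs(f, kmax)
    return sum(k * a[k] ** 2 for k in range(1, kmax + 1)) / (2.0 * beta)


print(wigner_variance(lambda x: x), wigner_variance(lambda x: x ** 2))
```

For $f(x) = x$ and $f(x) = x^2$ the GUE values are $1$ and $2$. These match the exact variances of $\operatorname{Tr} H$ and $\operatorname{Tr} H^2$. For the indicator of an interval, the coefficients decay only like $1/k$, so the partial sums grow logarithmically in `kmax`. That divergence is exactly why the CLT fails for discontinuous $f$.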
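For general $\beta$-ensembles in the one-cut regime, the Gaussian fluctuation persists, but the variance and the drift are rescaled by $2/\beta$ and $2/\beta - 1$, respectively:
$$\sum_{j=1}^{N} f(\lambda_j) - N\int f\,d\mu_V \xrightarrow{d} \mathcal{N}\Big(\big(\tfrac{2}{\beta}-1\big)\,m_V(f),\ \tfrac{2}{\beta}\,\Sigma(f)\Big).$$
Here $\mu_V$ is the equilibrium measure, $m_V(f)$ is a weighted mean shift, and $\Sigma(f)$ is a double-integral kernel involving the Hilbert transform of the equilibrium measure (Lambert et al., 2017). In particular, the drift vanishes for $\beta = 2$. Explicit convergence rates in Wasserstein distance are available for polynomial and sufficiently smooth $f$. The rate depends on the regularity of both the potential and the test function, and it is sharp for the Gaussian Unitary Ensemble.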
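For the Gaussian potential, the factors $2/\beta$ and $2/\beta - 1$ can be checked exactly with the Dumitriu–Edelman tridiagonal model of the $\beta$-Hermite ensemble. In that model $\operatorname{Tr} H^2$ is a sum of independent $\chi^2$ variables. This is a swapped-in technique, not the Stein-type method of the cited work. The rescaling to a semicircle on $[-2, 2]$ and the function name are assumptions of the sketch.

```python
def trace_sq_moments(n: int, beta: float) -> tuple[float, float]:
    """Exact mean and variance of Tr(H^2), where H = c * H_beta is the
    Dumitriu-Edelman beta-Hermite matrix rescaled so that its spectrum
    fills [-2, 2], i.e. c^2 = 2 / (beta * n)."""
    c2 = 2.0 / (beta * n)
    mean = var = 0.0
    # Diagonal of H_beta: N(0, 2)/sqrt(2) = N(0, 1), contributing Z^2
    # with E[Z^2] = 1 and Var[Z^2] = 2.
    for _ in range(n):
        mean += c2 * 1.0
        var += c2 ** 2 * 2.0
    # Off-diagonal of H_beta: chi_{k beta}/sqrt(2), k = 1..n-1, each appearing
    # twice in Tr(H_beta^2); together they contribute chi^2_{k beta},
    # with mean k*beta and variance 2*k*beta.
    for k in range(1, n):
        dof = k * beta
        mean += c2 * dof
        var += c2 ** 2 * 2.0 * dof
    return mean, var


n = 1000
for beta in (1.0, 2.0, 4.0):
    m, v = trace_sq_moments(n, beta)
    # Drift relative to N * int x^2 d(mu_sc) = n, next to (2/beta - 1) * 1;
    # variance next to (2/beta) times the GUE value 2.
    print(beta, round(m - n, 6), 2 / beta - 1, round(v, 4), 4 / beta)
```

With $\beta = 1$ the exact drift $\mathbb{E}\operatorname{Tr} H^2 - n$ equals $1$ and the variance is close to $4$. With $\beta = 2$ the drift is $0$ and the variance is $2$. Both values agree with the GUE variance of $\operatorname{Tr} H^2$ scaled by $2/\beta$.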
3. Extensions Beyond Classical Hermitian Ensembles
a) Non-Hermitian Ensembles
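Consider non-Hermitian random matrices with i.i.d. complex entries, whose spectral measure converges to the circular law on the unit disk $\mathbb{D}$. For such matrices, the CLT for $\sum_i f(\sigma_i) - \mathbb{E}\sum_i f(\sigma_i)$, with eigenvalues $\sigma_i$, holds for test functions with $2+\epsilon$ derivatives ($f \in H^{2+\epsilon}$). The limiting variance is
$$V_f = \frac{1}{4\pi}\int_{\mathbb{D}} |\nabla f|^2\,d^2z + \frac{1}{2}\sum_{k\in\mathbb{Z}} |k|\,|\hat f(k)|^2 + \kappa_4\left|\frac{1}{\pi}\int_{\mathbb{D}} f\,d^2z - \frac{1}{2\pi}\int_0^{2\pi} f(e^{i\theta})\,d\theta\right|^2.$$
In this formula, $\hat f(k) = \frac{1}{2\pi}\int_0^{2\pi} f(e^{i\theta})e^{-ik\theta}\,d\theta$ are the Fourier coefficients of $f$ on the unit circle, and $\kappa_4$ is the entrywise fourth cumulant (Cipolloni et al., 2019). This structure, with an explicit boundary term and a cumulant-sensitive correction, resolves several open questions in non-Gaussian non-Hermitian universality.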
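The non-Hermitian variance for complex i.i.d. entries has three parts:
$V_f = \frac{1}{4\pi}\int_{\mathbb{D}}|\nabla f|^2 + \frac12\sum_k|k|\,|\hat f(k)|^2 + \kappa_4\big|\frac1\pi\int_{\mathbb{D}}f - \frac1{2\pi}\int_0^{2\pi}f(e^{i\theta})\,d\theta\big|^2$.
These are a bulk Dirichlet energy, a boundary $H^{1/2}$ term, and a $\kappa_4$ correction, and all three can be evaluated by plain quadrature. The sketch below assumes a real-valued $f$ given as a Python function of a complex argument. The grid sizes, the finite-difference step, and the function name are illustrative choices.

```python
import cmath
import math


def circular_law_variance(f, kappa4: float = 0.0, nr: int = 200, nt: int = 256,
                          kmax: int = 32, h: float = 1e-5) -> float:
    """Quadrature for V_f = (1/4pi) int_D |grad f|^2 + sum_{k>=1} k |f_k|^2
    + kappa4 * |(1/pi) int_D f - mean of f on the unit circle|^2, for real f."""
    dr, dt = 1.0 / nr, 2.0 * math.pi / nt
    grad_sq = disk_int = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr  # midpoint rule in the radius
        for j in range(nt):
            z = cmath.rect(r, j * dt)
            fx = (f(z + h) - f(z - h)) / (2 * h)            # df/dx, central difference
            fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # df/dy, central difference
            area = r * dr * dt                               # polar area element
            grad_sq += (fx * fx + fy * fy) * area
            disk_int += f(z) * area
    # Boundary values on the unit circle and their Fourier coefficients f_k.
    edge = [f(cmath.rect(1.0, j * dt)) for j in range(nt)]
    edge_mean = sum(edge) / nt
    boundary = 0.0
    for k in range(1, kmax + 1):
        fk = sum(v * cmath.exp(-1j * k * j * dt) for j, v in enumerate(edge)) / nt
        # For real f, |f_{-k}| = |f_k|, so (1/2) sum over k in Z = sum over k >= 1.
        boundary += k * abs(fk) ** 2
    correction = (disk_int / math.pi - edge_mean) ** 2
    return grad_sq / (4 * math.pi) + boundary + kappa4 * correction


v_abs = circular_law_variance(lambda z: abs(z) ** 2)
v_re = circular_law_variance(lambda z: z.real)
print(v_abs, v_re)
```

For $f(z) = |z|^2$ with $\kappa_4 = 0$ the result is about $1/2$. This is consistent with Kostlan's theorem for the complex Ginibre ensemble: the scaled squared moduli are independent Gamma variables, which gives $\operatorname{Var}\sum_i|\sigma_i|^2 = (n+1)/(2n)$. For $f(z) = \operatorname{Re} z$ the result is $1/2 = \operatorname{Var}\operatorname{Re}\operatorname{Tr} X$. A nonzero $\kappa_4$ shifts the first value to $1/2 + \kappa_4/4$.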
b) Band Random Matrices
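Let $M_n$ be a Wigner-type band matrix of bandwidth $b_n$. The CLT for $\operatorname{Tr} f(M_n)$ holds provided $\sqrt{n} \ll b_n \ll n$, for $f$ with a bounded continuous derivative. The fluctuations are of order $\sqrt{n/b_n}$:
$$\sqrt{\frac{b_n}{n}}\,\Big(\operatorname{Tr} f(M_n) - \mathbb{E}\operatorname{Tr} f(M_n)\Big) \xrightarrow{d} \mathcal{N}\big(0, V[f]\big).$$
The variance $V[f]$ is a double integral whose kernel reflects the limited spatial range of the band (Li et al., 2013). In the full-bandwidth limit the Wigner (GOE/GUE) kernel is recovered, while for sufficiently narrow bands Poisson-type statistics emerge.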
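The rescaling by $\sqrt{b_n/n}$ is needed because the variance of a band-matrix linear statistic grows like $n/b_n$. This is already visible for $f(x) = x^2$, where $\operatorname{Tr} M^2$ is a sum of independent squared entries. The sketch assumes a hypothetical normalization: a periodic band with cyclic distance $|i-j| \le b$, Gaussian entries of variance $1/(2b+1)$ off the diagonal, and variance $2/(2b+1)$ on it. It computes the exact variance; it is not the model or the kernel of the cited work.

```python
def band_trace_sq_variance(n: int, b: int) -> float:
    """Exact Var[Tr(M^2)] for a real symmetric periodic band matrix M:
    M_ij = 0 unless the cyclic distance between i and j is at most b,
    off-diagonal entries N(0, s2), diagonal entries N(0, 2 s2), s2 = 1/(2b + 1)."""
    if 2 * b + 1 > n:
        raise ValueError("the band must not wrap around onto itself")
    s2 = 1.0 / (2 * b + 1)
    var = 0.0
    for _ in range(n):
        # Diagonal entry: Var(M_ii^2) = 2 (2 s2)^2 for M_ii ~ N(0, 2 s2).
        var += 2.0 * (2.0 * s2) ** 2
        # Pairs (i, i + d mod n), d = 1..b, each counted once; the pair enters
        # Tr(M^2) as 2 M_ij^2, with Var(2 M_ij^2) = 4 * 2 s2^2.
        for _ in range(b):
            var += 8.0 * s2 ** 2
    return var


n = 2000
for b in (10, 50, 200):
    v = band_trace_sq_variance(n, b)
    print(b, round(v, 3), round(v * b / n, 4))  # (b/n) * Var approaches 2
```

The exact value is $\operatorname{Var}\operatorname{Tr}M^2 = 8n(b+1)/(2b+1)^2$, so $\frac{b}{n}\operatorname{Var}\operatorname{Tr}M^2 = \frac{8b(b+1)}{(2b+1)^2} \to 2$. The fluctuations therefore grow like $\sqrt{n/b}$.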
c) Covariance, Separable, and Spiked Models
In large-dimensional sample covariance models ($p, n \to \infty$ with $p/n \to y \in (0, \infty)$), the CLT holds for analytic test functions. The mean and variance are given explicitly by complex contour integrals involving the Stieltjes transform of the limiting spectral density (Bai et al., 2010, Zhidong et al., 2016, Liu et al., 5 Oct 2025). "Spikes" may also be present, for example a finite number of eigenvalues of the population covariance separated from the bulk. In that case the limiting distribution includes deterministic spike terms and variance corrections, and supercritical spikes yield independent additive Gaussian contributions (Liu et al., 5 Oct 2025).
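The null case of the sample covariance CLT can be checked by simulation. Assume real Gaussian data with identity population covariance and $y = p/n < 1$. The classical log-determinant example of Bai and Silverstein then reads
$\sum_{i=1}^p \log\lambda_i - p\int\log x\,dF^{y}(x) \to \mathcal{N}\big(\tfrac12\log(1-y),\, -2\log(1-y)\big)$,
where $F^y$ is the Marchenko–Pastur law and $\int\log x\,dF^y(x) = \frac{y-1}{y}\log(1-y) - 1$. The sketch avoids eigenvalue computations by using the Bartlett decomposition, under which $\det(XX^\top)$ is a product of independent $\chi^2_{n-i+1}$ variables. The dimensions, replication count, and seed are arbitrary choices.

```python
import math
import random
import statistics


def mp_log_integral(y: float) -> float:
    """int log(x) dF_y(x) for the Marchenko-Pastur law with ratio 0 < y < 1."""
    return (y - 1.0) / y * math.log(1.0 - y) - 1.0


def sample_log_det(p: int, n: int, rng: random.Random) -> float:
    """One draw of sum_i log(lambda_i) = log det S, S = X X^T / n, X p x n with
    i.i.d. N(0, 1) entries. Bartlett: det(X X^T) ~ prod_{i=1}^{p} chi^2_{n-i+1}."""
    total = 0.0
    for i in range(1, p + 1):
        dof = n - i + 1
        chi2 = rng.gammavariate(dof / 2.0, 2.0)  # chi^2_dof = Gamma(shape dof/2, scale 2)
        total += math.log(chi2)
    return total - p * math.log(n)


p, n, reps = 20, 40, 8000
y = p / n
rng = random.Random(1)
centered = [sample_log_det(p, n, rng) - p * mp_log_integral(y) for _ in range(reps)]
emp_mean = statistics.fmean(centered)
emp_var = statistics.variance(centered)
print(f"mean {emp_mean:.3f} vs limit {0.5 * math.log(1 - y):.3f}")
print(f"var  {emp_var:.3f} vs limit {-2 * math.log(1 - y):.3f}")
```

At $p = 20$, $n = 40$ the finite-size mean and variance already sit within a few hundredths of the limits $\tfrac12\log\tfrac12 \approx -0.347$ and $-2\log\tfrac12 \approx 1.386$.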
For rank-based or nonparametric models, such as Kendall's rank correlation matrices, the CLT holds for analytic $f$ in the proportional (Marchenko–Pastur) regime $p/n \to y$ without any moment conditions, providing robustness to heavy tails and skewness (Li et al., 2019).
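The moment-free CLT rests on a distribution-free property. Kendall's $\tau$ depends only on the signs of pairwise differences, so any strictly increasing transformation of each coordinate leaves it unchanged. The sketch below builds the textbook Kendall matrix; the cited work may scale it differently. It then maps Gaussian data to Cauchy data, which has no finite mean or variance, and checks that both the matrix and the linear statistic $\operatorname{Tr} K^2$ are unchanged. The helper names are illustrative.

```python
import math
import random


def kendall_matrix(data):
    """p x p Kendall rank correlation matrix of an n x p sample (rows are
    observations): tau_kl = 2/(n(n-1)) * sum_{i<j} sign(x_ik - x_jk) sign(x_il - x_jl)."""
    n, p = len(data), len(data[0])
    K = [[0.0] * p for _ in range(p)]
    for i in range(n):
        for j in range(i + 1, n):
            # Only the signs of pairwise differences enter Kendall's tau.
            s = [(data[i][k] > data[j][k]) - (data[i][k] < data[j][k]) for k in range(p)]
            for k in range(p):
                for l in range(p):
                    K[k][l] += s[k] * s[l]
    scale = 2.0 / (n * (n - 1))
    return [[scale * v for v in row] for row in K]


def trace_sq(K):
    """Linear statistic sum_i lambda_i^2 = Tr(K^2) = sum_kl K_kl^2 for symmetric K."""
    return sum(v * v for row in K for v in row)


def gauss_to_cauchy(x):
    """Strictly increasing map sending N(0, 1) to the standard Cauchy law."""
    u = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    return math.tan(math.pi * (u - 0.5))


rng = random.Random(7)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(30)]
Y = [[gauss_to_cauchy(v) for v in row] for row in X]
K_X, K_Y = kendall_matrix(X), kendall_matrix(Y)
print(K_X == K_Y, trace_sq(K_X))
```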
d) Diluted Graphs and Simplicial Complexes
Erdős–Rényi graphs $G(n, p)$ whose average degree diverges ($p \to 0$, $np \to \infty$) display fluctuations governed by the "non-Gaussian" (fourth-cumulant) part of the Wigner variance. For test functions in a suitable Sobolev class $H^s$, with centering against the semicircle, the statistic is Gaussian. Its double-integral variance matches the non-Gaussian kernel from the full Wigner case (Shcherbina et al., 2011).
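For $f(x) = x^2$, the dominance of the non-Gaussian part can be computed in closed form. The sketch below assumes the centered and rescaled adjacency matrix $H = (A - p(J - I))/\sqrt{np(1-p)}$ of $G(n, p)$. This is an illustrative normalization, not necessarily that of the cited work. The code splits the exact variance of $\operatorname{Tr} H^2$ into the part a Gaussian (GOE-like) matrix would have and the part coming from the fourth cumulant of the sparse entries.

```python
def er_trace_sq_variance_parts(n: int, p: float) -> tuple[float, float]:
    """Split Var[Tr(H^2)] into (Gaussian part, fourth-cumulant part) for
    H = (A - p (J - I)) / sqrt(n p (1 - p)), A the adjacency matrix of G(n, p).
    Off-diagonal entries h satisfy E h = 0 and E h^2 = 1/n; the diagonal is zero."""
    pairs = n * (n - 1) // 2
    m2 = 1.0 / n
    # E (A_ij - p)^4 = p (1 - p) (1 - 3p + 3p^2), divided by (n p (1 - p))^2.
    m4 = (1.0 - 3.0 * p + 3.0 * p * p) / (n * n * p * (1.0 - p))
    # Each pair contributes 2 h^2 to Tr(H^2), and Var(2 h^2) = 4 (m4 - m2^2)
    # = 4 * 2 m2^2 (Gaussian part) + 4 * (m4 - 3 m2^2) (cumulant part).
    gaussian = pairs * 4.0 * 2.0 * m2 * m2
    cumulant = pairs * 4.0 * (m4 - 3.0 * m2 * m2)
    return gaussian, cumulant


for p in (0.5, 0.1, 0.01, 0.001):
    g, c = er_trace_sq_variance_parts(10_000, p)
    print(p, round(g, 3), round(c, 3), round(p * (g + c), 4))
```

As $p \to 0$ the cumulant part behaves like $2/p$ while the Gaussian part stays near $4$, so the statistic must be rescaled by $\sqrt{p}$. At $p = 1/2$ the two parts cancel exactly, because the centered entries then have constant modulus.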
For higher-dimensional Linial–Meshulam complexes, the (d–1)st adjacency eigenvalue linear statistics admit a CLT with a variance explicitly computable as a sum over combinatorial types of tree and bracelet subcomplexes, with large deviation (Talagrand-type) concentration arguments extending the result to smooth test functions (Kanazawa et al., 2023).
4. Linear Statistics in Other Point Processes and Fields
Sine-β and Microscopic Processes
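The Sine-$\beta$ process is the infinite-volume bulk limit of $\beta$-ensembles. For this process, linear statistics of scaled, compactly supported test functions converge to a normal distribution with variance proportional to the $H^{1/2}$-Sobolev seminorm:
$$\sigma^2(f) = \frac{1}{2\beta\pi^2}\int_{\mathbb{R}} |\xi|\,|\hat f(\xi)|^2\,d\xi, \qquad \hat f(\xi) = \int_{\mathbb{R}} f(x)\,e^{-ix\xi}\,dx$$
(Leblé, 2018). This universality reflects the rigidity and logarithmic repulsion of bulk eigenvalues.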
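The Sine-$\beta$ variance can be computed with two nested trapezoidal quadratures. The sketch takes the form $\sigma^2(f) = \frac{1}{2\beta\pi^2}\int_{\mathbb{R}}|\xi|\,|\hat f(\xi)|^2\,d\xi$ with $\hat f(\xi) = \int f(x)e^{-ix\xi}\,dx$. The constant is fixed by matching the sine-kernel case $\beta = 2$ and should be treated as an assumption. The truncation window, step sizes, and function name are illustrative choices.

```python
import math


def sine_beta_variance(f, beta, x_max=10.0, dx=0.025, xi_max=10.0, dxi=0.025):
    """sigma^2(f) = 1/(2 beta pi^2) * int_R |xi| |f_hat(xi)|^2 d xi, with
    f_hat(xi) = int f(x) exp(-i x xi) dx, by trapezoidal quadrature.
    Assumes f is real and negligible outside [-x_max, x_max]."""
    m = int(round(2 * x_max / dx))
    xs = [-x_max + k * dx for k in range(m + 1)]
    fx = [f(x) for x in xs]
    wts = [dx * (0.5 if k in (0, m) else 1.0) for k in range(m + 1)]

    def fhat_abs_sq(xi):
        re = sum(w * v * math.cos(xi * x) for w, v, x in zip(wts, fx, xs))
        im = sum(w * v * math.sin(xi * x) for w, v, x in zip(wts, fx, xs))
        return re * re + im * im  # the sign of the imaginary part drops out

    q = int(round(xi_max / dxi))
    g = [k * dxi * fhat_abs_sq(k * dxi) for k in range(q + 1)]
    # Real f gives |f_hat(-xi)| = |f_hat(xi)|: integrate over xi >= 0 and double.
    half = dxi * (sum(g) - 0.5 * (g[0] + g[-1]))
    return 2.0 * half / (2.0 * beta * math.pi ** 2)


def gauss(x):
    return math.exp(-x * x / 2)


v2 = sine_beta_variance(gauss, beta=2.0)
print(v2, 1 / (2 * math.pi))
```

For the Gaussian $f(x) = e^{-x^2/2}$ the exact value is $1/(\beta\pi)$. The $H^{1/2}$ seminorm is invariant under dilations $f \mapsto f(\cdot/\ell)$, so rescaling the test function leaves the result unchanged. This is why scaled test functions have a scale-free limiting variance.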
Random Geometric and Arithmetic Settings
On random covers of compact hyperbolic surfaces, the CLT holds for smooth local statistics of Laplacian eigenvalues. The centered and normalized counting function in small spectral windows converges to a normal variable whose variance matches the GOE ($\beta = 1$) or GUE ($\beta = 2$) prediction, depending on the symmetry type (Maoz, 2023).
In number theory, Selberg's CLT for the value distribution of $\log|\zeta(\tfrac12 + it)|$ can be generalized to statistics weighted by local statistics of the zeta zeros. The limiting law is still Gaussian for test functions whose Fourier transform has sufficiently restricted support, in some cases conditionally on the Riemann Hypothesis (RH) (Fazzari et al., 5 Jul 2025).
5. Methodological Pillars
Analysis of fluctuations of linear statistics relies on several recurring methodologies:
- Resolvent and Stieltjes Transform Methods: Expressing linear statistics as contour or real integrals of resolvent traces, leading to martingale and cumulant expansions (Bai et al., 2010, Cipolloni et al., 2019, Zhidong et al., 2016).
- Stein’s Method and Normal Approximation: Used to derive explicit convergence rates in Wasserstein distance and to handle correlated traces via exchangeable pairs (Lambert et al., 2017, Döbler et al., 2012).
- Martingale CLT and Moment Methods: Martingale differences arising from sequential conditioning on columns or rows, enabling verification of Lindeberg conditions and computation of variance (Bai et al., 2010, Li et al., 2019).
- Diagrammatic and Combinatorial Enumeration: For band/random combinatorial models, enumeration of contributing graph structures (such as Dyck paths, fat-trees, high-dimensional trees, and bracelets) yields explicit variance expressions (Li et al., 2013, Kanazawa et al., 2023).
- Cumulant Expansion and Gaussian Comparison: Matching moments to cancel non-universal contributions, allowing extension from the Gaussian to general entry distributions (Lindeberg replacement, cumulant decoupling) (Zhidong et al., 2016, Benaych-Georges et al., 2013).
- Functional Analysis: Optimal regularity results delineating when the CLT holds, especially via Sobolev or Hölder spaces and explicit spectral kernel analysis (Kopel, 2015, Leblé, 2018).
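The resolvent methods rest on the Cauchy-integral identity $\sum_i f(\lambda_i) = \frac{1}{2\pi i}\oint_\Gamma f(z)\operatorname{Tr}(z - H)^{-1}\,dz$, valid for $f$ analytic inside a contour $\Gamma$ enclosing the spectrum. The pure-Python sketch below evaluates this integral on a circle with the trapezoidal rule, solving $(z - H)x = e_j$ by Gaussian elimination. The helper names, the small test matrix, and the circular contour are illustrative assumptions, not the contours used in the cited proofs.

```python
import cmath
import math


def solve(a, b):
    """Solve a x = b for a small dense complex matrix by Gaussian elimination
    with partial pivoting (inputs are copied, not modified)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def resolvent_trace(h, z):
    """Tr (z - H)^{-1}, assembled from the diagonal of the inverse column by column."""
    n = len(h)
    zmh = [[(z if i == j else 0) - h[i][j] for j in range(n)] for i in range(n)]
    return sum(solve(zmh, [1.0 if i == j else 0.0 for i in range(n)])[j] for j in range(n))


def linear_statistic_via_contour(h, f, radius, nodes=128):
    """sum_i f(lambda_i) = (1/(2 pi i)) oint f(z) Tr (z - H)^{-1} dz over |z| = radius,
    discretized by the trapezoidal rule (spectrally accurate for analytic f)."""
    acc = 0j
    for k in range(nodes):
        z = radius * cmath.exp(2j * math.pi * k / nodes)
        dz = 2j * math.pi * z / nodes  # dz = i z dtheta
        acc += f(z) * resolvent_trace(h, z) * dz
    return acc / (2j * math.pi)


H = [[2.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]  # eigenvalues lie in [-3, 3]
print(linear_statistic_via_contour(H, lambda z: z * z, radius=5.0).real)  # close to Tr(H^2) = 9
```

With $f \equiv 1$ the same routine counts the eigenvalues inside the contour. In the probabilistic proofs the contour is kept at a fixed distance from the spectrum, which keeps the resolvent bounded along it.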
6. Applications, Limitations, and Open Questions
Linear statistics CLTs underpin hypothesis testing in high-dimensional statistics (e.g., identity tests for covariance, detection of spikes, community detection in block models), the understanding of universality classes in random matrix theory, and the modeling of counting functions in algebraic geometry and number theory.
Extending these CLTs to less regular test functions, non-smooth or discontinuous statistics, or into regimes beyond global linear statistics (e.g., for mesoscopic or edge scaling, or in the presence of strong localization or heavy tails) remains an area of active research. The explicit dependency of variance formulas on higher cumulants or combinatorial/geometric structures continues to be a subject of investigation in advanced random matrix and geometric probability theory.
7. Illustrative Table: Variance Formulas in Key Models
| Model / Class | Test Function Class | Limiting Variance Formula |
|---|---|---|
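| Hermitian Wigner (GOE/GUE) | Bounded, $\sum_k k\,a_k^2 < \infty$ | $\frac{1}{4}\sum_{k\ge1} k\,a_k^2$ for GUE, doubled for GOE (Kopel, 2015) |
| General β-ensemble | Smooth $f$, one-cut potential $V$ | $\frac{2}{\beta}\Sigma(f)$, with drift $(\frac{2}{\beta}-1)\,m_V(f)$ (Lambert et al., 2017) |
| Non-Hermitian i.i.d. (complex entries) | $H^{2+\epsilon}$ | $\frac{1}{4\pi}\int_{\mathbb{D}}\lvert\nabla f\rvert^2 + \frac{1}{2}\sum_{k}\lvert k\rvert\,\lvert\hat f(k)\rvert^2$ plus a $\kappa_4$ correction (Cipolloni et al., 2019) |
| Band Wigner ($\sqrt{n} \ll b_n \ll n$) | Bounded continuous $f'$, Sobolev | Band-kernel double integral, after rescaling by $\sqrt{b_n/n}$ (Li et al., 2013) |
| Sine-β process | Compactly supported | $\frac{1}{2\beta\pi^2}\int \lvert\xi\rvert\,\lvert\hat f(\xi)\rvert^2\,d\xi$ (Leblé, 2018) |
| Random Simplicial Complex | Polynomials, smooth $f$ | Sum over tree and bracelet subcomplexes (see combinatorial formulas) (Kanazawa et al., 2023) |
| Separable Sample Covariance | Analytic | Contour integral involving the Stieltjes transform (Zhidong et al., 2016) |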
The explicit forms of the variance kernels, combinatorial coefficients, and dependence on cumulant/correlation structure are model-specific and derived via detailed combinatorial, analytic, and probabilistic arguments in each cited work.
References:
- (Kopel, 2015) for GUE optimal CLT
- (Lambert et al., 2017) for β-ensembles with rates
- (Li et al., 2013) for band random matrices
- (Cipolloni et al., 2019) for non-Hermitian ensembles
- (Leblé, 2018) for the Sine-β process
- (Kanazawa et al., 2023) for random simplicial complexes
- (Zhidong et al., 2016) for separable sample covariance CLT
- (Liu et al., 5 Oct 2025) for spiked covariance CLT
- (Shcherbina et al., 2011) for diluted random graphs
- (Li et al., 2019) for Kendall’s correlation CLT