Local CLT for High-Dimensional Densities
- The paper presents a framework for achieving local CLTs in growing dimensions by decomposing sums into CLT and LCLT components with explicit error bounds.
- It employs geometric decompositions and stabilization techniques to control dependencies and ensure pointwise Gaussian convergence in high-dimensional settings.
- The analysis offers practical insights for high-dimensional inference and spatial statistics, specifying error rates and scalability conditions for complex models.
The local central limit theorem (LCLT) for densities in growing dimensions addresses the precise pointwise asymptotics of probability densities associated with sums or functionals of high-dimensional random variables and structures, where both the dimension and sample size scale together. The subject synthesizes abstract limit processes, saddlepoint approximations, geometric decompositions, and dependence structures to rigorously characterize the conditions under which Gaussian-like density convergence prevails, identify explicit rates, and support applications ranging from stochastic geometry to statistical inference and complex dependent models.
1. Theoretical Framework for Local CLT in High Dimensions
The principal structure underpinning LCLTs in growing dimensions is the careful decomposition of the target sum or functional. A general template is established in (Penrose et al., 2010) by considering a decomposition of the form
$$H_n = X_n + Y_n + R_n,$$
where $X_n$ (CLT component) and $Y_n$ (LCLT component) are independent, scaled so their variances are of the same order. Provided the error $R_n$ is negligible on the relevant scale, $Y_n$ obeys a local limit theorem, and $X_n$ satisfies a global CLT, then $H_n$ satisfies an LCLT: schematically,
$$\mathbb{P}(H_n = x) \approx \phi_{\sigma_n^2}(x - \mathbb{E}H_n), \qquad \sigma_n^2 = \operatorname{Var}(H_n),$$
uniformly in $x$, where $\phi_{\sigma^2}$ is the normal density with variance $\sigma^2$ ((Penrose et al., 2010), formula (2.2)). Such decompositions are feasible due to strong geometric independence (e.g., spatial tessellation into "good boxes") or stabilization. Crucially, the validity of the LCLT extends to high dimensions as long as local dependencies are controlled (via partitioning or finite range interactions).
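For concreteness, the decomposition principle can be checked numerically in a toy setting. The following Python sketch (with illustrative distributions and parameters, not the geometric construction of (Penrose et al., 2010)) takes a Binomial sum as the lattice LCLT component and an independent centered Gamma sum as the CLT component, and compares the exact density of their sum pointwise with the matching normal density.

```python
# Minimal numerical sketch of the CLT + LCLT decomposition (illustrative only;
# not the geometric construction of Penrose et al., 2010).
#
#   H_n = Y_n + X_n, with
#   Y_n ~ Binomial(n, 1/2)      -- lattice component satisfying a classical LCLT,
#   X_n = Gamma(n, 1) - n       -- independent continuous component obeying a global CLT.
#
# The density of H_n is the convolution of the Binomial pmf with the shifted
# Gamma density; it should approach the matching normal density pointwise as n grows.
import numpy as np
from scipy import stats

def density_H(n, x):
    """Exact density of H_n = Binomial(n, 1/2) + (Gamma(n, 1) - n) at points x."""
    k = np.arange(n + 1)
    pmf = stats.binom.pmf(k, n, 0.5)                         # LCLT component
    f_X = lambda t: stats.gamma.pdf(t + n, a=n, scale=1.0)   # CLT component density
    return np.array([(pmf * f_X(xi - k)).sum() for xi in np.atleast_1d(x)])

for n in (20, 80, 320):
    mean, var = 0.5 * n, 0.25 * n + n              # moments of Y_n plus moments of X_n
    x = mean + np.sqrt(var) * np.linspace(-3, 3, 121)
    gauss = stats.norm.pdf(x, loc=mean, scale=np.sqrt(var))
    sup_err = np.max(np.abs(density_H(n, x) - gauss)) * np.sqrt(var)
    print(f"n={n:4d}   scaled sup-error = {sup_err:.4f}")
```

The scaled supremum error should shrink as $n$ increases, mirroring the convergence asserted by the abstract template.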
This abstract principle is refined through the use of saddlepoint approximations (SPA) for sums of i.i.d. random vectors in $\mathbb{R}^d$, where $d$ grows with $n$ (Katsevich, 24 Oct 2025). In its classical form, the SPA for the density of the normalized sum (stated here for the sample mean) reads
$$f_n(x) = \left(\frac{n}{2\pi}\right)^{d/2} \frac{e^{-n\mathcal{L}(x)}}{\sqrt{\det \nabla^2 K(\hat{s}_x)}}\,\bigl(1 + \varepsilon_n(x)\bigr),$$
with $1+\varepsilon_n(x)$ as a correction factor, $\nabla^2 K(\hat{s}_x)$ as the Hessian of the cumulant generating function $K$ at the saddlepoint $\hat{s}_x$, and $\mathcal{L}(x) = \sup_s\{\langle s, x\rangle - K(s)\}$ as the Legendre transform. Under analytic and moment conditions, the previously available SPA error bound is refined to a sharper, dimension-explicit rate, and the local CLT (i.e., pointwise density convergence to the Gaussian) holds with explicit multiplicative error bounds under an explicit growth condition on $d$ relative to $n$ ((Katsevich, 24 Oct 2025), Corollary 4.1).
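As a sanity check of the classical formula (fixed $d=1$, not the growing-dimension refinement), the sketch below evaluates the SPA for the sample mean of i.i.d. Exp(1) variables, for which the exact density is a Gamma density; the sample size and evaluation grid are illustrative.

```python
# Classical saddlepoint approximation for the sample mean of n i.i.d. Exp(1)
# variables, compared against the exact Gamma(n, scale=1/n) density of that mean.
# Illustrative fixed-dimension check, not the growing-d refinement.
import numpy as np
from scipy import stats

def spa_density(x, n):
    """SPA for the density of the mean of n Exp(1) variables at x > 0.

    K(s) = -log(1 - s); the saddlepoint solves K'(s) = x, giving s_x = 1 - 1/x;
    K''(s_x) = x^2; Legendre transform L(x) = x - 1 - log(x).
    """
    L = x - 1.0 - np.log(x)
    K_hess = x ** 2
    return np.sqrt(n / (2.0 * np.pi * K_hess)) * np.exp(-n * L)

n = 25
x = np.linspace(0.5, 2.0, 7)
exact = stats.gamma.pdf(x, a=n, scale=1.0 / n)     # exact density of the sample mean
approx = spa_density(x, n)
for xi, e, a in zip(x, exact, approx):
    print(f"x={xi:.2f}   exact={e:.5f}   spa={a:.5f}   rel.err={abs(a / e - 1):.4f}")
```

The relative error here is of order $1/n$, the familiar fixed-dimension SPA accuracy; the point of the growing-dimension analysis is to make the dependence on $d$ explicit in such bounds.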
2. Geometric and Combinatorial Decomposition Techniques
In high-dimensional or geometric probability, spatial decompositions are foundational. For percolation processes, random geometric graphs, or nearest-neighbor functionals (Penrose et al., 2010), the configuration is partitioned into well-separated "good boxes" or shielded regions, so that the sum exhibits approximate independence. Each region contributes locally—often on a lattice—where classical LCLT applies, with remaining global interactions only altering variance or centering.
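A toy instance of this mechanism can be simulated directly: for a homogeneous Poisson process, counts in disjoint boxes are exactly independent, so the number of occupied boxes is a sum of i.i.d. indicators and obeys the classical one-dimensional lattice LCLT. The sketch below (illustrative intensity, grid size, and functional; the statistics in (Penrose et al., 2010) are geometric and only approximately decouple across boxes) compares the empirical probability mass function of this statistic with the normal density predicted by the LCLT.

```python
# Toy "good boxes" decomposition: for a homogeneous Poisson process on [0,1]^2
# cut into an m x m grid, box counts are exactly independent, so the number of
# occupied boxes is a sum of i.i.d. Bernoulli indicators -- a lattice statistic
# obeying the classical LCLT.  Intensity, grid size, and functional are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
intensity, m, reps = 200.0, 10, 20000
p_occ = 1.0 - np.exp(-intensity / m**2)              # P(a given box contains a point)

counts = np.empty(reps, dtype=int)
for r in range(reps):
    pts = rng.random((rng.poisson(intensity), 2))    # one Poisson configuration
    boxes = np.floor(pts * m).astype(int)            # box index of each point
    counts[r] = len({(i, j) for i, j in boxes})      # number of occupied boxes

mean, var = m**2 * p_occ, m**2 * p_occ * (1.0 - p_occ)
ks = np.arange(counts.min(), counts.max() + 1)
emp_pmf = np.array([(counts == k).mean() for k in ks])
lclt = stats.norm.pdf(ks, loc=mean, scale=np.sqrt(var))   # LCLT prediction for the pmf
print("max |empirical pmf - normal density| =", np.abs(emp_pmf - lclt).max())
```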
For local empirical processes near boundaries (Einmahl et al., 2011), differentiation of sets in measure is utilized: the process is indexed by classes of shrinking sets parameterized by a "local magnification map" that carries neighborhoods of a boundary into a cylinder. Local CLTs are then cast as convergence of normalized counts over these derivative sets, yielding set-parametric Brownian motion limits with explicit covariance induced by density approximations.
In Poisson tessellation or spin systems on Cayley graphs, cluster expansions and quasi-locality are harnessed to control dependency. For exponentially quasi-local statistics, the tail of the stabilization radius decays rapidly (Reddy et al., 2017), enabling the application of cumulant methods and factorial moment expansion for LCLT.
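A toy check of such tail control: the distance from the origin to the nearest point of a planar Poisson process of intensity $\lambda$ has tail $\exp(-\lambda\pi r^2)$, decaying faster than exponentially. The Monte Carlo sketch below verifies this; intensity and window size are illustrative, and the quasi-local statistics in (Reddy et al., 2017) involve more elaborate stabilization radii, though the same kind of tail bound is what feeds the cumulant estimates.

```python
# Monte Carlo check of rapid stabilization-radius decay for a toy functional:
# the distance R from the origin to the nearest point of a planar Poisson
# process of intensity lam satisfies P(R > r) = exp(-lam * pi * r^2).
# Intensity and window are illustrative.
import numpy as np

rng = np.random.default_rng(1)
lam, window, reps = 5.0, 10.0, 20000
radii = np.empty(reps)
for i in range(reps):
    n_pts = rng.poisson(lam * window**2)
    pts = rng.uniform(-window / 2, window / 2, size=(n_pts, 2))
    radii[i] = np.sqrt((pts**2).sum(axis=1)).min()   # nearest point to the origin

for r in (0.2, 0.35, 0.5):
    emp = (radii > r).mean()
    exact = np.exp(-lam * np.pi * r**2)
    print(f"r={r:.2f}   empirical P(R>r)={emp:.4f}   exact={exact:.4f}")
```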
3. Bandwidth, Regularity, and Dimension Scalings
The interplay between bandwidth selection and dimension governs regularity properties and convergence rates in kernel and statistical smoothing. In multidimensional ergodic diffusions (Rohde et al., 2010), the smoothed empirical process built from a kernel estimator of the invariant density can be constructed with an exponentially small bandwidth in low dimensions, and with a strongly undersmoothed bandwidth in higher dimensions, provided the drift and diffusion coefficients lie in a Hölder ball of sufficient smoothness. As the dimension grows, this extra regularity diminishes but remains advantageous for uniform CLTs.
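As a one-dimensional illustration of why aggressive undersmoothing is viable for ergodic diffusions (in $d=1$ the invariant density can be estimated at an essentially parametric rate, so very small bandwidths remain usable), the sketch below estimates the invariant density of an Ornstein–Uhlenbeck process by a kernel smoother at several bandwidths; the process, bandwidths, and grid are illustrative and not those analysed in (Rohde et al., 2010).

```python
# Kernel estimation of the invariant density of an ergodic one-dimensional
# diffusion: an Ornstein-Uhlenbeck process dX = -X dt + sqrt(2) dW, whose
# invariant law is N(0,1), smoothed at several bandwidths including a heavily
# undersmoothed one.  Process, bandwidths, and grid are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
dt, T = 0.01, 500.0
n = int(T / dt)
path = np.empty(n)
path[0] = 0.0
noise = rng.standard_normal(n - 1)
for i in range(n - 1):                                    # Euler-Maruyama steps
    path[i + 1] = path[i] - path[i] * dt + np.sqrt(2.0 * dt) * noise[i]

grid = np.linspace(-2.5, 2.5, 51)
target = stats.norm.pdf(grid)                             # true invariant density

for h in (0.5, 0.1, 0.02):                                # from smooth to heavily undersmoothed
    kde = stats.norm.pdf((grid[:, None] - path[None, :]) / h).mean(axis=1) / h
    print(f"bandwidth h={h:5.2f}   sup-error = {np.abs(kde - target).max():.4f}")
```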
In saddlepoint approximations (Katsevich, 24 Oct 2025), analyticity and bounds on higher-order derivatives of the cumulant generating function are essential. Heavy tails or poor spectral gap properties degrade rates, enforcing stricter conditions for high-dimensional convergence.
4. Explicit Error Rates and Pointwise Gaussian Asymptotics
The quantification of LCLT accuracy is explicit in several formulations:
- Saddlepoint error bound (Katsevich, 24 Oct 2025): a multiplicative error bound in which explicit parameters controlling the third and fourth derivatives of the cumulant generating function enter, together with a quantity governing the tail contribution.
- Markov Additive Processes (Hervé et al., 2013): a local limit theorem in which the suitably rescaled probabilities converge to an explicit Gaussian density.
- Gradient field models (Wu, 2022): the pointwise density of the normalized field observable equals the Gaussian density up to a quantitative (Berry–Esseen type) error, uniformly in the spatial argument.
- Dimension-Dependence in CDF (Koike, 2019): uniform Gaussian approximation for probabilities on hyperrectangles requires an explicit restriction on the growth of the dimension relative to the sample size for general random vectors, which can be weakened under a common factor structure, with error rates precisely given via discrepancy parameters (a Monte Carlo sketch of this hyperrectangle comparison follows this list).
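The Monte Carlo sketch referenced in the last item: the distribution function of the coordinatewise maximum of a normalized sum of i.i.d. standardized-uniform vectors is compared with its value under the limiting $N(0, I_d)$ law, which factorizes over coordinates. Dimension, sample size, and thresholds are illustrative, and the comparison is empirical rather than via the cited discrepancy parameters.

```python
# Gaussian approximation over hyperrectangles, checked by Monte Carlo:
# P(max_j S_j <= t) for S = n^{-1/2} * (X_1 + ... + X_n), with i.i.d. vectors of
# independent standardized-uniform entries, versus the same probability under
# the limiting N(0, I_d) law, which factorizes as Phi(t)^d.  Dimension, sample
# size, and thresholds are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, d, reps, block = 100, 200, 10000, 500

maxima = np.empty(reps)
for b in range(0, reps, block):                           # simulate in memory-friendly blocks
    X = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(block, n, d))  # mean 0, variance 1
    S = X.sum(axis=1) / np.sqrt(n)                        # normalized sums, shape (block, d)
    maxima[b:b + block] = S.max(axis=1)

for t in (2.8, 3.0, 3.2, 3.4):
    emp = (maxima <= t).mean()                            # P(S lies in the hyperrectangle (-inf, t]^d)
    gauss = stats.norm.cdf(t) ** d                        # exact value under N(0, I_d)
    print(f"t={t:.1f}   empirical={emp:.4f}   gaussian={gauss:.4f}")
```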
5. Specialized Models and Multi-Point Joint LCLTs
Permutation statistics under the Mallows measure exhibit local CLT behavior in the "height function" (Bufetov et al., 16 Sep 2024). As the permutation size grows, probability masses near the deterministic limiting profile are asymptotically Gaussian over suitably scaled windows, with explicit density scaling and local variance. Multi-point versions extend this to vectors of height function values at distinct positions, yielding joint multivariate Gaussian limits with explicit covariance.
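The lattice nature of the height function makes the one-point statement easy to probe by simulation. In the hedged sketch below, Mallows($q$) permutations are sampled through their Lehmer code (whose entries are independent truncated geometrics under $\mathbb{P}(\sigma) \propto q^{\mathrm{inv}(\sigma)}$), the height function $H(x, y) = \#\{i \le x : \sigma(i) \le y\}$ is evaluated at a single point, and its empirical probability mass function is compared with a normal density of matching mean and variance; the parameters and the sampling route are illustrative rather than the regime of (Bufetov et al., 16 Sep 2024).

```python
# One-point local CLT for the Mallows height function H(x, y) = #{ i <= x : sigma(i) <= y }.
# Mallows(q) permutations (P(sigma) proportional to q^inv(sigma)) are sampled via their
# Lehmer code, whose entries are independent truncated geometrics under this measure.
# Parameters are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, q, reps = 80, 0.95, 20000
x, y = n // 2, n // 2                          # evaluation point of the height function

# Lehmer-code entries: for (0-based) position i the support is {0, ..., n-i-1},
# with P(c_i = k) proportional to q^k.
codes = np.empty((reps, x), dtype=int)
for i in range(x):
    w = q ** np.arange(n - i)
    codes[:, i] = rng.choice(n - i, size=reps, p=w / w.sum())

H = np.empty(reps, dtype=int)
for r in range(reps):                          # decode only the first x positions
    remaining = list(range(1, n + 1))
    h = 0
    for i in range(x):
        value = remaining.pop(codes[r, i])     # sigma(i+1) = (c_i + 1)-th smallest remaining value
        h += value <= y
    H[r] = h

mu, sd = H.mean(), H.std()
for k in range(int(mu) - 2, int(mu) + 3):      # compare the pmf with the matching normal density
    print(f"k={k}   empirical P(H=k)={np.mean(H == k):.4f}   normal density={stats.norm.pdf(k, mu, sd):.4f}")
```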
In noncommutative probability (free CLT, (Williams, 2011)), uniform convergence of densities holds over compact subsets of the semicircle law, even for unbounded supports—critical when "dimension" grows through larger sums or more involved algebraic structures.
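A hedged random-matrix caricature of this statement: independently Haar-rotated copies of a fixed symmetric $\pm 1$ matrix are asymptotically free, so the spectral density of their normalized sum approaches the semicircle density, the free analogue of the Gaussian limit. Matrix size and number of summands below are illustrative, and the matrix model stands in for the operator-algebraic setting of (Williams, 2011).

```python
# Random-matrix caricature of the free CLT: independently Haar-rotated copies of a
# fixed symmetric +/-1 matrix are asymptotically free, so the spectrum of their
# normalized sum approaches the semicircle law of unit variance (support [-2, 2]).
# Matrix size and number of summands are illustrative.
import numpy as np

rng = np.random.default_rng(5)
dim, num_summands = 400, 30
D = np.diag(np.concatenate([np.ones(dim // 2), -np.ones(dim // 2)]))  # "free Bernoulli" model

S = np.zeros((dim, dim))
for _ in range(num_summands):
    G = rng.standard_normal((dim, dim))
    Q, R = np.linalg.qr(G)
    Q = Q * np.sign(np.diag(R))                 # sign fix makes Q Haar-distributed
    S += Q @ D @ Q.T                            # an independently rotated +/-1 summand
S /= np.sqrt(num_summands)                      # free CLT normalization

eigs = np.linalg.eigvalsh(S)
grid = np.linspace(-1.9, 1.9, 9)
semicircle = np.sqrt(4.0 - grid**2) / (2.0 * np.pi)   # unit-variance semicircle density
local = np.array([np.mean(np.abs(eigs - t) < 0.1) / 0.2 for t in grid])
for t, emp, sc in zip(grid, local, semicircle):
    print(f"t={t:+.2f}   empirical spectral density={emp:.3f}   semicircle={sc:.3f}")
```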
In spin models and random cubical complexes (Reddy et al., 2017), CLT and local limit statements for multi-dimensional statistics are facilitated by cluster expansions, stabilization radius estimates, and polynomial volume growth of the underlying graphs.
6. Practical Implications and Directions for Applications
Practical applications are pervasive:
- High-dimensional inference, where uniform local CLTs underpin simultaneous hypothesis testing, construction of uniform confidence intervals, and controlling family-wise error rates (Koike, 2019, Das, 2020).
- Spatial statistics, set estimation, and change-set problems in geometry and topology benefit from localized Gaussian approximations that accurately describe the distribution near boundaries or under geometric constraints (Einmahl et al., 2011, Penrose et al., 2010).
- Random matrix theory and statistical physics, where local laws with optimal error bounds govern eigenvalue rigidity and log-correlated fluctuation fields (Bourgade et al., 2021).
- Gradient and interface models in statistical mechanics, with Berry–Esseen type rates for pointwise densities (Wu, 2022, Lanconelli et al., 2015).
Methodologies such as saddlepoint approximation, martingale approximation, spectral gap control, and high-precision cluster expansions are generalized across models to handle increasing dimensionality and complex dependence, with error rates specified in terms of the dimension, the sample size, and higher-order moments.
7. Limitations and Open Directions
While the frameworks described accommodate rapid dimension growth subject to explicit scaling conditions relating dimension and sample size, challenges remain in:
- Relaxing regularity assumptions, especially for non-analytic or heavy-tailed contexts (necessitating further exploration of spectral gap conditions or improved concentration inequalities).
- Extending local CLT results to non-uniform densities or on manifolds, as high curvature and boundary effects threaten normal approximations (Herold et al., 2019).
- Optimizing error rates: further refining saddlepoint analysis, improving control over cluster expansions, and elucidating exact scaling thresholds in ultra-high-dimensional regimes.
- Multivariate local CLT for joint statistics with complex or singular dependency, extending present Gaussian kernels to more general covariance structures (Bufetov et al., 16 Sep 2024, Hervé et al., 2013).
- Applying these decompositions to random measures and non-Euclidean settings (hyperbolic, spherical, or foliation-type spaces).
A plausible implication is that the combination of SPA improvements, geometric localization, and stabilization theory will enable sharper LCLT error bounds and promote uniform density approximations in diverse complex models.
The local central limit theorem for densities in growing dimensions thus forms the theoretical bedrock for precise, quantitative Gaussian approximations of density functions in high-dimensional probability, geometric, and dependent models, supporting applications from stochastic geometry to high-dimensional statistical inference, with explicit error control and clear pathways for further refinement.