
Berry-Esseen Bounds for High-Dimensional Self-Normalized Sums

Updated 30 December 2025
  • The paper establishes explicit Berry–Esseen bounds for self-normalized sums, achieving rates as fast as O((log d)^(3/2)/√n) under finite third moment conditions.
  • The methodology employs truncation, smoothing, and Taylor expansions to manage nonlinearities introduced by data-dependent normalization in high dimensions.
  • The results clarify the trade-off between moment assumptions, sample size, and dimension growth, optimizing convergence in multivariate statistical inference.

High-dimensional self-normalized sums arise in multivariate statistical inference, especially in cases where the dimensionality of observed random vectors grows with sample size. The Berry-Esseen bound quantifies the rate of convergence in the Central Limit Theorem (CLT), measuring how closely the distribution of a properly normalized sum approximates a Gaussian law. In high dimensions, the interplay between sample size, dimension, and moment assumptions becomes critical, particularly for self-normalized statistics, where scaling by the data-dependent standard deviation introduces strong dependencies and nonlinearities. Recent work establishes explicit Berry-Esseen type bounds for these self-normalized sums and their maxima, significantly advancing the understanding of high-dimensional CLTs under relaxed moment assumptions (Das, 2020, Chang et al., 15 Jan 2025).

1. Problem Formulation and Self-normalized Sums

Given a sequence of independent, identically distributed (IID), mean-zero random vectors $X_i = (X_{i1}, \dots, X_{id})^\top \in \mathbb{R}^d$, the primary object of interest is the self-normalized sum

$$T_n = (T_{n,1},\dots,T_{n,d})^\top, \qquad T_{n,j} = \frac{\sum_{i=1}^n X_{ij}}{\sqrt{\sum_{i=1}^n X_{ij}^2}}$$

for $j=1,\ldots,d$. For coordinate-wise inference, the distribution of $\|T_n\|_\infty = \max_{1\leq j\leq d} T_{n,j}$ is studied, as well as the uniform approximation of $T_n$ over classes of hyper-rectangles in $\mathbb{R}^d$.
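As a concrete illustration, the coordinate-wise self-normalized sum can be computed directly from a data matrix. The following is a minimal NumPy sketch (sample sizes chosen for illustration only); it also reflects the Cauchy–Schwarz fact that each coordinate satisfies $|T_{n,j}| \leq \sqrt{n}$.

```python
import numpy as np

def self_normalized_sums(X):
    """Coordinate-wise self-normalized sums T_{n,j} = sum_i X_ij / sqrt(sum_i X_ij^2)."""
    return X.sum(axis=0) / np.sqrt((X ** 2).sum(axis=0))

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))   # n = 500 observations, d = 20 coordinates
T = self_normalized_sums(X)
T_max = T.max()                      # coordinate-wise maximum, ||T_n||_inf in the text
print(T.shape, float(T_max))
```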

The Berry-Esseen distance for evaluating the approximation to a multivariate normal is given by

$$\Delta_n = \sup_{A\in\mathcal{A}^{\mathrm{re}}} \left| \Pr\{T_n \in A\} - \Pr\{Z \in A\} \right|$$

where $Z \sim N(0, I_d)$ and $\mathcal{A}^{\mathrm{re}}$ denotes the class of hyper-rectangles.
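The supremum over all hyper-rectangles is not directly computable, but it can be estimated by Monte Carlo over the one-parameter sub-family $(-\infty, t]^d$. The sketch below (illustrative sample sizes and grid, Gaussian data) compares $\max_j T_{n,j}$ against the Gaussian maximum in this way.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, reps = 200, 10, 2000

# Monte Carlo draws of max_j T_{n,j} for standard normal data
X = rng.standard_normal((reps, n, d))
T = X.sum(axis=1) / np.sqrt((X ** 2).sum(axis=1))
T_max = T.max(axis=1)

# Gaussian counterpart: maximum of d independent N(0,1) coordinates
Z_max = rng.standard_normal((reps, d)).max(axis=1)

# Estimated distance over rectangles (-inf, t]^d, a sub-family of A^re
grid = np.linspace(-1.0, 4.0, 200)
emp_T = (T_max[:, None] <= grid).mean(axis=0)
emp_Z = (Z_max[:, None] <= grid).mean(axis=0)
delta_hat = float(np.abs(emp_T - emp_Z).max())
print(delta_hat)
```

Because this restricts to one family of rectangles, the estimate is a lower bound on the true $\Delta_n$ up to Monte Carlo error.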

2. Explicit High-dimensional Berry-Esseen Bounds

Recent results provide explicit Berry-Esseen bounds for the approximation of the law of TnT_n (or its coordinatewise maximum) by an appropriate Gaussian distribution (Das, 2020, Chang et al., 15 Jan 2025).

(a) Berry-Esseen Bound for Hyper-rectangles

Under the assumptions that each $X_{ij}$ is mean-zero with finite $(2+\kappa)$-th moment for some $0 < \kappa \leq 1$, and the sequence is IID across $i$, the following bound is obtained ((Das, 2020), Theorem 6):

$$\Delta_n \leq C_\kappa\, (\log d)^{(2+\kappa)/2}\, d_{n,\kappa}$$

where

$$d_{n,\kappa} = \frac{B_{n,\kappa}}{\sigma_n^{2+\kappa}\, n^{\kappa/2}}, \qquad \sigma_n^2 = \frac{1}{n}\sum_{i=1}^n \mathrm{Var}(X_{i1}), \qquad B_{n,\kappa} = \frac{1}{n}\sum_{i=1}^n \mathbb{E}|X_{i1}|^{2+\kappa}$$

When both $\sigma_n$ and $B_{n,\kappa}$ are bounded away from $0$ and $\infty$,

$$d_{n,\kappa} \asymp n^{-\kappa/2}, \quad \text{and thus} \quad \Delta_n = O\left( (\log d)^{(2+\kappa)/2}\, n^{-\kappa/2} \right)$$

For the case $\kappa=1$ (finite third moment), the bound becomes

$$\Delta_n = O\left( \frac{(\log d)^{3/2}}{\sqrt{n}} \right)$$

which matches the classical rate $n^{-1/2}$ of the univariate Berry-Esseen theorem, up to a logarithmic factor in $d$.
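The rate above is easy to evaluate numerically. The helper below (a hypothetical `be_rate`, dropping the unspecified constant $C_\kappa$) computes the order $(\log d)^{(2+\kappa)/2} n^{-\kappa/2}$ and recovers the $(\log d)^{3/2}/\sqrt{n}$ shape at $\kappa = 1$.

```python
import math

def be_rate(n, d, kappa):
    # Order of the bound, dropping the constant C_kappa and assuming
    # sigma_n and B_{n,kappa} are bounded away from 0 and infinity
    return math.log(d) ** ((2 + kappa) / 2) * n ** (-kappa / 2)

n, d = 10_000, 1_000
rate = be_rate(n, d, 1.0)   # kappa = 1: (log d)^{3/2} / sqrt(n)
print(rate)
```

Quadrupling $n$ halves the $\kappa=1$ rate, consistent with the $n^{-1/2}$ scaling.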

(b) Berry-Esseen Bound for Maxima (Coordinatewise Maximum)

A complementary approach provides explicit, nonasymptotic bounds for the Kolmogorov distance between $\|T_n\|_\infty$ and its Gaussian counterpart (Chang et al., 15 Jan 2025). Assuming finite third absolute moments,

$$\Delta_n \leq C\, \frac{\log^{5/4} (ed)}{ n^{1/8} } \left( \mathbb{E} \max_{1 \leq j \leq d} |X_{1j}/\sigma_j|^3 \right)^{1/4}$$

where the infimum defining the distance is taken over all mean-zero $d$-variate Gaussian laws whose covariance matrices are correlation matrices. The bound vanishes as $n \to \infty$ provided

$$\log d = o(n^{1/10})$$

A moment-matching version $\Delta_n^X$ controls the error for Gaussian approximations with the actual covariance of $X_1$.
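The shape of this bound can be evaluated on data by replacing the expectation with an empirical average. In the sketch below, the function name, the constant $C = 1$, and the plug-in moment estimate are all illustrative choices, not taken from the paper.

```python
import numpy as np

def chang_type_rate(X, C=1.0):
    # Shape of the bound C * log^{5/4}(e d) / n^{1/8} * (E max_j |X_{1j}/sigma_j|^3)^{1/4},
    # with the expectation replaced by its empirical average; C is an unknown constant.
    n, d = X.shape
    sigma = X.std(axis=0)
    m3 = np.mean(np.abs(X / sigma).max(axis=1) ** 3)
    return C * np.log(np.e * d) ** 1.25 * m3 ** 0.25 / n ** 0.125

rng = np.random.default_rng(2)
X = rng.standard_normal((5000, 50))
print(float(chang_type_rate(X)))
```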

3. Moment Assumptions and Dimension Growth

The fundamental trade-off in high-dimensional CLTs with self-normalized sums is between the required finite moment, the dimension $d$, and the sample size $n$. For the error bound to vanish, the growth of the dimension is controlled by:

$$\log d = o\left( n^{\kappa/(2+\kappa)} \right) \quad \text{with finite } (2+\kappa)\text{-th moments (Das, 2020)}$$

$$\log d = o(n^{1/10}) \quad \text{with finite third moment (Chang et al., 15 Jan 2025)}$$

For $\kappa=1$ (finite third moment), the regime $\log d = o(n^{1/3})$ suffices for vanishing error in the Berry-Esseen sense for uniform approximation over rectangles.

This is in contrast to non-self-normalized sums, for which uniform CLT results typically permit faster growth of $d$ relative to $n$.
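To make the regimes concrete: both conditions allow $d$ to grow exponentially in a power of $n$, but the admissible power differs sharply. A small sketch (with an illustrative constant $c = 1$, which the papers do not specify) compares the two.

```python
import math

def max_dimension(n, alpha, c=1.0):
    # Largest d with log d = c * n^alpha, i.e. d = exp(c * n^alpha);
    # c is an illustrative constant, not from the papers.
    return math.exp(c * n ** alpha)

n = 10_000
d_moment = max_dimension(n, 1 / 3)    # kappa = 1 regime: log d = o(n^{1/3})
d_maxima = max_dimension(n, 1 / 10)   # maxima result:    log d = o(n^{1/10})
print(d_moment, d_maxima)
```

At $n = 10{,}000$, the first regime tolerates dimensions in the billions while the second allows only modest $d$, illustrating the price of the uniform maximum.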

4. Core Proof Strategies

The derivation of Berry-Esseen bounds for self-normalized sums in high dimensions fundamentally departs from traditional approaches for sums of independent vectors.

Key steps include:

  • Componentwise reduction: Use independence of $X_{i1},\dots,X_{id}$ (or factorization over rectangles) to reduce the multivariate problem to sums of one-dimensional bounds.
  • Refined Berry–Esseen for self-normalized sums: Deploy one-dimensional results of Jing–Shao–Wang (2003), Bentkus–Götze (1996), and Shao (2005) to control the error for self-normalized quantities.
  • Truncation and smoothing: Truncate coordinates to manage heavy tails and introduce a smooth surrogate for $x \mapsto x/\sqrt{\sum_j x_j^2}$, enabling Taylor expansion and smoothing arguments.
  • Gaussian anti-concentration: The $\log d$ factors arise from multivariate Gaussian anti-concentration and the complexity of the $\ell_\infty$-norm.
  • Balancing approximation and smoothing bias: Choose smoothing and truncation parameters to optimize the interplay between stochastic remainders and deterministic bias, establishing the explicit rates in $n$ and $\log d$.
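One standard smoothing device of this kind, widely used in the high-dimensional CLT literature (the papers' exact surrogate may differ), is the soft-max $F_\beta(x) = \beta^{-1}\log\sum_j e^{\beta x_j}$, which approximates $\max_j x_j$ within $\log(d)/\beta$ and is smooth enough for Taylor expansion. A minimal numerical check:

```python
import numpy as np

def smooth_max(x, beta):
    # Soft-max surrogate F_beta(x) = (1/beta) * log sum_j exp(beta * x_j);
    # satisfies 0 <= F_beta(x) - max(x) <= log(d) / beta.
    x = np.asarray(x, dtype=float)
    m = x.max()
    return m + np.log(np.exp(beta * (x - m)).sum()) / beta  # stable log-sum-exp

rng = np.random.default_rng(3)
x = rng.standard_normal(100)
for beta in (1.0, 10.0, 100.0):
    print(beta, smooth_max(x, beta) - x.max())  # gap shrinks as beta grows
```

Choosing $\beta$ large shrinks the surrogate bias but inflates the derivatives entering the Taylor remainder, which is exactly the balancing act described in the last bullet.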

A summary of the main proof ingredients and their quantitative contributions is provided in the following table:

| Step | Contribution to Bound | Source |
| --- | --- | --- |
| Truncation/linearization | Controls the probability of large values | (Chang et al., 15 Jan 2025) |
| Smoothing/Taylor expansion | Contributes the $\log^{5/4}(d)$ factor | (Chang et al., 15 Jan 2025) |
| One-dimensional BE bound | Determines the $n^{-1/8}$ exponent | (Chang et al., 15 Jan 2025) |
| Anti-concentration | Further $\log d$ growth in constants | (Das, 2020) |

5. Comparison with Non-self-normalized High-dimensional CLTs

For sums of independent vectors (without normalization), Berry-Esseen bounds of order $\mathrm{polylog}(d)\, n^{-1/2}$ are attainable (Chernozhukov–Chetverikov–Kato, Kuchibhotla–Chakrabortty). Self-normalized statistics, however, exhibit fundamentally greater complexity: the normalization introduces strong dependence and nonlinearity, precluding direct application of previous high-dimensional CLTs (Chang et al., 15 Jan 2025).

Earlier high-dimensional Berry-Esseen rates for self-normalized sums held only under exponential-moment or independence-across-$(i,j)$ assumptions. The new results (Das, 2020, Chang et al., 15 Jan 2025) relax these requirements to polynomial moments and accommodate arbitrary covariance structures (for maxima), providing the first explicit bounds in these regimes.

The bounds are also shown to be optimal in the sense that for $\kappa=1$, one cannot do better than $(\log d)^{3/2}/\sqrt{n}$ in general ((Das, 2020), Proposition 4.1).

6. Refined Bounds, Applications, and Future Directions

Stronger moment assumptions (e.g., finite fourth moment) or refined Lindeberg interpolations may reduce the $\log d$ exponents and improve the $n^{-1/8}$ rate to $n^{-1/6}$, though at the expense of analytical and technical complexity (Chang et al., 15 Jan 2025).

The truncation-based approach for moment-matching bounds controls errors even when coordinate variances diverge, offering robustness to heavy-tailed data distributions. The coordinatewise formulation directly informs statistical inference via Student's $t$-statistic and the construction of simultaneous confidence intervals.
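As a sketch of this application (a simplified procedure, not the papers' method): calibrate simultaneous intervals for the coordinate means by the quantile of a Gaussian maximum. The helper below uses an independent $N(0, I_d)$ reference; a refined version would match the correlation structure of the data.

```python
import numpy as np

def simultaneous_cis(X, level=0.95, reps=20_000, seed=0):
    # Simultaneous intervals for coordinate means, calibrated by the quantile of
    # the maximum of |N(0, I_d)|; matching correlations would tighten this.
    n, d = X.shape
    rng = np.random.default_rng(seed)
    crit = np.quantile(np.abs(rng.standard_normal((reps, d))).max(axis=1), level)
    center = X.mean(axis=0)
    se = X.std(axis=0, ddof=1) / np.sqrt(n)
    return center - crit * se, center + crit * se

rng = np.random.default_rng(4)
X = rng.standard_normal((400, 30))          # true coordinate means are all 0
lo, hi = simultaneous_cis(X)
print(bool(np.all((lo <= 0) & (0 <= hi))))  # joint coverage of the truth
```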

Extensions to dependent observations (e.g., mixing processes) remain an open problem.

7. Summary Table of Main Results

| Reference | Assumptions | Bound | Dimension Growth Regime |
| --- | --- | --- | --- |
| (Das, 2020) | $X_{ij}$ IID, $\mathbb{E}\lvert X_{ij}\rvert^{2+\kappa}<\infty$ | $\Delta_n \leq C_\kappa (\log d)^{(2+\kappa)/2} d_{n,\kappa}$ | $\log d = o(n^{\kappa/(2+\kappa)})$ |
| (Chang et al., 15 Jan 2025) | $X_i$ IID, $\mathbb{E} \max_j \lvert X_{1j}/\sigma_j\rvert^3 <\infty$ | $\Delta_n \leq C\, \log^{5/4}(d)/n^{1/8}$ | $\log d = o(n^{1/10})$ |

These results bridge the gap between classical Berry–Esseen theory and modern high-dimensional inference for self-normalized sums, providing explicit error rates and clarifying the interplay between moment control, dimensionality, and normalization.
