Berry-Esseen Bounds for High-D Self-Normalized Sums
- The paper establishes explicit Berry–Esseen bounds for self-normalized sums, achieving rates as fast as O((log d)^(3/2)/√n) under finite third moment conditions.
- The methodology employs truncation, smoothing, and Taylor expansions to manage nonlinearities introduced by data-dependent normalization in high dimensions.
- The results clarify the trade-off between moment assumptions, sample size, and dimension growth that governs convergence rates in multivariate statistical inference.
High-dimensional self-normalized sums arise in multivariate statistical inference, especially in cases where the dimensionality of observed random vectors grows with sample size. The Berry-Esseen bound quantifies the rate of convergence in the Central Limit Theorem (CLT), measuring how closely the distribution of a properly normalized sum approximates a Gaussian law. In high dimensions, the interplay between sample size, dimension, and moment assumptions becomes critical, particularly for self-normalized statistics, where scaling by the data-dependent standard deviation introduces strong dependencies and nonlinearities. Recent work establishes explicit Berry-Esseen type bounds for these self-normalized sums and their maxima, significantly advancing the understanding of high-dimensional CLTs under relaxed moment assumptions (Das, 2020, Chang et al., 15 Jan 2025).
1. Problem Formulation and Self-normalized Sums
Given a sequence of independent, identically distributed (IID), mean-zero random vectors $X_1, \ldots, X_n \in \mathbb{R}^d$, the primary object of interest is the coordinatewise self-normalized sum
$$T_j \;=\; \frac{\sum_{i=1}^n X_{ij}}{\sqrt{\sum_{i=1}^n X_{ij}^2}}, \qquad j = 1, \ldots, d.$$
For coordinate-wise inference, the distribution of the maximum $\max_{1 \le j \le d} T_j$ is studied, as well as the uniform approximation of the vector $T = (T_1, \ldots, T_d)$ over classes of hyper-rectangles in $\mathbb{R}^d$.
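The coordinatewise self-normalized sum above can be sketched numerically. The following pure-Python illustration (function name and simulated Gaussian data are illustrative, not from the cited papers) also checks the defining property that self-normalization is invariant to rescaling the data:

```python
import math
import random

def self_normalized_sums(X):
    """Coordinatewise self-normalized sums T_j = S_j / V_j, where
    S_j = sum_i X[i][j] and V_j = sqrt(sum_i X[i][j]^2)."""
    n, d = len(X), len(X[0])
    T = []
    for j in range(d):
        s = sum(X[i][j] for i in range(n))
        v = math.sqrt(sum(X[i][j] ** 2 for i in range(n)))
        T.append(s / v)
    return T

# Simulate n IID mean-zero vectors in dimension d (illustrative choice).
rng = random.Random(0)
n, d = 500, 8
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
T = self_normalized_sums(X)

# Self-normalization is scale-invariant: rescaling X leaves T unchanged.
X_scaled = [[7.3 * x for x in row] for row in X]
T_scaled = self_normalized_sums(X_scaled)
```

The scale invariance is what makes these statistics usable without knowing the coordinate variances, at the cost of the data-dependent denominator that drives the technical difficulties discussed below.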
The Berry–Esseen distance for evaluating the approximation to a multivariate normal is given by
$$\rho_n \;=\; \sup_{A \in \mathcal{A}} \bigl| \mathbb{P}(T \in A) - \mathbb{P}(Z \in A) \bigr|,$$
where $Z$ is the approximating mean-zero Gaussian vector and $\mathcal{A}$ denotes the class of hyper-rectangles in $\mathbb{R}^d$.
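In the one-dimensional case the supremum over rectangles reduces to the Kolmogorov distance to the standard normal, which can be estimated empirically. A minimal sketch (all names and simulation parameters are illustrative assumptions):

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_distance(sample):
    """Empirical sup_x |F_n(x) - Phi(x)|, evaluated at the sample points
    (the sup of the difference is attained at jump points of F_n)."""
    xs = sorted(sample)
    n = len(xs)
    dist = 0.0
    for i, x in enumerate(xs):
        phi = normal_cdf(x)
        # Compare Phi against the empirical CDF just before and at x.
        dist = max(dist, abs((i + 1) / n - phi), abs(i / n - phi))
    return dist

rng = random.Random(1)

def T_coord(m):
    """One coordinate of a self-normalized sum of m IID N(0,1) draws."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(m)]
    return sum(xs) / math.sqrt(sum(x * x for x in xs))

# Monte Carlo replications of the statistic; the distance should be small.
sample = [T_coord(400) for _ in range(300)]
d_n = kolmogorov_distance(sample)
```

For hyper-rectangles in $d > 1$ dimensions the supremum ranges over a much richer class, which is exactly where the $(\log d)$ factors in the bounds below originate.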
2. Explicit High-dimensional Berry-Esseen Bounds
Recent results provide explicit Berry-Esseen bounds for the approximation of the law of (or its coordinatewise maximum) by an appropriate Gaussian distribution (Das, 2020, Chang et al., 15 Jan 2025).
(a) Berry-Esseen Bound for Hyper-rectangles
Under the assumptions that each coordinate $X_{ij}$ is mean-zero with finite $(2+\kappa)$-th moment for some $\kappa \in (0, 1]$, and the sequence is IID across $i$, the following bound is obtained ((Das, 2020), Theorem 6):
$$\rho_n \;\le\; C\, L_{n,\kappa}\, \frac{(\log d)^{(2+\kappa)/(2\kappa)}}{\sqrt{n}},$$
where $C$ is an absolute constant and $L_{n,\kappa}$ collects the standardized $(2+\kappa)$-th absolute moments of the coordinates. When both the coordinate variances and $L_{n,\kappa}$ are bounded away from $0$ and $\infty$,
$$\rho_n \;=\; O\!\left( \frac{(\log d)^{(2+\kappa)/(2\kappa)}}{\sqrt{n}} \right).$$
For the case $\kappa = 1$ (finite third moment), the bound becomes
$$\rho_n \;=\; O\!\left( \frac{(\log d)^{3/2}}{\sqrt{n}} \right),$$
which matches the classical $n^{-1/2}$ rate of the univariate Berry–Esseen theorem, up to a logarithmic factor in $d$.
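The finite-third-moment rate $(\log d)^{3/2}/\sqrt{n}$ is easy to probe numerically: it decays in $n$ at the classical square-root speed while growing only polylogarithmically in $d$. A small arithmetic sketch (the helper name is ours):

```python
import math

def be_rate(n, d):
    """The (log d)^{3/2} / sqrt(n) rate under finite third moments."""
    return math.log(d) ** 1.5 / math.sqrt(n)

# Growing n by 100x shrinks the rate tenfold ...
r_small_n = be_rate(10_000, 1_000)
r_large_n = be_rate(1_000_000, 1_000)

# ... while growing d by 1000x inflates it by only (log(1e6)/log(1e3))^{3/2}.
r_large_d = be_rate(10_000, 1_000_000)
```

This polylogarithmic dependence on $d$ is what permits the dimension to grow sub-exponentially in $n$, as quantified in Section 3.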
(b) Berry-Esseen Bound for Maxima (Coordinatewise Maximum)
A complementary approach provides explicit, nonasymptotic bounds for the Kolmogorov distance between $\max_{1 \le j \le d} T_j$ and its Gaussian counterpart (Chang et al., 15 Jan 2025). Assuming finite third absolute moments,
$$\inf_{G} \sup_{x \in \mathbb{R}} \Bigl| \mathbb{P}\Bigl( \max_{1 \le j \le d} T_j \le x \Bigr) - \mathbb{P}\Bigl( \max_{1 \le j \le d} G_j \le x \Bigr) \Bigr| \;=\; O\!\left( \frac{(\log d)^{3/2}}{\sqrt{n}} \right),$$
where the infimum is taken over all mean-zero $d$-variate Gaussians $G$ whose covariance is a correlation matrix. The bound vanishes as $n \to \infty$ provided $\log d = o(n^{1/3})$.
A moment-matching version controls the error for Gaussian approximations with the actual correlation structure of the data.
3. Moment Assumptions and Dimension Growth
The fundamental trade-off in high-dimensional CLTs with self-normalized sums is between the required finite moment, the dimension $d$, and the sample size $n$. For the error bound to vanish, the growth of dimension is controlled by
$$\log d \;=\; o\!\left( n^{\kappa/(2+\kappa)} \right)$$
with finite $(2+\kappa)$-th moments (Das, 2020, arXiv:2012.03758).
For $\kappa = 1$ (finite third moment), the regime $\log d = o(n^{1/3})$ is sufficient for vanishing error in the Berry–Esseen sense for uniform approximation over rectangles.
This is in contrast to non-self-normalized sums, where uniform CLT results typically require $n$ to grow only polylogarithmically in $d$.
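The sharpness of the regime $\log d = o(n^{\kappa/(2+\kappa)})$ can be checked by arithmetic. For $\kappa = 1$ the threshold is $n^{1/3}$: taking $d = \exp(n^{\alpha})$, the bound $(\log d)^{3/2}/\sqrt{n} = n^{3\alpha/2 - 1/2}$ vanishes exactly when $\alpha < 1/3$. A sketch (the parameterization by $\alpha$ is ours):

```python
import math

def bound(n, alpha):
    """(log d)^{3/2}/sqrt(n) with d = exp(n^alpha): equals n^{3*alpha/2 - 1/2},
    so it vanishes iff alpha < 1/3."""
    log_d = n ** alpha
    return log_d ** 1.5 / math.sqrt(n)

# alpha = 0.3 < 1/3: the bound decays like n^{-0.05}.
small = bound(10 ** 12, 0.3)
# alpha = 0.4 > 1/3: the bound diverges like n^{+0.10}.
large = bound(10 ** 12, 0.4)
```

Even just inside the admissible regime the decay is very slow, which is why the explicit constants and logarithmic exponents in the theorems matter for finite-sample use.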
4. Core Proof Strategies
The derivation of Berry-Esseen bounds for self-normalized sums in high dimensions fundamentally departs from traditional approaches for sums of independent vectors.
Key steps include:
- Componentwise reduction: Use independence across coordinates of $X_i$ (or factorization over rectangles) to reduce the multivariate problem to sums of one-dimensional bounds.
- Refined Berry–Esseen for self-normalized sums: Deploy one-dimensional results of Jing–Shao–Wang (2003), Bentkus–Götze (1996), and Shao (2005) to control the error for self-normalized quantities.
- Truncation and smoothing: Truncate coordinates to manage heavy tails and introduce a smooth surrogate for the maximum, enabling Taylor expansion and smoothing arguments.
- Gaussian anti-concentration: The $(\log d)$ factors arise from multivariate Gaussian anti-concentration and the complexity of the $\ell^\infty$-norm.
- Balancing approximation and smoothing bias: Choose smoothing and truncation parameters to optimize the interplay between stochastic remainders and deterministic bias, establishing the explicit rates in $n$ and $d$.
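The smoothing step can be illustrated with the standard log-sum-exp surrogate for the coordinatewise maximum, a generic instance of the technique (the papers' exact smoothing kernels may differ). Its smoothing bias is deterministically bounded by $\log(d)/\beta$, which is precisely the kind of bias that is traded off against stochastic remainders when choosing $\beta$:

```python
import math

def smooth_max(t, beta):
    """Log-sum-exp surrogate for max_j t_j: smooth in t and satisfying
    max(t) <= smooth_max(t, beta) <= max(t) + log(d)/beta."""
    m = max(t)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(beta * (x - m)) for x in t)) / beta

t = [0.3, -1.2, 2.5, 0.0, 1.9]   # illustrative coordinates
beta = 50.0
approx = smooth_max(t, beta)
gap = approx - max(t)             # smoothing bias, in [0, log(d)/beta]
```

Larger $\beta$ shrinks the bias but makes the surrogate's derivatives blow up, worsening the Taylor-expansion remainders; the explicit rates come from balancing these two effects.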
A summary of the main proof ingredients and their quantitative contributions is provided in the following table:
| Step | Contribution to Bound | Source |
|---|---|---|
| Truncation/linearization | Controls the contribution of large coordinate values | (Chang et al., 15 Jan 2025) |
| Smoothing/Taylor expansion | Contributes the polylogarithmic factor in $d$ | (Chang et al., 15 Jan 2025) |
| One-dimensional BE bound | Determines the $n^{-1/2}$ dependence on $n$ | (Chang et al., 15 Jan 2025) |
| Anti-concentration | Further $(\log d)$ growth in constants | (Das, 2020) |
5. Comparison to Prior and Related Results
For sums of independent vectors (without normalization), Berry–Esseen bounds of order $n^{-1/2}$, up to polylogarithmic factors in $d$, are attainable (Chernozhukov–Chetverikov–Kato, Kuchibhotla–Chakrabortty). Self-normalized statistics, however, exhibit fundamentally greater complexity: the normalization introduces strong dependence and nonlinearity, precluding direct application of previous high-dimensional CLTs (Chang et al., 15 Jan 2025).
Earlier high-dimensional Berry–Esseen rates for self-normalized sums held only under exponential-moment or independence-across-coordinates assumptions. The new results (Das, 2020, Chang et al., 15 Jan 2025) relax these requirements to polynomial moments and accommodate arbitrary covariance structures (for maxima), providing the first explicit bounds in these regimes.
The bounds are also shown to be optimal in the sense that, under finite third moments, the $n^{-1/2}$ rate cannot be improved in general ((Das, 2020), Proposition 4.1).
6. Refined Bounds, Applications, and Future Directions
Stronger moment assumptions (e.g., a finite fourth moment) or refined Lindeberg interpolations may reduce the logarithmic exponents and sharpen the overall rate, though at the expense of analytical and technical complexity (Chang et al., 15 Jan 2025).
The truncation-based approach for moment-matching bounds controls errors even when coordinate variances diverge, offering robustness to heavy-tailed data distributions. The coordinatewise formulation directly informs statistical inference via Student's $t$-statistic and the construction of simultaneous confidence intervals.
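As a concrete inference sketch, coordinatewise Student-type statistics can be turned into simultaneous confidence intervals. The version below uses a conservative Bonferroni cutoff with a Gaussian critical value, a simple stand-in for the sharper max-statistic calibration that the high-dimensional CLTs justify (all names and parameters are ours, not the papers' construction):

```python
import math
import random
from statistics import NormalDist

def simultaneous_cis(X, alpha=0.05):
    """Bonferroni-style simultaneous confidence intervals for the d
    coordinate means: each interval is mean_j +/- z * sd_j / sqrt(n),
    with a Gaussian critical value at level alpha / (2d)."""
    n, d = len(X), len(X[0])
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * d))  # Bonferroni cutoff
    cis = []
    for j in range(d):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        half = z * sd / math.sqrt(n)
        cis.append((mean - half, mean + half))
    return cis

rng = random.Random(2)
n, d = 200, 10
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
cis = simultaneous_cis(X)
```

Calibrating against the distribution of the maximum statistic, as the Berry–Esseen bounds above license, yields shorter intervals than Bonferroni when the coordinates are correlated.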
Extensions to dependent observations (e.g., mixing processes) remain an open problem.
7. Summary Table of Main Results
| Reference | Assumptions | Bound | Dimension Growth Regime |
|---|---|---|---|
| (Das, 2020) | IID, finite $(2+\kappa)$-th moments, $\kappa \in (0,1]$ | $O\!\big((\log d)^{(2+\kappa)/(2\kappa)}/\sqrt{n}\big)$ | $\log d = o\!\big(n^{\kappa/(2+\kappa)}\big)$ |
| (Chang et al., 15 Jan 2025) | IID, finite third absolute moments | $O\!\big((\log d)^{3/2}/\sqrt{n}\big)$ | $\log d = o\!\big(n^{1/3}\big)$ |
These results bridge the gap between classical Berry–Esseen theory and modern high-dimensional inference for self-normalized sums, providing explicit error rates and clarifying the interplay between moment control, dimensionality, and normalization.