Support characterization of the limiting spectral distribution for covariance matrices with arbitrary variance profiles

Determine the constant τ and characterize the support (including the lower and upper edges) of the limiting spectral distribution of the sample covariance matrix Σ_n = (1/n) X_n^T X_n in the high-dimensional limit where p and n grow proportionally, for data matrices X_n = Υ_n ◦ Z_n with independent centered unit-variance entries Z_ij and an arbitrary variance profile Γ_n = (γ_ij^2). Provide explicit bounds sufficient to evaluate T_p(0^−) and thereby enable analytic determination of the predictive risk of the minimum-norm least-squares estimator beyond the quasi doubly stochastic case.

Background

The paper analyzes ridge regression in high dimensions when the predictors are independent but non-identically distributed, modeled via a variance profile X_n = Υ_n ◦ Z_n. Deterministic equivalents for the degrees of freedom and predictive risk are derived, and the minimum-norm least-squares case (λ → 0) depends on limits such as T_p(0−), which are tied to the spectral properties of Σ_n.
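The link between ridge regression and the minimum-norm least-squares estimator can be checked numerically. The following is a minimal sketch, not the paper's code: the variance profile, the dimensions, the noise level, and the helper name `ridge` are all hypothetical choices made for illustration. It only shows that the ridge solution approaches the pseudoinverse (minimum-norm) solution as λ → 0 when p > n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 100  # p > n, so least squares has infinitely many solutions

# Hypothetical variance profile: standard deviations gamma_ij drawn in [0.5, 1.5].
upsilon = rng.uniform(0.5, 1.5, size=(n, p))
x = upsilon * rng.standard_normal((n, p))  # X_n = Upsilon_n (Hadamard) Z_n

# Hypothetical linear model used only to generate responses.
theta_true = rng.standard_normal(p) / np.sqrt(p)
y = x @ theta_true + 0.1 * rng.standard_normal(n)


def ridge(x, y, lam):
    """Ridge estimator (X^T X / n + lam * I)^{-1} X^T y / n."""
    n_rows, n_cols = x.shape
    gram = x.T @ x / n_rows
    return np.linalg.solve(gram + lam * np.eye(n_cols), x.T @ y / n_rows)


# Minimum-norm least-squares solution via the Moore-Penrose pseudoinverse.
theta_min_norm = np.linalg.pinv(x) @ y

# Relative distance between ridge and minimum-norm solutions for shrinking lam.
rel_gaps = {
    lam: np.linalg.norm(ridge(x, y, lam) - theta_min_norm)
    / np.linalg.norm(theta_min_norm)
    for lam in (1e-1, 1e-3, 1e-6)
}
for lam, gap in rel_gaps.items():
    print(f"lambda={lam:g}  relative gap={gap:.2e}")
```

The gap shrinks with λ at a rate governed by the smallest nonzero eigenvalue of Σ_n, which is one way to see why the lower edge of the spectrum matters in the λ → 0 limit.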

For quasi doubly stochastic variance profiles, the limiting spectral distribution reduces to the Marchenko–Pastur law, so its support and edge behavior are explicit. For arbitrary variance profiles, by contrast, the authors note that analytic control requires understanding the constant τ and the support of the limiting spectral distribution of Σ_n, a problem that they state remains open. Resolving it would allow deriving analytic bounds for T_p(0−) (and related quantities) and thus fully characterizing the predictive risk curves without relying solely on numerical methods.
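A small simulation illustrates the contrast between the two cases. This is a sketch under stated assumptions, not a method from the paper: the constant profile stands in for the quasi doubly stochastic case, the smooth profile `0.5 + (i/n)(j/p)` is a hypothetical example of a general profile, the entries Z_ij are taken Gaussian, and the function names are illustrative. For the constant profile the extreme eigenvalues can be compared with the Marchenko–Pastur edges (1 ± sqrt(p/n))^2; for the general profile no closed-form edges are assumed, and the code only reports the empirical ones.

```python
import numpy as np


def empirical_spectrum(upsilon, rng):
    """Eigenvalues of Sigma_n = X^T X / n with X = upsilon * Z (entrywise)."""
    n_rows, _ = upsilon.shape
    z = rng.standard_normal(upsilon.shape)  # centered, unit-variance entries
    x = upsilon * z
    return np.linalg.eigvalsh(x.T @ x / n_rows)


def marchenko_pastur_edges(c):
    """Support edges of the Marchenko-Pastur law with unit variance, ratio c = p/n."""
    return (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2


rng = np.random.default_rng(0)
n, p = 400, 200

# Constant profile: every gamma_ij equals 1.
eig_flat = empirical_spectrum(np.ones((n, p)), rng)
lo, hi = marchenko_pastur_edges(p / n)
print(f"constant profile: empirical [{eig_flat.min():.3f}, {eig_flat.max():.3f}], "
      f"MP edges [{lo:.3f}, {hi:.3f}]")

# Hypothetical smooth profile whose row and column averages are not constant.
upsilon = 0.5 + np.outer(np.arange(n) / n, np.arange(p) / p)
eig_prof = empirical_spectrum(upsilon, rng)
print(f"smooth profile: empirical [{eig_prof.min():.3f}, {eig_prof.max():.3f}]")
```

Such a simulation gives only finite-sample estimates of the edges; it does not replace the analytic characterization of the support that the problem asks for.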

References

Hence, upper and lower bounding T_p(-0) is related to understanding the value of the constant τ and the size of the support of the limiting spectral distribution of the covariance matrix Σ_n which remains (to the best of our knowledge) an open problem for random matrices with an arbitrary variance profile.

High-dimensional analysis of ridge regression for non-identically distributed data with a variance profile (2403.20200 - Bigot et al., 29 Mar 2024) in Section 3.2 (Predictive risk), paragraph following Corollary CorStat