High-Dimensional Data Analysis for Elliptically Symmetric Distributions

Published 15 Apr 2026 in stat.ME | (2604.13944v1)

Abstract: High-dimensional data arise routinely in modern statistics, econometrics, finance, genomics, and machine learning. While a large body of existing methodology is developed under Gaussian or light-tailed assumptions, many real data sets exhibit heavy tails, heterogeneity, and departures from classical covariance-based models. This book provides a systematic treatment of high-dimensional data analysis under elliptically symmetric distributions, with an emphasis on robust inference based on spatial signs, spatial ranks, multivariate Kendall's tau matrices, and related shape-based methods. The book covers the basic theory of elliptical symmetry, high-dimensional location inference, estimation and testing for covariance and precision matrices, sphericity and proportionality testing, high-dimensional alpha testing in factor pricing models, change-point analysis, white-noise and independence testing, high-dimensional discriminant analysis, and dimension reduction through principal component analysis and factor models. Throughout, we review classical low-dimensional and high-dimensional benchmark methods and then develop robust alternatives tailored to elliptical models. Particular attention is paid to the interplay between sum-type, max-type, and adaptive procedures, as well as to the role of scatter, shape, and rank-based dependence measures in heavy-tailed settings. This book is intended as a unified overview of robust high-dimensional methods under elliptical symmetry and as a synthesis of the author's recent research contributions in this area. It is written for researchers and graduate students in statistics, econometrics, and related fields who are interested in modern high-dimensional inference beyond the Gaussian paradigm.


Authors (1)

Long Feng

Summary

The paper develops robust high-dimensional statistical methods for elliptically symmetric distributions that overcome limitations of covariance‐based inference.
The paper details spatial sign and rank-based estimators, including Tyler’s M‐estimator and weighted methods, to achieve semiparametrically optimal testing.
The paper demonstrates practical robustness in location testing, covariance estimation, and factor models under heavy-tailed noise and complex high-dimensional regimes.

High-Dimensional Data Analysis for Elliptically Symmetric Distributions: An Expert Synthesis

Overview and Motivation

This monograph offers a comprehensive, technically rigorous account of the theory and practice of high-dimensional statistics under the elliptical model. Elliptically symmetric distributions provide a flexible, mathematically tractable generalization of the multivariate normal, decoupling the geometric structure from tail assumptions and allowing principled inference in the presence of heavy tails, outliers, and latent heterogeneity. The text systematically develops the tools—spatial signs, ranks, robust shape functionals, and associated test statistics—necessary for contemporary applications where $p \gg n$ and distributional regularity conditions are weak or uncheckable.

The author foregrounds a geometric paradigm shift: inference is not based solely on covariance, which may not exist or be stable, but rather on a triad of location, scatter (or shape), and their directional structure. This orientation enables robust, efficient procedures for core multivariate problems: location testing, covariance/shape estimation, two-sample comparisons, classification, principal components, and factor models.

Foundations: Elliptical Distributions, Geometry, and Robust Functionals

The first chapter systematically formulates the elliptical family, introducing a unified framework that distinguishes between location, scatter, covariance, and normalized shape. The model $X = \mu + \xi A u$ (with $u$ uniform on the unit sphere and $\xi \geq 0$ independent of $u$) encapsulates Gaussian, multivariate $t$, and normal mixture distributions. The principal object is the shape matrix $V = p S / \operatorname{tr}(S)$, where $S$ denotes the scatter matrix; $V$ encodes dispersion directions independently of scale.
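The stochastic representation above can be simulated directly. The following is a minimal sketch, assuming NumPy; the function names (`sample_elliptical`, `gaussian_radius`, `t_radius`, `shape_matrix`) and the choice of $A$ are illustrative and do not come from the monograph. It takes the scatter matrix to be $A A^\top$, which is the usual convention but is an assumption here.

```python
import numpy as np

def sample_elliptical(n, mu, A, radial, rng):
    """Draw n samples X = mu + xi * A u, with u uniform on the unit sphere."""
    p = A.shape[1]
    g = rng.standard_normal((n, p))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform directions
    xi = radial(n, p, rng)                            # nonnegative radii, independent of u
    return mu + (xi[:, None] * u) @ A.T

def gaussian_radius(n, p, rng):
    # xi^2 ~ chi-square(p) recovers the multivariate normal
    return np.sqrt(rng.chisquare(p, size=n))

def t_radius(n, p, rng, df=3):
    # xi^2 / p ~ F(p, df) recovers the multivariate t with df degrees of freedom
    return np.sqrt(rng.chisquare(p, size=n) / (rng.chisquare(df, size=n) / df))

def shape_matrix(S):
    """Shape V = p S / tr(S), normalized so that tr(V) = p."""
    return S.shape[0] * S / np.trace(S)

rng = np.random.default_rng(0)
p = 5
A = np.diag(np.arange(1.0, p + 1))   # illustrative mixing matrix
mu = np.zeros(p)
X = sample_elliptical(2000, mu, A, t_radius, rng)
V = shape_matrix(A @ A.T)            # scatter taken as A A^T (assumption)
```

Swapping `t_radius` for `gaussian_radius` changes only the radial law; the directions, and hence the shape matrix, stay the same, which is the decoupling the chapter emphasizes.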

Key robust building blocks include:

Spatial signs ($U(x) = x/\|x\|$): Directional normalization, entirely insensitive to radial magnitude. Spatial sign covariance matrices (SSCM) retain the eigen-directions of the scatter matrix while suppressing tail-related variability.
Spatial/rank functionals: Multivariate ranks leverage pairwise differences, fully translation-invariant yet robust to scale and tail issues.
Tyler's M-estimator: An affine-equivariant, scale-free estimator targeting the shape matrix irrespective of the existence of second moments, interpretable as an MLE under the Angular Central Gaussian (ACG) model. Self-consistent estimation equates the shape of spatial signs post-whitening to the identity.
Hettmansperger–Randles (HR) system: Joint estimation of location and shape via spatial-median and Tyler-like equations, achieving robust affine equivariance.
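Two of these building blocks are short enough to sketch in code. The example below assumes NumPy, treats the center as known, and uses the standard fixed-point iteration for Tyler's estimator with the trace normalized to $p$; the function names and the simulated multivariate Cauchy data are illustrative, not taken from the monograph. It also assumes $n > p$ and that no observation coincides with the center.

```python
import numpy as np

def spatial_sign(X):
    """Row-wise spatial sign U(x) = x / ||x||, with U(0) = 0."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)

def sscm(X, center):
    """Spatial sign covariance matrix about a given center."""
    U = spatial_sign(X - center)
    return U.T @ U / U.shape[0]

def tyler_shape(X, center, tol=1e-8, max_iter=500):
    """Fixed-point iteration for Tyler's M-estimator, scaled so tr(V) = p."""
    n, p = X.shape
    Z = X - center
    V = np.eye(p)
    for _ in range(max_iter):
        # d_i = z_i' V^{-1} z_i (assumed strictly positive)
        d = np.einsum("ij,ij->i", Z @ np.linalg.inv(V), Z)
        V_new = (p / n) * (Z / d[:, None]).T @ Z
        V_new *= p / np.trace(V_new)
        if np.linalg.norm(V_new - V) < tol:
            return V_new
        V = V_new
    return V

# Illustrative data: multivariate Cauchy (no finite moments), scatter diag(1, 2, 3)
rng = np.random.default_rng(1)
n, p = 2000, 3
S = np.diag([1.0, 2.0, 3.0])
X = rng.standard_normal((n, p)) @ np.sqrt(S) / np.abs(rng.standard_normal((n, 1)))
V_true = p * S / np.trace(S)
V_hat = tyler_shape(X, center=np.zeros(p))
K_hat = sscm(X, center=np.zeros(p))
```

Because each observation enters only through its direction, the sample covariance would be meaningless for these data while `V_hat` should still land close to `V_true`. The SSCM shares the eigenvectors of the scatter matrix but not its eigenvalues, so `K_hat` is informative about directions rather than about the shape matrix itself.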

These robust procedures are shown to be not merely outlier-resistant, but semiparametrically optimal under broad elliptical models, underlining their theoretical centrality.

High-Dimensional Location Inference

The monograph thoroughly recasts classical multivariate location testing in a high-dimensional, elliptically robust context. After reviewing Hotelling’s $T^2$ and its degeneration as $p/n$ grows, the focus shifts to quadratic-form procedures made robust via diagonal standardization, leave-one/two-out centering, and metric-invariant statistics.

Key technical innovations are:

Diagonal and weighted spatial sign/rank-based statistics: Estimation equations for the spatial median and its scalar-invariant extension are derived, with their high-dimensional Bahadur expansions established under minimal moment assumptions. Notably, the asymptotic linear term depends only on the empirical mean of scaled signs, and remainder terms are explicitly controlled—a crucial property for subsequent max and sum tests.
Weighted location equations (including the inverse-norm weighting): Within the class of weighted sign scores $K(r)U(r)$, optimality (in the sense of local power under contiguous alternatives) is achieved for the inverse-norm weight $K(r) = 1/r$, leading to the INST procedure. All such procedures are shown to yield efficient, moment-robust, and scalar-invariant inference.
Max-type and adaptive tests: To handle sparse alternatives, max statistics of the appropriately normalized components of spatial median estimators are studied. The asymptotic null distribution is shown to be Gumbel, with explicit conditions for weak dependence and technical results showing the negligible influence of centering and scale estimation on the limiting distribution.
Max-sum combinations and independence theorems: Recent results establish the asymptotic independence between sum-type (quadratic) and max-type test statistics under broad conditions, enabling the principled use of combination rules (like the Cauchy combination) to maintain sensitivity across both sparse and dense alternatives. These constructions mirror, in a robustly elliptic setting, the CLX and aSPU approaches developed under Gaussian benchmarks.
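The spatial median that underlies the first item in this list solves the estimating equation $\frac{1}{n}\sum_i U(X_i - \theta) = 0$. A minimal sketch of the classical Weiszfeld iteration follows, assuming NumPy; it is a generic solver, not the scalar-invariant or weighted versions developed in the book, and the function name and simulated data are illustrative.

```python
import numpy as np

def spatial_median(X, tol=1e-8, max_iter=1000):
    """Weiszfeld iteration for argmin over theta of sum_i ||x_i - theta||."""
    theta = np.median(X, axis=0)              # coordinate-wise median as a start
    for _ in range(max_iter):
        d = np.linalg.norm(X - theta, axis=1)
        d = np.maximum(d, 1e-12)              # guard against a data point equal to theta
        w = 1.0 / d
        theta_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Illustrative data: spherical multivariate Cauchy shifted by mu
rng = np.random.default_rng(2)
n, p = 500, 4
mu = np.array([1.0, -2.0, 0.5, 3.0])
X = mu + rng.standard_normal((n, p)) / np.abs(rng.standard_normal((n, 1)))
theta_hat = spatial_median(X)

# At the solution the average spatial sign of the residuals is (numerically) zero
resid = X - theta_hat
signs = resid / np.linalg.norm(resid, axis=1, keepdims=True)
mean_sign = signs.mean(axis=0)
```

The last lines check the estimating equation itself: the average sign is the quantity that appears as the linear term in the Bahadur expansion described above.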

This framework results in procedures that are minimax-optimal or adaptively rate-optimal across a range of alternative regimes, with robustness to heavy-tailed noise and unknown nuisance parameters.
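The Cauchy combination rule mentioned above is simple to state in code. The sketch below assumes NumPy and takes as input p-values that have already been computed by a sum-type and a max-type test; the numeric p-values are hypothetical placeholders, not results from the monograph.

```python
import numpy as np

def cauchy_combination(pvalues, weights=None):
    """Combine p-values via the Cauchy combination statistic.

    T = sum_k w_k * tan((0.5 - p_k) * pi), with nonnegative weights summing to 1;
    the combined p-value is the standard Cauchy upper tail at T.
    """
    p = np.asarray(pvalues, dtype=float)
    if weights is None:
        w = np.full(p.shape, 1.0 / p.size)    # equal weights by default
    else:
        w = np.asarray(weights, dtype=float)
    t = np.sum(w * np.tan((0.5 - p) * np.pi))
    return 0.5 - np.arctan(t) / np.pi

# Hypothetical p-values from a sum-type test and a max-type test
p_sum, p_max = 0.01, 0.6
p_combined = cauchy_combination([p_sum, p_max])
```

The combined p-value is dominated by the smaller input, so the rule inherits power from whichever test suits the alternative. The asymptotic independence of the sum-type and max-type statistics is what supports treating the two p-values as independent inputs.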

Matrix Inference: Covariance, Shape, Sphericity Tests, and Factor Models

The monograph’s next major axis is high-dimensional inference for covariance, scatter/shape, and their structural hypotheses:

Robust matrix estimation: Extends thresholding, graphical Lasso, and CLIME to operate on robustified (e.g., SSCM-based) analogues, achieving operator-norm rates comparable to covariance-based procedures but with substantially improved tail robustness and finite-sample properties.
Sphericity and proportionality tests: Classical likelihood ratio and trace-based tests are derived for fixed $p$, but their breakdown in high dimensions (particularly regarding the null distribution and invertibility) is detailed. The book instead develops sign- and rank-based sphericity tests that target the shape matrix rather than the covariance, with explicit bias corrections for the effect of center estimation, explicit formulas under high-dimensional asymptotics, and detailed local alternative analysis.
Two-sample structure: The natural null is proportionality of shapes, not covariance equality. Frobenius-norm SSCM statistics and spatial-rank analogues are proposed for testing these hypotheses, with fully worked out null distributions, bias, and efficiency analysis.
Elliptical factor models: The extension of PCA and factor analysis to robust, shape-based settings is elaborated, including Tyler-based and Kendall’s tau-based eigenspace estimation. Elliptical factor models are shown to permit accurate low-rank plus sparse decompositions without strong moment assumptions, maintaining theoretical guarantees analogous to POET and related light-tail methods.
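Two of the shape-based ingredients in this list can be illustrated briefly: a sign-based sphericity statistic and the multivariate Kendall's tau matrix used for eigenspace estimation. The sketch below assumes NumPy and a known center. The null variance $4(p-1)/\{n(n-1)(p+2)\}$ is derived from the second moments of uniformly distributed signs and is an assumption of this example; the book's versions add bias corrections for estimated centers that are not reproduced here. All function names and simulated data are illustrative.

```python
import numpy as np

def spatial_sign(X):
    """Row-wise U(x) = x / ||x||; rows are assumed nonzero."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def sign_sphericity_z(X, center):
    """Standardized sign-based sphericity statistic (center treated as known)."""
    n, p = X.shape
    U = spatial_sign(X - center)
    G2 = (U @ U.T) ** 2                           # squared inner products (U_i' U_j)^2
    off_diag = G2.sum() - np.trace(G2)            # sum over i != j
    Q = p * off_diag / (n * (n - 1)) - 1.0
    var0 = 4.0 * (p - 1) / (n * (n - 1) * (p + 2))  # variance under uniform signs
    return Q / np.sqrt(var0)

def multivariate_kendall_tau(X):
    """Average outer product of the spatial signs of pairwise differences."""
    i, j = np.triu_indices(X.shape[0], k=1)
    S = spatial_sign(X[i] - X[j])                 # O(n^2 p) memory; fine for small n
    return S.T @ S / S.shape[0]

def leading_eigenvectors(M, k):
    vals, vecs = np.linalg.eigh(M)                # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]

rng = np.random.default_rng(3)
n, p = 100, 200
X_null = rng.standard_normal((n, p)) / np.abs(rng.standard_normal((n, 1)))
scales = np.ones(p)
scales[:20] = np.sqrt(10.0)                       # inflate 20 coordinates
X_alt = X_null * scales
z_null = sign_sphericity_z(X_null, np.zeros(p))   # roughly standard normal
z_alt = sign_sphericity_z(X_alt, np.zeros(p))     # large under non-sphericity

Y = rng.standard_normal((300, 5)) * np.sqrt([10.0, 1, 1, 1, 1])
Y = Y / np.abs(rng.standard_normal((300, 1)))     # heavy-tailed, one dominant direction
K = multivariate_kendall_tau(Y)
v1 = leading_eigenvectors(K, 1)[:, 0]
```

Both quantities depend on the data only through directions, so the Cauchy-type tails in the simulated samples do not affect them. Pairwise differencing in the Kendall's tau matrix also removes the need to estimate a center.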

Asymptotic Regimes, Technical Devices, and Robustness

A recurring theme is the technical handling of high-dimensional asymptotic regimes: sample size and dimension may both diverge, often with $p \gg n$, requiring careful control of operator norms, trace ratios, and eigenstructure. The book establishes uniform control over bias terms induced by center and/or scale estimation through leave-one/two-out techniques, and provides precise technical conditions (on, e.g., eigenvalue delocalization, trace regularity, growth rates) for Gaussian approximations, Gumbel limits, or mixed Gaussian–chi-square limits to be valid.

Strong technical results include:

Explicit Bahadur expansions for spatial median and weighted/scale-invariant analogues
Uniform bias corrections for robust (nonlinear) covariance and shape statistics
Full derivation of asymptotic independence results for sum/max-type tests—even with common underlying data and shared normalization constants
Provisions for non-Gaussian or strong-correlation nulls, including when classical random matrix/eigenvalue assumptions fail, and recourse to normal-reference or randomization methods.

Practical and Theoretical Implications

Practically, the robust tools developed are directly implementable and computationally feasible even in very high dimensions, with a planned R package to facilitate their adoption. Theoretically, the monograph offers a clear methodology for extending traditional multivariate methods (testing, estimation, dimension reduction, classification) into modern domains with heavy tails, unknown or ill-conditioned covariance, and complex alternative structure—without the need for Gaussian or even finite-moment assumptions.

The insight that shape, not raw covariance, is the primitive object in high dimensions under realistic data-generating processes has wide-reaching implications. The robust, theoretically sharp procedures in this regime open the door to reliable inference in genomics, finance, imaging, network data, and beyond.

The monograph also sets a clear agenda for future research:

Interplay between spatial sign/rank/Tyler estimators and sparsity-inducing penalization for extreme $p \gg n$ settings
Extension and analysis of robust principal component and factor estimation under structured dependence (e.g., banded or block-diagonal scatter matrices)
Development of robust, adaptive combination strategies for wider classes of alternative hypotheses

Conclusion

This work synthesizes, extends, and systematizes robust high-dimensional inference under elliptical symmetry. It refines the technical apparatus to treat a broad collection of multivariate problems in a theoretically optimal and computationally tractable manner, tightly connecting robust statistics, semiparametric theory, and high-dimensional analysis. The monograph has both consolidated the field and provided a foundation for the next generation of research in robust multivariate statistics.
