Scatter Halfspace Depth in Multivariate Analysis
- Scatter halfspace depth is a robust, nonparametric measure that extends Tukey depth by assessing the centrality of positive-definite scatter matrices via slab counts in all directions.
- It possesses key properties such as affine invariance, robustness against outliers, continuity, and minimax optimality under contamination models.
- Recent advances offer both exact and approximate computational methods, with practical applications in robust multivariate estimation and financial outlier analysis.
Scatter halfspace depth (sHD) is a robust, nonparametric measure quantifying the fit of a positive-definite scatter (or covariance) matrix to a multivariate data cloud. Extending Tukey's location-based halfspace depth, sHD generalizes the concept of data "centrality" from points in $\mathbb{R}^d$ to the space of symmetric positive-definite matrices, providing a principled approach to multivariate scatter estimation with strong robustness and minimax-optimality properties. Recent work has developed both exact and approximate computational methods, clarified the geometry of depth regions, and explored its performance in both classical elliptically symmetric and more general $\alpha$-symmetric models (Liu et al., 2022, Paindaveine et al., 2017, Bočinec et al., 8 Dec 2025).
1. Definition of Scatter Halfspace Depth
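Let $X_1, \dots, X_n \in \mathbb{R}^d$ be a sample, and fix a location center $T$ (typically the Tukey median). For any symmetric positive-definite matrix $\Sigma$, the scatter halfspace depth is defined as

$$\mathrm{sHD}_n(\Sigma) = \inf_{u \in S^{d-1}} \min\left\{ \frac{1}{n}\,\#\left\{i : |u^\top (X_i - T)| \le \sqrt{u^\top \Sigma u}\right\},\ \frac{1}{n}\,\#\left\{i : |u^\top (X_i - T)| \ge \sqrt{u^\top \Sigma u}\right\} \right\}.$$

Conceptually, sHD measures the smallest mass (number or probability) of data points in any slab delimited by two parallel hyperplanes orthogonal to $u$ and at Mahalanobis (ellipsoidal) distance $\sqrt{u^\top \Sigma u}$ from $T$, or in the complement of such a slab, across all directions $u$. This definition extends to the population version using probabilities:

$$\mathrm{sHD}_P(\Sigma) = \inf_{u \in S^{d-1}} \min\left\{ P\left[|u^\top (X - T_P)| \le \sqrt{u^\top \Sigma u}\right],\ P\left[|u^\top (X - T_P)| \ge \sqrt{u^\top \Sigma u}\right] \right\},$$

where $P$ is the distribution of $X$ and $T_P$ is an affine-equivariant location functional (Liu et al., 2022, Paindaveine et al., 2017).
When $d = 1$, sHD coincides with the univariate median absolute deviation depth ranking; for general $d$, sHD can be understood as employing Tukey depth for the projections of the centered data cloud (Liu et al., 2022).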
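The univariate case can be sketched in a few lines. The snippet below is an illustration only, assuming the sample median as the center; the function name `shd_univariate` is hypothetical and not taken from the cited software. For $d = 1$ the two directions $u = \pm 1$ give the same slab, so no search over directions is needed.

```python
import numpy as np

def shd_univariate(x, sigma2, center=None):
    """Sample scatter halfspace depth of the variance candidate sigma2 (1-D data)."""
    x = np.asarray(x, dtype=float)
    t = np.median(x) if center is None else center  # assumption: median as center
    dev = np.abs(x - t)
    s = np.sqrt(sigma2)
    inside = np.mean(dev <= s)    # fraction of points in the slab [t - s, t + s]
    outside = np.mean(dev >= s)   # fraction of points on or beyond its boundary
    return min(inside, outside)

x = np.array([-3.0, -1.0, 0.0, 2.0, 10.0])
mad = np.median(np.abs(x - np.median(x)))    # median absolute deviation
depth_at_mad = shd_univariate(x, mad ** 2)   # deepest candidate for this sample
depth_far = shd_univariate(x, 100.0 ** 2)    # far too large a scale
```

The candidate $\sigma^2 = \mathrm{MAD}^2$ splits the absolute deviations as evenly as possible, which is why it ranks deepest, while a grossly inflated scale leaves no points outside the slab and has depth zero.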
2. Statistical Properties and Theory
Core Properties
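- Affine Invariance: For any invertible transformation $A$ and shift $b$, the depth of $A \Sigma A^\top$ computed from the transformed data $A X_i + b$ equals the depth of $\Sigma$ on the original data $X_i$ (Liu et al., 2022, Paindaveine et al., 2017, Bočinec et al., 8 Dec 2025).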
- Robustness: sHD downweights the influence of outlying observations; only scatter matrices aligned with the main data pattern achieve high depth (Liu et al., 2022).
- Continuity and Quasiconcavity: For generic data in general position, $\Sigma \mapsto \mathrm{sHD}(\Sigma)$ is upper semicontinuous and sHD contours are quasiconcave/nested as depth varies. The population version exhibits upper semicontinuity under both the Frobenius and geodesic (Riemannian) metrics, and full continuity under smoothness assumptions on $P$ (Paindaveine et al., 2017).
- Monotonicity and Boundedness: sHD decreases ("vanishes") at the boundary of the scatter parameter space, as $\|\Sigma\| \to \infty$ or as the smallest eigenvalue $\lambda_{\min}(\Sigma) \to 0$ (Paindaveine et al., 2017).
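Affine invariance can be checked numerically direction by direction. The sketch below is a hedged illustration, not the cited implementation: it assumes the sample mean as an affine-equivariant stand-in for the Tukey median, and the helper `slab_depth` is a hypothetical name. The key identity is that direction $u$ on the original data corresponds to direction $A^{-\top} u$ on the transformed data.

```python
import numpy as np

def slab_depth(X, Sigma, T, u):
    """min(fraction inside, fraction outside) the slab for one direction u."""
    proj = np.abs((X - T) @ u)
    width = np.sqrt(u @ Sigma @ u)
    return min(np.mean(proj <= width), np.mean(proj >= width))

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
Sigma = np.eye(d)
T = X.mean(axis=0)  # assumption: mean used in place of the Tukey median

A = rng.normal(size=(d, d)) + 3 * np.eye(d)  # generic invertible matrix
b = rng.normal(size=d)
Y = X @ A.T + b                # rows are A x_i + b
Sigma_Y = A @ Sigma @ A.T      # transformed scatter candidate
T_Y = Y.mean(axis=0)           # the mean is affine equivariant

U = rng.normal(size=(50, d))   # directions (scale of u does not matter)
before = [slab_depth(X, Sigma, T, u) for u in U]
after = [slab_depth(Y, Sigma_Y, T_Y, np.linalg.solve(A.T, u)) for u in U]
invariant = np.allclose(before, after)
```

Because every slab for the original data maps to a slab for the transformed data with identical point counts, the infimum over all directions is unchanged as well.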
Structural Insights
The deepest scatter matrix (sHD-median) exists for smooth distributions, and for elliptical models the depth is uniquely maximized at the true scatter (Paindaveine et al., 2017, Bočinec et al., 8 Dec 2025). For elliptical $P$, the maximum depth is $1/2$; for other distributions, the supremum may be less. Depth regions (super-level sets) are convex and nested (Paindaveine et al., 2017, Bočinec et al., 8 Dec 2025).
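As a worked illustration (a standard Gaussian calculation, not quoted from the cited papers), take $X \sim N(\mu, \Sigma_0)$ with center $T_P = \mu$ and consider candidates of the form $\Sigma = c\,\Sigma_0$. Since $u^\top (X - \mu) \sim N(0, u^\top \Sigma_0 u)$, every direction gives the same slab probability:

$$P\left[|u^\top (X - \mu)| \le \sqrt{c\, u^\top \Sigma_0 u}\right] = 2\Phi(\sqrt{c}) - 1, \qquad \mathrm{sHD}_P(c\,\Sigma_0) = \min\left\{2\Phi(\sqrt{c}) - 1,\ 2 - 2\Phi(\sqrt{c})\right\}.$$

The depth reaches $1/2$ exactly when $\sqrt{c} = \Phi^{-1}(3/4) \approx 0.6745$, that is, $c \approx 0.455$. Under this reading, "the true scatter" is understood up to a normalization constant that depends on the radial distribution.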
Extensions to $\alpha$-Symmetric Distributions
Recent developments introduce an $\alpha$-symmetric scatter depth for distributions whose characteristic function has the form $t \mapsto \varphi(\|t\|_\alpha)$. The definition generalizes the Mahalanobis ellipsoid to a norm-based analogue. Retaining affine equivariance and convexity, this version achieves minimax-optimality in broader distributional classes (Bočinec et al., 8 Dec 2025).
3. Robustness, Minimax Optimality, and Breakdown Properties
For $T$ taken as the Tukey median, the sHD-median $\hat{\Sigma}$, defined by maximizing sHD, is a robust scatter estimator. Under the Huber $\varepsilon$-contamination model, where the data are a mixture of a “clean” distribution and arbitrary outliers, $\hat{\Sigma}$ achieves the minimax rate (up to constants) for estimation error and exhibits a breakdown point approaching $1/2$. Thus, sHD attains both high resistance to gross errors and optimal error rates under contamination (Liu et al., 2022, Bočinec et al., 8 Dec 2025).
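For reference, the Huber contamination model in its standard form can be written as follows, with $P$ the clean distribution, $Q$ an arbitrary contaminating distribution, and $\varepsilon \in [0, 1)$ the contamination proportion (the symbols $P_\varepsilon$ and $Q$ are notation chosen here):

$$P_\varepsilon = (1 - \varepsilon)\, P + \varepsilon\, Q.$$

An estimator is assessed by its worst-case error over all choices of $Q$, which is the sense in which the minimax rate above is stated.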
The optimality at the true scatter persists in elliptical models, with explicit characterizations and guarantees of existence and uniqueness. In $\alpha$-symmetric families, the unique sHD-median is always isotropic, that is, of the form $\sigma^2 I_d$ for a scale $\sigma$ determined by the marginal distribution, and minimax-optimal finite-sample error bounds have been established (Bočinec et al., 8 Dec 2025).
4. Computation: Exact and Approximate Algorithms
Exact sHD Computation
Exact sHD computation is combinatorially expensive but theoretically tractable for low dimensions. The key geometric concepts include:
- Spherical Circles and Shells: After standardization, each point $x_i$ with $\|x_i\| > 1$ defines the spherical circle $\{u \in S^{d-1} : u^\top x_i = 1\}$ and its antipode, partitioning the sphere into regions where point counts are constant.
- Maximal Tangent Hyperplanes: Only directions corresponding to maximal supporting hyperplanes require evaluation.
- Algorithm Outline: (i) Affine-standardize the data, (ii) enumerate k-subsets corresponding to candidate tangent planes, (iii) detect those outside the unit ball, (iv) update sHD via point counts, (v) mark visited subsets to avoid redundancy.
This enumeration is costly in the worst case, but is practical for low dimension $d$ and moderate sample size $n$ (Liu et al., 2022).
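Step (i) of the outline can be sketched as follows. This is a minimal illustration under the assumption that standardization means mapping the candidate $\Sigma$ to the identity through a Cholesky factor; the function name `standardize` is hypothetical. After this map every slab has half-width 1, so by the Cauchy-Schwarz inequality any point inside the unit ball lies in every slab and only the remaining points can change the counts.

```python
import numpy as np

def standardize(X, Sigma, T):
    """Map the data so that the candidate Sigma becomes the identity matrix."""
    L = np.linalg.cholesky(Sigma)            # Sigma = L @ L.T
    return np.linalg.solve(L, (X - T).T).T   # rows are L^{-1} (x_i - T)

X = np.array([[1.0, 0.0], [0.0, 0.5], [0.5, 0.5], [-4.0, 2.0]])
Sigma = np.array([[4.0, 0.0], [0.0, 1.0]])
T = np.zeros(2)  # assumption: center fixed at the origin for this toy example

Z = standardize(X, Sigma, T)
norms = np.linalg.norm(Z, axis=1)
always_inside = norms <= 1.0   # |v^T z| <= ||z|| <= 1 for every unit vector v
```

In this toy sample only the last point falls outside the unit ball, so it alone determines how the slab counts vary with direction.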
Fast Approximations
For larger $n$ or $d$, approximate algorithms are necessary:
- Random-Directions ("rdirections"): Sample random directions $u$ on $S^{d-1}$, evaluate the minimal point count slab for each, and take the minimum. The cost grows linearly with the number of sampled directions.
- Random-Points ("rpoints"): Randomly sample tuples of data points, focusing only on the most likely critical slabs. More accurate at a fixed computational budget.
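The random-directions idea above can be sketched as follows. This is a hedged illustration rather than the cited C++/R implementation: the function name, the default number of directions, and the use of the sample mean as center are all assumptions. Because the minimum is taken over a finite subset of directions, the result is an upper bound on the exact sample depth.

```python
import numpy as np

def shd_random_directions(X, Sigma, T, n_dir=1000, seed=0):
    """Approximate sample sHD of Sigma by minimizing over random directions."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n_dir, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform on the unit sphere
    proj = np.abs((X - T) @ U.T)                    # shape (n, n_dir)
    width = np.sqrt(np.einsum("ij,jk,ik->i", U, Sigma, U))  # sqrt(u^T Sigma u)
    inside = np.mean(proj <= width, axis=0)         # slab mass per direction
    outside = np.mean(proj >= width, axis=0)        # complement mass per direction
    return float(np.min(np.minimum(inside, outside)))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
T = X.mean(axis=0)  # assumption: mean used in place of the Tukey median

d_good = shd_random_directions(X, 0.455 * np.eye(2), T)  # near the Gaussian optimum
d_bad = shd_random_directions(X, 25.0 * np.eye(2), T)    # scale far too large
```

For standard normal data the scaled identity $0.455\, I_2$ sits near the population maximizer, so its approximate depth is close to $1/2$, while the inflated candidate leaves essentially no points outside any slab.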
Empirical results show rpoints reaches near-exact accuracy (roughly 90%) for a reasonable number of sampled tuples, while rdirections is less reliable (roughly 50% accuracy for comparable parameter settings) (Liu et al., 2022).
Implementations are available in C++ (for a restricted range of $d$) and R (arbitrary $d$), with attention to numerical stability (projections differing by less than a small tolerance are carefully excluded) (Liu et al., 2022).
5. Geometry and Topology of Depth Regions
The set of all $\Sigma$ with sHD at or above a given threshold forms a depth region. These regions are nested and convex in the Frobenius geometry, closed under mild regularity, and may be unbounded at the boundary of the space of scatter matrices.
Considering the geodesic Riemannian metric on the symmetric positive-definite cone confers additional structure: for smooth $P$, geodesically deep regions are always compact for all but the minimal nonzero depth value. In elliptical models, depth is geodesically quasiconcave, and each region is geodesically convex (Paindaveine et al., 2017).
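The difference between the two geometries can be made concrete with a short sketch. It assumes the geodesic metric is the standard affine-invariant Riemannian distance $\|\log(A^{-1/2} B A^{-1/2})\|_F$ on the positive-definite cone; the function names are hypothetical. Under this metric, matrices approaching singularity are infinitely far away, whereas in Frobenius distance they remain at bounded distance.

```python
import numpy as np

def frobenius_distance(A, B):
    """Euclidean (Frobenius) distance between two matrices."""
    return float(np.linalg.norm(A - B, ord="fro"))

def geodesic_distance(A, B):
    """Affine-invariant distance between SPD matrices A and B."""
    L = np.linalg.cholesky(A)                  # A = L @ L.T
    W = np.linalg.solve(L, B)                  # L^{-1} B
    M = np.linalg.solve(L, W.T).T              # L^{-1} B L^{-T}, symmetric
    lam = np.linalg.eigvalsh((M + M.T) / 2)    # same spectrum as A^{-1} B
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

I2 = np.eye(2)
B = np.diag([4.0, 1.0])
g_IB = geodesic_distance(I2, B)                # equals log(4)
f_IB = frobenius_distance(I2, B)               # equals 3

tiny = 1e-8 * I2                               # nearly singular scatter matrix
g_tiny = geodesic_distance(I2, tiny)           # sqrt(2) * log(1e8), large
f_tiny = frobenius_distance(I2, tiny)          # about sqrt(2), bounded
```

This contrast gives one intuition for why depth regions that are unbounded or touch the boundary in Frobenius terms can behave differently under the geodesic metric.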
6. Applications and Empirical Behavior
Scatter halfspace depth underpins robust scatter estimation in high dimensions and in contaminated or heavy-tailed settings. In finance, practical utility is demonstrated via diagnostic analysis of scatter and shape outlyingness in high-frequency returns: sHD-based measures align with major financial events and outperform Euclidean distances in identifying both scale and shape outliers (Paindaveine et al., 2017).
For sample estimation, empirical sHD uniformly converges to its population counterpart under the empirical measure and affine-equivariant location functional, under mild conditions. The DepthDescent algorithm—an SPD-matrix descent scheme—handles dimensions up to and sample sizes of several thousand (Paindaveine et al., 2017).
In the context of $\alpha$-symmetric distributions, the unique sHD-median is isotropic and computable by solving a one-dimensional root-finding problem. The estimator enjoys optimal concentration under Huber contamination, with estimation error matching minimax rates (Bočinec et al., 8 Dec 2025).
7. Historical Perspective and Relation to Other Depths
The sHD formalism generalizes ideas from classic Tukey depth (location) to scatter estimation and builds on early work by Zhang (2002) and Chen–Gao–Ren (2017), who proposed depth functionals for scatter with varying assumptions and properties. sHD distinguishes itself by its precise geometric definition, robust optimality, structural analysis of depth regions under multiple topologies, and generalization to $\alpha$-symmetric laws (Paindaveine et al., 2017, Bočinec et al., 8 Dec 2025).
Alternative centering strategies (such as pairwise differences to avoid estimating a location parameter) can yield related depth notions, but fail to achieve Fisher consistency under general elliptical laws, highlighting the necessity of affine-equivariant centering (Paindaveine et al., 2017).
Table: Summary of Key Properties
| Property | sHD (Elliptical) | sHD ($\alpha$-Symmetric) |
|---|---|---|
| Affine Invariance | Yes | Yes |
| Uniqueness of Median | Yes (elliptical) | Yes (isotropic) |
| Robustness | High breakdown point | High breakdown point |
| Minimax Optimality | Yes (Huber contamination) | Yes (by construction) |
| Region Convexity | Frobenius and geodesic | Convex (Frobenius) |
| Efficient Computation | Low $d$: exact enumeration; large $d$: stochastic approximation | Isotropic, 1D root-finding per sample |
Scatter halfspace depth constitutes a foundational methodology for robust, nonparametric, affine-invariant scatter estimation in multivariate analysis, with rigorous guarantees and practical computational tools (Liu et al., 2022, Paindaveine et al., 2017, Bočinec et al., 8 Dec 2025).