Scatter Halfspace Depth in Multivariate Analysis

Updated 10 December 2025
  • Scatter halfspace depth is a robust, nonparametric measure that extends Tukey depth by assessing the centrality of positive-definite scatter matrices via slab counts in all directions.
  • It possesses key properties such as affine invariance, robustness against outliers, continuity, and minimax optimality under contamination models.
  • Recent advances offer both exact and approximate computational methods, with practical applications in robust multivariate estimation and financial outlier analysis.

Scatter halfspace depth (sHD) is a robust, nonparametric measure quantifying the fit of a positive-definite scatter (or covariance) matrix to a multivariate data cloud. Extending Tukey's location-based halfspace depth, sHD generalizes the concept of data "centrality" from points in $\mathbb{R}^d$ to the space of symmetric positive-definite matrices, providing a principled approach to multivariate scatter estimation with strong robustness and minimax-optimality properties. Recent work has developed both exact and approximate computational methods, clarified the geometry of depth regions, and explored its performance in both classical elliptically symmetric and more general $\alpha$-symmetric models (Liu et al., 2022, Paindaveine et al., 2017, Bočinec et al., 8 Dec 2025).

1. Definition of Scatter Halfspace Depth

Let X1,,XnRdX_1, \dots, X_n \in \mathbb{R}^d be a sample, and fix a location center μRd\mu \in \mathbb{R}^d (typically the Tukey median). For any symmetric positive-definite matrix Σ0\Sigma \succeq 0, the scatter halfspace depth is defined as

sHD(Σ;μ)=minuSd1min{#{i:u(Xiμ)uΣu},  #{i:u(Xiμ)uΣu}}.\mathrm{sHD}(\Sigma; \mu) = \min_{u \in S^{d-1}} \min \left\{ \#\left\{i: |u^\top(X_i-\mu)| \le \sqrt{u^\top \Sigma u} \right\}, \; \#\left\{i: |u^\top(X_i-\mu)| \ge \sqrt{u^\top \Sigma u} \right\} \right\}.

Conceptually, sHD works direction by direction. For each direction $u$, take the slab bounded by the two hyperplanes orthogonal to $u$ at distance $\sqrt{u^\top \Sigma u}$ on either side of $\mu$. These are exactly the hyperplanes tangent to the ellipsoid $\{x : (x-\mu)^\top \Sigma^{-1} (x-\mu) \le 1\}$. sHD records the smaller of two counts: the points inside the slab and the points on or outside its boundary. It then takes the minimum over all directions $u$. The population version replaces counts with probabilities:

$$\mathrm{HD}^{\mathrm{sc}}_{P,T}(\Sigma) = \inf_{u \in S^{d-1}} \min \left\{ P\left( |u^\top (X - T_P)| \le \sqrt{u^\top \Sigma u} \right), \; P\left(|u^\top (X - T_P)| \ge \sqrt{u^\top \Sigma u} \right) \right\},$$

where $P$ is the distribution of $X$ and $T_P$ is an affine-equivariant location functional (Liu et al., 2022, Paindaveine et al., 2017).

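When $d=1$, the unit sphere is $S^0 = \{-1, 1\}$. sHD then reduces to the univariate Tukey depth of $\sigma = \sqrt{\Sigma}$ with respect to the absolute deviations $|X_i - \mu|$, so it is maximized when $\sigma$ equals the median absolute deviation (MAD) about $\mu$. For general $d$, sHD applies this univariate construction to the absolute projections $|u^\top(X_i-\mu)|$ in every direction $u$ and takes the worst case (Liu et al., 2022).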
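The $d=1$ reduction can be checked directly. The short Python sketch below computes the count version of sHD for a scalar $\Sigma = \sigma^2$ on a toy sample with one gross outlier, and confirms that the deepest value is the squared MAD about the median. The function and variable names are illustrative and do not come from the cited papers.

```python
import numpy as np

def shd_1d(x, mu, sigma2):
    """Empirical scatter halfspace depth of sigma2 in d = 1 (count version).

    In d = 1 the unit sphere is {-1, +1}; both directions give the same
    counts, so no minimization over directions is needed.
    """
    r = np.abs(np.asarray(x, dtype=float) - mu)   # |u'(X_i - mu)| for u = 1
    s = np.sqrt(sigma2)                           # sqrt(u' Sigma u)
    inside = int(np.sum(r <= s))                  # points in the closed slab
    outside = int(np.sum(r >= s))                 # points on or beyond its boundary
    return min(inside, outside)

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
mu = np.median(x)                                 # 1D Tukey median = sample median
# The maximum over sigma2 is attained at one of the squared absolute deviations.
candidates = np.abs(x - mu) ** 2
depths = [shd_1d(x, mu, c) for c in candidates]
best = candidates[int(np.argmax(depths))]
mad = np.median(np.abs(x - mu))
print(best, mad ** 2, max(depths))                # prints: 1.0 1.0 3
```

The outlier at 100 changes neither the deepest scale nor its depth. This is the univariate version of the robustness discussed below.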

2. Statistical Properties and Theory

Core Properties

  • Affine Invariance: For any invertible matrix $A$ and shift $b$, $\mathrm{sHD}(A \Sigma A^\top; A\mu + b)$ computed from the transformed data $\{AX_i+b\}$ equals $\mathrm{sHD}(\Sigma; \mu)$ computed from $\{X_i\}$ (Liu et al., 2022, Paindaveine et al., 2017, Bočinec et al., 8 Dec 2025).
  • Robustness: sHD downweights the influence of outlying observations; only scatter matrices aligned with the main data pattern achieve high depth (Liu et al., 2022).
  • Continuity and Quasiconcavity: For data in general position, $\Sigma \mapsto \mathrm{sHD}(\Sigma)$ is upper semicontinuous and quasiconcave, so its upper level sets (depth regions) are convex and nested. The population version is upper semicontinuous under both the Frobenius and the geodesic (Riemannian) metrics, and fully continuous under smoothness assumptions on $P$ (Paindaveine et al., 2017).
  • Monotonicity and Boundedness: sHD vanishes at the boundary of the scatter parameter space. That is, it tends to zero as $\|\Sigma\|_F \to \infty$ or as the smallest eigenvalue $\lambda_{\min}(\Sigma) \to 0$ (Paindaveine et al., 2017).
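Affine invariance follows from a direction-by-direction identity. For the transformed data, direction $v$ gives exactly the same slab counts as direction $A^\top v$ gives for the original data, because of two facts:

- $v^\top A (X_i - \mu) = (A^\top v)^\top (X_i - \mu)$;
- $v^\top A\Sigma A^\top v = (A^\top v)^\top \Sigma (A^\top v)$.

Normalizing both sides by $\|A^\top v\|$ leaves the inequalities unchanged. The NumPy sketch below checks this numerically with a hypothetical helper `slab_counts`. The coordinatewise median is only a stand-in center, since the identity holds for any $\mu$.

```python
import numpy as np

def slab_counts(X, mu, Sigma, u):
    """Return (inside, outside) slab counts for one direction u (any nonzero vector)."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)                     # put u on the unit sphere
    proj = np.abs((X - mu) @ u)                   # |u'(X_i - mu)|
    half_width = np.sqrt(u @ Sigma @ u)           # sqrt(u' Sigma u)
    return int(np.sum(proj <= half_width)), int(np.sum(proj >= half_width))

rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.standard_normal((n, d))
mu = np.median(X, axis=0)                         # stand-in center; any mu works here
Sigma = np.diag([0.5, 1.0, 2.0])
A = rng.standard_normal((d, d)) + 3 * np.eye(d)   # invertible with probability one
b = rng.standard_normal(d)

# Transformed problem: data A X_i + b, center A mu + b, scatter A Sigma A'.
Xt, mut, Sigmat = X @ A.T + b, A @ mu + b, A @ Sigma @ A.T

for _ in range(5):
    v = rng.standard_normal(d)                    # direction in the transformed space
    # Direction A'v in the original space yields identical counts.
    assert slab_counts(Xt, mut, Sigmat, v) == slab_counts(X, mu, Sigma, A.T @ v)
print("per-direction counts agree")
```

Since $v \mapsto A^\top v / \|A^\top v\|$ is a bijection of the unit sphere, the minima over all directions coincide as well.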

Structural Insights

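The deepest scatter matrix (sHD-median) exists for smooth distributions. For elliptical models it is unique and equals the true scatter (Paindaveine et al., 2017, Bočinec et al., 8 Dec 2025). Here "true" refers to the normalization under which $P(|u^\top(X - T_P)| \le \sqrt{u^\top \Sigma u}) = 1/2$ for every direction $u$. For the Gaussian $N(\mu, \Sigma)$ this normalized matrix is $(\Phi^{-1}(3/4))^2\,\Sigma \approx 0.455\,\Sigma$. For elliptical $P$, the maximum depth is $1/2$; for other distributions, the supremum may be smaller. Depth regions (super-level sets) are convex and nested (Paindaveine et al., 2017, Bočinec et al., 8 Dec 2025).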
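Take a Gaussian $P = N(\mu, \Sigma)$ with $T_P = \mu$. Every standardized projection $u^\top(X-\mu)/\sqrt{u^\top\Sigma u}$ is then standard normal. As a result, the population depth of a rescaled matrix $c\,\Sigma$ does not depend on $u$ and has a closed form. The standard-library Python sketch below evaluates it; the function name is illustrative. It shows that depth $1/2$ is reached at $c = (\Phi^{-1}(3/4))^2$, while $c = 1$ gives only about $0.317$.

```python
from math import erf, sqrt
from statistics import NormalDist

def gaussian_shd_of_scaled_scatter(c):
    """Population sHD of c * Sigma when X ~ N(mu, Sigma) and T_P = mu.

    For every unit u, u'(X - mu) / sqrt(u' Sigma u) is standard normal, so the
    depth is min(P(|Z| <= sqrt(c)), P(|Z| >= sqrt(c))) for every direction.
    """
    p_inside = erf(sqrt(c) / sqrt(2.0))           # P(|Z| <= sqrt(c)) = 2*Phi(sqrt(c)) - 1
    return min(p_inside, 1.0 - p_inside)

c_star = NormalDist().inv_cdf(0.75) ** 2          # (Phi^{-1}(3/4))^2
print(round(c_star, 4))                           # 0.4549
print(round(gaussian_shd_of_scaled_scatter(c_star), 4))   # 0.5
print(round(gaussian_shd_of_scaled_scatter(1.0), 4))      # 0.3173
```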

Extensions to $\alpha$-Symmetric Distributions

Recent developments introduce an $\alpha$-symmetric scatter depth for distributions whose characteristic function has the form $\varphi_X(t) = \phi(\|t\|_\alpha)$. The definition generalizes the Mahalanobis ellipsoid to the $\ell_\alpha$ norm:

$$\mathrm{SD}_\alpha(\Sigma; P) = \inf_{u \in S^{d-1}} \min \left\{ P\left( |\langle X - T(P), u \rangle| \le \| \Sigma^{1/2} u \|_\alpha \right), \; P\left( |\langle X - T(P), u \rangle| \ge \| \Sigma^{1/2} u \|_\alpha \right) \right\}.$$

For $\alpha = 2$, $\|\Sigma^{1/2}u\|_2 = \sqrt{u^\top \Sigma u}$, and the definition reduces to sHD. This version retains affine equivariance and convexity and achieves minimax optimality in broader distributional classes (Bočinec et al., 8 Dec 2025).
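A per-direction version of the $\ell_\alpha$ threshold can be sketched as follows. The sketch assumes the absolute-value slab form $|\langle X - T(P), u\rangle|$, mirroring the sHD definition, and takes $\Sigma^{1/2}$ to be the symmetric square root. The helper names are hypothetical, and this is not code from Bočinec et al. For $\alpha = 2$ the threshold $\|\Sigma^{1/2}u\|_2$ coincides with $\sqrt{u^\top\Sigma u}$.

```python
import numpy as np

def sym_sqrt(Sigma):
    """Symmetric square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(Sigma)
    return (V * np.sqrt(w)) @ V.T

def alpha_slab_counts(X, T, Sigma, u, alpha):
    """(inside, outside) counts for the ell_alpha threshold ||Sigma^{1/2} u||_alpha."""
    u = np.asarray(u, dtype=float) / np.linalg.norm(u)
    proj = np.abs((X - T) @ u)                                 # |<X_i - T, u>|
    thr = np.linalg.norm(sym_sqrt(Sigma) @ u, ord=alpha)       # ||Sigma^{1/2} u||_alpha
    return int(np.sum(proj <= thr)), int(np.sum(proj >= thr))

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 2))
T = np.zeros(2)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
u = np.array([1.0, 1.0])
print(alpha_slab_counts(X, T, Sigma, u, alpha=2), alpha_slab_counts(X, T, Sigma, u, alpha=1))
```

Because $\|x\|_1 \ge \|x\|_2$, the $\alpha = 1$ slab is at least as wide as the $\alpha = 2$ slab in every direction.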

3. Robustness, Minimax Optimality, and Breakdown Properties

With $\mu$ taken as the Tukey median, the sHD-median $\Sigma^*$ (the maximizer of sHD) is a robust scatter estimator. Consider the Huber $\epsilon$-contamination model, in which the data come from a mixture of a "clean" distribution and arbitrary outliers. Under this model, $\Sigma^*$ achieves the minimax error rate, which for fixed dimension is of order $n^{-1/2}$ plus a term proportional to $\epsilon$ (up to constants). It also has a breakdown point approaching $1/2$. Thus, sHD combines high resistance to gross errors with optimal error rates under contamination (Liu et al., 2022, Bočinec et al., 8 Dec 2025).
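The robustness claim is easiest to see in $d = 1$, where the sHD-median is the squared median absolute deviation about the median. The NumPy sketch below uses synthetic data: 1600 standard normal points plus 400 gross outliers placed at 50, i.e. 20% contamination. It contrasts the MAD scale with the sample standard deviation. It illustrates bounded influence only and does not demonstrate the minimax rate.

```python
import numpy as np

rng = np.random.default_rng(3)
clean = rng.standard_normal(1600)
outliers = np.full(400, 50.0)                     # 20% gross errors placed far away
x = np.concatenate([clean, outliers])

mu = np.median(x)                                 # 1D Tukey median
mad = np.median(np.abs(x - mu))                   # sqrt of the 1D sHD-median
print(f"MAD scale: {mad:.3f}  (clean-data value about 0.674)")
print(f"sample std: {x.std():.3f}")
```

The MAD moves only moderately away from its clean-data value, whereas the standard deviation is dominated by the outliers.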

In elliptical models, the maximality of the true scatter comes with explicit characterizations and guarantees of existence and uniqueness. In $\alpha$-symmetric families, the unique sHD-median is always isotropic, $\Sigma = \sigma^2 I_d$, with a scale $\sigma$ determined by the marginal distribution, and minimax-optimal finite-sample error bounds have been established (Bočinec et al., 8 Dec 2025).

4. Computation: Exact and Approximate Algorithms

Exact sHD Computation

Exact sHD computation is combinatorially expensive but theoretically tractable for low dimensions. The key geometric concepts include:

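  • Spherical Circles and Shells: After standardization, each point with $\|X_i\| > 1$ defines a small sphere $C_i = \{u \in S^{d-1}: u^\top X_i = 1\}$ on $S^{d-1}$, together with its antipode $-C_i$. Crossing $C_i$ or $-C_i$ moves $X_i$ into or out of the slab. These spheres therefore partition $S^{d-1}$ into regions on which the point counts are constant.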
  • Maximal Tangent Hyperplanes: Only directions uu corresponding to maximal supporting hyperplanes require evaluation.
  • Algorithm Outline: (i) Affine-standardize the data, (ii) enumerate k-subsets corresponding to candidate tangent planes, (iii) detect those outside the unit ball, (iv) update sHD via point counts, (v) mark visited subsets to avoid redundancy.

This enumeration has worst-case $O(n^d d^2)$ complexity, but is practical for $d \le 5$ and moderate $n$ (Liu et al., 2022).
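In $d = 2$ the geometry above becomes a one-dimensional sweep, which gives a compact exact method for small examples. The sketch below is a simplified illustration and not the enumeration algorithm of Liu et al. (2022). It proceeds in three steps:

1. Standardize by a Cholesky factor of $\Sigma$, as in step (i).
2. Note that a point at radius $r_i \ge 1$ and angle $\varphi_i$ meets the slab boundary exactly at direction angles $\varphi_i \pm \arccos(1/r_i)$, modulo $\pi$.
3. Evaluate one direction inside each arc between consecutive critical angles; this suffices because the counts are constant on each arc.

The cost is $O(n^2)$, and the function name is hypothetical.

```python
import numpy as np

def shd_exact_2d(X, mu, Sigma):
    """Exact empirical sHD (count version) in d = 2 by an angular sweep."""
    X = np.asarray(X, dtype=float)
    L = np.linalg.cholesky(np.asarray(Sigma, dtype=float))   # Sigma = L L'
    # Standardize: with Y_i = L^{-1}(X_i - mu) the task becomes sHD(I) on Y.
    Y = np.linalg.solve(L, (X - mu).T).T
    r = np.hypot(Y[:, 0], Y[:, 1])
    phi = np.arctan2(Y[:, 1], Y[:, 0])
    # Points with r < 1 are inside the slab for every direction. A point with r >= 1
    # lies on the boundary |v'Y_i| = 1 exactly when the angle of v equals
    # phi_i +/- acos(1/r_i), modulo pi (v and -v define the same slab).
    far = r >= 1.0
    delta = np.arccos(1.0 / r[far])
    crit = np.sort(np.mod(np.concatenate([phi[far] + delta, phi[far] - delta]), np.pi))
    # Counts are constant between consecutive critical angles, and a boundary angle
    # never gives a smaller value than its neighbouring arcs, so one probe per arc suffices.
    if crit.size == 0:
        probes = np.array([0.0])
    else:
        probes = 0.5 * (crit + np.append(crit[1:], crit[0] + np.pi))
    V = np.column_stack([np.cos(probes), np.sin(probes)])    # unit directions
    proj = np.abs(Y @ V.T)                                    # shape (n, #probes)
    inside = np.sum(proj <= 1.0, axis=0)
    outside = np.sum(proj >= 1.0, axis=0)
    return int(np.min(np.minimum(inside, outside)))

# Usage: for Sigma = I this five-point configuration has exact depth 1.
Y0 = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0], [0.0, 0.0]])
print(shd_exact_2d(Y0, np.zeros(2), np.eye(2)))               # 1
```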

Fast Approximations

For larger dd or nn, approximate algorithms are necessary:

  • Random-Directions ("rdirections"): Sample $N$ random directions $u$ on $S^{d-1}$, evaluate the slab depth (the smaller of the two counts) for each, and take the minimum. Complexity is $O(Nnd)$.
  • Random-Points ("rpoints"): Randomly sample $M$ $(d-1)$-tuples of points, focusing only on the most likely critical slabs. This is more accurate at a fixed computational budget, with complexity $O(Mnd^2)$.

Empirical results show that rpoints achieves near-exact accuracy (about 90%) for reasonable $M$ (e.g., $M=10^4$), while rdirections is less reliable (about 50% under comparable settings) (Liu et al., 2022).
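The random-directions approximation is simple enough to sketch in a few lines of NumPy; the function name and defaults below are hypothetical. The exact sHD is a minimum over all directions, so any finite set of directions can only overestimate it. This helps explain why rdirections is less reliable than methods that target critical slabs.

```python
import numpy as np

def shd_rdirections(X, mu, Sigma, n_dir=1000, seed=0):
    """Random-directions approximation of the empirical sHD (count version).

    Evaluates the slab counts along n_dir random unit directions and returns the
    smallest value found, an upper bound on the exact depth. Cost: O(n_dir * n * d).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_dir, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)             # uniform on S^{d-1}
    proj = np.abs((X - mu) @ U.T)                              # shape (n, n_dir)
    half_width = np.sqrt(np.einsum("ij,jk,ik->i", U, Sigma, U))  # sqrt(u' Sigma u)
    inside = np.sum(proj <= half_width, axis=0)
    outside = np.sum(proj >= half_width, axis=0)
    return int(np.min(np.minimum(inside, outside)))

rng = np.random.default_rng(6)
X = rng.standard_normal((500, 3))
print(shd_rdirections(X, np.median(X, axis=0), 0.455 * np.eye(3), n_dir=2000))
```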

Implementations are available in C++ (for $d \le 5$) and R (arbitrary $d$), with attention to numerical stability: projections differing by less than $10^{-14}$ are carefully excluded (Liu et al., 2022).

5. Geometry and Topology of Depth Regions

The set of all $\Sigma$ with sHD above a given threshold forms a depth region, denoted $R^{\mathrm{sc}}_{P,T}(\alpha) = \{\Sigma: \mathrm{HD}^{\mathrm{sc}}_{P,T}(\Sigma) \ge \alpha\}$. These regions are nested and convex in the Frobenius geometry, closed under mild regularity, and may be unbounded at the boundary of the space of scatter matrices.
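Frobenius convexity can be seen by rewriting the superlevel condition direction by direction, for $0 < \alpha \le 1$. The quantile notation $\ell_u(\alpha)$ and $h_u(\alpha)$ is introduced here only for this derivation. It relies on two continuity facts: $t \mapsto P(|W| \le t)$ is right-continuous and $t \mapsto P(|W| \ge t)$ is left-continuous. Hence both sets of admissible thresholds $t \ge 0$ are closed intervals.

```latex
\begin{aligned}
\mathrm{HD}^{\mathrm{sc}}_{P,T}(\Sigma) \ge \alpha
&\iff \forall u \in S^{d-1}:\;
  P\big(|u^\top(X-T_P)| \le \sqrt{u^\top\Sigma u}\big) \ge \alpha
  \ \text{and}\
  P\big(|u^\top(X-T_P)| \ge \sqrt{u^\top\Sigma u}\big) \ge \alpha \\
&\iff \forall u \in S^{d-1}:\; \ell_u(\alpha)^2 \le u^\top \Sigma u \le h_u(\alpha)^2, \\
\text{where}\quad
\ell_u(\alpha) &= \inf\{t \ge 0 : P(|u^\top(X-T_P)| \le t) \ge \alpha\}, \\
h_u(\alpha) &= \sup\{t \ge 0 : P(|u^\top(X-T_P)| \ge t) \ge \alpha\}.
\end{aligned}
```

Each condition restricts the linear functional $\Sigma \mapsto u^\top \Sigma u = \operatorname{tr}(\Sigma\, u u^\top)$ to an interval. $R^{\mathrm{sc}}_{P,T}(\alpha)$ is therefore the intersection of these convex sets with the convex cone of positive-definite matrices, and so it is convex.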

Considering the geodesic Riemannian metric on the symmetric positive-definite cone confers additional structure: for smooth $P$, geodesically deep regions are always compact for all but the minimal nonzero depth value. In elliptical models, depth is geodesically quasiconcave, and each region is geodesically convex (Paindaveine et al., 2017).
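The sketch below makes the geodesic geometry concrete. It assumes the standard affine-invariant Riemannian metric on the SPD cone, with distance $\|\log(\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2})\|_F$; the cited work may parameterize it differently, and the function names are hypothetical. Under this metric, matrices approaching singularity are infinitely far away. This gives some intuition for why compactness of depth regions behaves differently than in the Frobenius geometry.

```python
import numpy as np

def _sym_fun(S, f):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

def geodesic_distance(S1, S2):
    """Affine-invariant distance ||log(S1^{-1/2} S2 S1^{-1/2})||_F."""
    S1_inv_half = _sym_fun(S1, lambda w: 1.0 / np.sqrt(w))
    M = S1_inv_half @ S2 @ S1_inv_half
    return float(np.linalg.norm(_sym_fun((M + M.T) / 2, np.log), "fro"))

def geodesic_point(S1, S2, t):
    """Point at time t on the geodesic from S1 (t = 0) to S2 (t = 1)."""
    S1_half = _sym_fun(S1, np.sqrt)
    S1_inv_half = _sym_fun(S1, lambda w: 1.0 / np.sqrt(w))
    M = S1_inv_half @ S2 @ S1_inv_half
    return S1_half @ _sym_fun((M + M.T) / 2, lambda w: w ** t) @ S1_half

I2 = np.eye(2)
print(geodesic_point(I2, 4 * I2, 0.5))            # geodesic midpoint of I and 4I is 2I
print(geodesic_distance(I2, 1e-6 * I2))           # about 19.54, and grows without bound
```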

6. Applications and Empirical Behavior

Scatter halfspace depth underpins robust scatter estimation in high dimensions and in contaminated or heavy-tailed settings. In finance, practical utility is demonstrated via diagnostic analysis of scatter and shape outlyingness in high-frequency returns: sHD-based measures align with major financial events and outperform Euclidean distances in identifying both scale and shape outliers (Paindaveine et al., 2017).

Under mild conditions, the empirical sHD converges uniformly to its population counterpart; it is computed from the empirical measure with an affine-equivariant location functional. The DepthDescent algorithm, a descent scheme over SPD matrices, handles dimensions up to $d=10$ and sample sizes of several thousand (Paindaveine et al., 2017).

In the context of $\alpha$-symmetric distributions, the unique sHD-median is isotropic and computable by solving a one-dimensional root-finding problem, with each step costing $O(n)$. The estimator enjoys optimal concentration under Huber contamination, with estimation error matching minimax rates (Bočinec et al., 8 Dec 2025).
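The following sketch does not reproduce the estimator of Bočinec et al. It only illustrates two ingredients behind a one-dimensional search with $O(n)$ cost per step:

- For a centered $\alpha$-symmetric vector, $u^\top X$ has the same distribution as $\|u\|_\alpha X_1$. With $\Sigma = \sigma^2 I_d$, the two slab probabilities therefore no longer depend on $u$, and the deepest $\sigma$ is the median of $|X_1|$.
- That median is the root of a monotone counting function, found here by bisection.

The example uses i.i.d. standard Cauchy coordinates, which are $\alpha$-symmetric with $\alpha = 1$ and for which the median of $|X_1|$ is exactly 1. Function names are hypothetical.

```python
import numpy as np

def bisect_abs_median(z, tol=1e-10, max_iter=200):
    """Approximately the smallest s with #{i: |z_i| <= s} / n >= 1/2, by bisection.

    Each step evaluates a count over all n values, so it costs O(n).
    """
    a = np.abs(np.asarray(z, dtype=float))
    lo, hi = 0.0, float(a.max())          # invariant: the count at hi is >= n/2
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.mean(a <= mid) >= 0.5:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return hi

rng = np.random.default_rng(4)
X = rng.standard_cauchy((20000, 3))       # i.i.d. Cauchy coordinates: alpha = 1
u = np.array([1.0, -2.0, 0.5])
# u'X has the law of ||u||_1 * X_1, so rescaling recovers the median of |X_1|.
sigma_hat = bisect_abs_median(X @ u) / np.linalg.norm(u, ord=1)
print(round(sigma_hat, 2))                # close to 1, the median of |X_1|
```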

7. Historical Perspective and Relation to Other Depths

The sHD formalism generalizes ideas from classic Tukey depth (location) to scatter estimation and builds on early work by Zhang (2002) and Chen–Gao–Ren (2017), who proposed depth functionals for scatter with varying assumptions and properties. sHD distinguishes itself by its precise geometric definition, robust optimality, structural analysis of depth regions under multiple topologies, and generalization to $\alpha$-symmetric laws (Paindaveine et al., 2017, Bočinec et al., 8 Dec 2025).

Alternative centering strategies (such as pairwise differences to avoid estimating a location parameter) can yield related depth notions, but fail to achieve Fisher consistency under general elliptical laws, highlighting the necessity of affine-equivariant centering (Paindaveine et al., 2017).

Table: Summary of Key Properties

| Property | sHD (Elliptical) | sHD ($\alpha$-Symmetric) |
|---|---|---|
| Affine invariance | Yes | Yes |
| Uniqueness of median | Yes (elliptical) | Yes (isotropic) |
| Robustness | High, breakdown $\sim 1/2$ | High, breakdown $\sim 1/2$ |
| Minimax optimality | Yes (Huber contamination) | Yes (by construction) |
| Region convexity | Frobenius and geodesic | Frobenius |
| Efficient computation | Exact enumeration for low $d$; stochastic approximation for large $d$ | 1D root-finding for the isotropic median |

Scatter halfspace depth constitutes a foundational methodology for robust, nonparametric, affine-invariant scatter estimation in multivariate analysis, with rigorous guarantees and practical computational tools (Liu et al., 2022, Paindaveine et al., 2017, Bočinec et al., 8 Dec 2025).
