Tukey Median in Multivariate Analysis
- The location halfspace median is a robust, affine-equivariant estimator defined as the maximizer of halfspace depth, optimizing the minimal probability mass over all closed halfspaces.
- It leverages convex geometry and statistical depth to achieve an asymptotic breakdown point up to 1/3 and uniform convergence rates under mild conditions.
- Practical insights include exact bivariate algorithms and scalable approximate methods for high-dimensional data, ensuring applicability in robust multivariate inference.
The location halfspace median, commonly referred to as the Tukey median, is a canonical multivariate center estimator grounded in the concept of statistical depth. The depth of a point is the minimum probability mass over all closed halfspaces containing that point, which generalizes the univariate median to arbitrary dimension. The Tukey median is the set of maximizers of halfspace depth for a given measure or data set, yielding a robust, affine-equivariant center with rich theoretical and computational properties. This article develops the definition, key structural results, robustness metrics, algorithmic techniques, and statistical guarantees associated with the location halfspace median, integrating contemporary advances for high-dimensional inference.
1. Formal Definition and Basic Structural Properties
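Let $X_1, \dots, X_n \in \mathbb{R}^d$ be a data sample, with empirical measure $P_n$. The halfspace depth of a point $x \in \mathbb{R}^d$ with respect to a probability measure $P$ is defined as
$$D(x; P) = \inf_{u \in \mathbb{R}^d,\ \|u\| = 1} P\bigl(\{y \in \mathbb{R}^d : u^\top y \ge u^\top x\}\bigr).$$
Equivalently, in the empirical case,
$$D(x; P_n) = \min_{\|u\| = 1} \frac{1}{n}\,\#\{i : u^\top X_i \ge u^\top x\}.$$
The location halfspace median set is the set of maximizers
$$M(P) = \arg\max_{x \in \mathbb{R}^d} D(x; P),$$
whose elements are referred to as Tukey medians or halfspace medians. In general, $M(P)$ may not be a singleton; it is always a closed convex set, and for an empirical measure it is a (possibly lower-dimensional) convex polytope (Dutta et al., 2012, Mosler, 2012).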
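Central (level) regions $D_\alpha(P) = \{x \in \mathbb{R}^d : D(x; P) \ge \alpha\}$ are convex and nested in $\alpha$ (they shrink as $\alpha$ increases), with the deepest non-empty region serving as the Tukey median set.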
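For intuition, the empirical depth can be evaluated exactly in the plane. The following is a minimal sketch under stated assumptions, not the method of any cited work: the function name `halfspace_depth_2d`, the rounding and tolerance constants, and the simple quadratic-time scan are all illustrative choices. It relies on the fact that the number of points in a closed halfplane whose boundary passes through $x$ is constant between consecutive critical angles and can only be larger at a critical angle, so it suffices to test one direction inside each angular gap.

```python
import numpy as np

def halfspace_depth_2d(x, data):
    """Empirical halfspace depth of x w.r.t. planar data (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(data, dtype=float) - x
    n = len(v)
    at_x = np.all(v == 0, axis=1)          # points equal to x lie in every closed halfplane
    w = v[~at_x]
    if len(w) == 0:
        return 1.0
    theta = np.arctan2(w[:, 1], w[:, 0])
    # Directions u at which some data point sits exactly on the boundary line.
    crit = np.mod(np.concatenate([theta + np.pi / 2, theta - np.pi / 2]), 2 * np.pi)
    crit = np.unique(np.round(crit, 12))   # merge numerically identical angles
    nxt = np.roll(crit, -1)
    nxt[-1] += 2 * np.pi                   # wrap around the circle
    mid = (crit + nxt) / 2                 # one test direction per angular gap
    u = np.column_stack([np.cos(mid), np.sin(mid)])
    # Small tolerance so that near-boundary points are counted, never dropped.
    counts = (u @ w.T >= -1e-12).sum(axis=1) + at_x.sum()
    return float(counts.min() / n)

triangle = [[0, 0], [1, 0], [0, 1]]
inside = halfspace_depth_2d([0.25, 0.25], triangle)   # interior point of the triangle
outside = halfspace_depth_2d([2.0, 2.0], triangle)    # point outside the convex hull
```

In this toy configuration every closed halfplane through an interior point contains at least one vertex, while a point outside the convex hull can be separated from all data. The tolerance handling is only a floating-point safeguard and would need care for nearly degenerate configurations.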
2. Geometric, Convex Analytic, and Covering Characterizations
Halfspace depth regions admit multiple geometric representations. For the population measure, the level-$\alpha$ region $D_\alpha(P) = \{x : D(x; P) \ge \alpha\}$ coincides with
- Intersection of all closed halfspaces of mass greater than $1 - \alpha$;
- “Floating body” at level $\alpha$, under mild assumptions on the measure: intersection of all halfspaces whose complements carry mass at most $\alpha$ (Laketa et al., 2021);
- For empirical measures (samples not necessarily in general position), depth regions are finite intersections of “irrotatable” halfspaces: each facet lies in a hyperplane spanned by at least $d$ data points, with rotation about any $(d-2)$-face increasing the cut-off count (Liu et al., 2016).
A salient result is the “ray basis theorem,” which characterizes medians by a covering property: a point is a median if and only if the minimal-mass halfspaces at that point together cover the entire space. For smooth measures, this gives a geometric covering property for every median point (Laketa et al., 2021).
Recent advances introduced flag halfspaces (nested, open-in-face sets), showing that, even for atomic or non-smooth measures, the minimal mass in the definition of depth is always attained by a flag halfspace, which unifies the theory for continuous and discrete cases (Pokorný et al., 2022).
3. Robustness: Breakdown Point and Affine Equivariance
The finite-sample breakdown point is the maximal fraction of contamination that an estimator tolerates before it can be made arbitrarily large. For the Tukey median, the breakdown point under the additive contamination (Huber) model is governed by the deepest sample depth, that is, the maximal halfspace depth attained for the sample (Liu et al., 2016). Under halfspace symmetry and weak smoothness, the asymptotic breakdown point approaches $1/3$.
For data in general position, the breakdown point is at least $1/(d+1)$, which is maximal among affine-equivariant location estimators (Mosler, 2012). Under total variation contamination, the breakdown point drops to $1/4$ in high dimensions, compared with $1/3$ under additive contamination (Zhu et al., 2020).
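Affine equivariance is intrinsic: for any nonsingular matrix $A \in \mathbb{R}^{d \times d}$ and any vector $b \in \mathbb{R}^d$, the affine map $x \mapsto Ax + b$ carries medians to medians. Writing $P_X$ for the distribution of $X$ and $D(\cdot\,; P)$ for halfspace depth,
$$D(Ax + b;\, P_{AX + b}) = D(x;\, P_X),$$
and thus the median sets satisfy
$$M(P_{AX + b}) = A\, M(P_X) + b,$$
guaranteeing coordinate-free centrality (Mosler, 2012, Dutta et al., 2012, Dai et al., 2021).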
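The invariance of depth under affine maps can be checked numerically. The sketch below is an illustration under assumptions: the helper `directional_depth`, the matrix `A`, the shift `b`, and the random directions are all made up for the example. It uses the identity $(A^{-\top}u)^\top (A X_i - A x) = u^\top (X_i - x)$, so each halfspace with normal $u$ in the original coordinates corresponds to the halfspace with normal $A^{-\top}u$ in the transformed coordinates and contains the same data points.

```python
import numpy as np

def directional_depth(x, data, directions):
    """Minimum, over the given normals u, of the fraction of data with u.(X_i - x) >= 0."""
    proj = (data - x) @ directions.T            # shape (n, number of directions)
    return float((proj >= 0).mean(axis=0).min())

rng = np.random.default_rng(1)
data = rng.standard_normal((200, 2))
x = np.array([0.3, -0.2])
U = rng.standard_normal((300, 2))               # normals in the original coordinates

A = np.array([[2.0, 1.0], [0.5, 3.0]])          # hypothetical nonsingular matrix
b = np.array([5.0, -1.0])                       # hypothetical shift

# Rows of U @ inv(A) are the transposed vectors A^{-T} u.
U_mapped = U @ np.linalg.inv(A)

depth_original = directional_depth(x, data, U)
depth_mapped = directional_depth(A @ x + b, data @ A.T + b, U_mapped)
```

Up to floating-point effects on points lying numerically on a boundary, the two values agree, because corresponding halfspaces contain exactly the same observations.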
4. Computational Methodologies
Computation of the Tukey median is NP-hard in general when the dimension $d$ is unbounded, but tractable and efficient algorithms exist for small and moderate dimensions:
- Exact bivariate algorithm: via rotating-calipers or “circular sequence” on pairwise lines (Mosler, 2012).
- Exact general-$d$ algorithms: polytope intersection via enumeration of candidate halfspace facets (Mosler, 2012, Liu et al., 2016).
- Approximate algorithms:
  - Random Tukey: sample random directions and estimate the depth as the minimum among the observed halfspace fractions (Mosler, 2012).
  - ABCDepth: ball-intersection representation (Merkle equivalence): approximate level sets as intersections of balls containing at least a prescribed number of data points (depending on the level), yielding median computation complexity linear in $d$ and quadratic in $n$ (Bogićević et al., 2018).
  - Metric halfspace depth: fix a generic anchor set, compute the distances between anchors and data points, and approximate the depth at a candidate point by the minimal empirical mass of an anchor-defined halfspace containing the candidate (Dai et al., 2021).
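The random-direction approach in the list above can be sketched in a few lines. This is a minimal illustration under assumptions, not a reference implementation: the function names `random_tukey_depth` and `approximate_tukey_median`, the number of directions, the seeds, and the synthetic data are all hypothetical. Because the minimum is taken over a finite set of directions, the result is an upper bound on the true sample depth, and the median is searched only among the sample points.

```python
import numpy as np

def random_tukey_depth(points, data, n_directions=500, seed=0):
    """Approximate halfspace depth of each row of `points` using random directions."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    n, d = data.shape
    u = rng.standard_normal((n_directions, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)       # unit normals
    proj_data = np.sort(data @ u.T, axis=0)             # sorted projections per direction
    proj_pts = points @ u.T
    depth = np.ones(len(points))
    for j in range(n_directions):
        col = proj_data[:, j]
        n_le = np.searchsorted(col, proj_pts[:, j], side="right")     # #{i: u.X_i <= u.x}
        n_ge = n - np.searchsorted(col, proj_pts[:, j], side="left")  # #{i: u.X_i >= u.x}
        depth = np.minimum(depth, np.minimum(n_le, n_ge) / n)
    return depth

def approximate_tukey_median(data, **kwargs):
    """Average of the sample points with maximal approximate depth."""
    data = np.asarray(data, dtype=float)
    depth = random_tukey_depth(data, data, **kwargs)
    deepest = depth >= depth.max() - 1e-12
    return data[deepest].mean(axis=0), float(depth.max())

rng = np.random.default_rng(42)
clean = rng.standard_normal((300, 3))
outliers = rng.standard_normal((60, 3)) + 50.0          # additive contamination (1/6 of the sample)
sample = np.vstack([clean, outliers])
median, max_depth = approximate_tukey_median(sample)
mean = sample.mean(axis=0)
```

In this synthetic example the sample mean is dragged far toward the outlying cluster, whereas the approximate median stays within the bulk of the clean data. More directions tighten the approximation at proportional cost.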
Modern improvements leverage sharp upper bounds on the maximum sample depth for data in general position (expressed in terms of the sample size $n$ and the dimension $d$), avoiding redundant depth region searches and reducing the computational burden in large-scale regimes (Liu et al., 2016).
5. Statistical Properties: Consistency, Efficiency, Convergence Rates
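Halfspace median estimators inherit uniform convergence from the finite VC-dimension of the class of halfspaces. With $D(x; \cdot)$ denoting halfspace depth, $P_n$ the empirical measure, and $P$ the population measure,
$$\sup_{x \in \mathbb{R}^d} \bigl| D(x; P_n) - D(x; P) \bigr| \to 0 \quad \text{almost surely as } n \to \infty,$$
yielding convergence of the sample median set to the population median set in the Hausdorff metric and, if the population median is unique, convergence in probability of the sample medians (Mosler, 2012, Dutta et al., 2012).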
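The reduction from depth to halfspace probabilities is a one-line inequality, sketched here for completeness. The notation $\mathcal{H}$ for the class of closed halfspaces in $\mathbb{R}^d$ is introduced for this display only. Since two infima over the same index set differ by at most the supremum of the pointwise differences,

```latex
\begin{aligned}
\sup_{x \in \mathbb{R}^d} \bigl| D(x; P_n) - D(x; P) \bigr|
  &= \sup_{x \in \mathbb{R}^d} \Bigl| \inf_{H \in \mathcal{H},\, H \ni x} P_n(H)
       - \inf_{H \in \mathcal{H},\, H \ni x} P(H) \Bigr| \\
  &\le \sup_{H \in \mathcal{H}} \bigl| P_n(H) - P(H) \bigr| .
\end{aligned}
```

The class $\mathcal{H}$ has VC-dimension $d + 1$, so the right-hand side tends to zero almost surely by the Glivenko-Cantelli theorem for VC classes, with expected size of order $\sqrt{(d+1)/n}$ up to a universal constant.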
In elliptical and $\alpha$-symmetric models (whose characteristic function depends on its argument $t$ only through $\|t\|_\alpha$), the Tukey median achieves minimax-optimal convergence rates under contamination, with the error measured in the corresponding dual norm, for contamination level $\varepsilon$ and large $n$ (Bočinec et al., 8 Dec 2025). No moment or bounded support assumptions are required. For sub-Gaussian or heavy-tailed models, estimation error bounds adapt via the appropriate decay function.
The corresponding breakdown points of the Tukey median are $1/3$ under symmetry for additive contamination and $1/4$ for total variation contamination. The halfspace-metric projection estimator can push the breakdown point to $1/2$ if the depth decay function is known, but forfeits affine invariance (Zhu et al., 2020).
6. Topological, Dimensional, and General Position Results
Sample median regions are closed convex polytopes; under general position they cannot be $(d-1)$-dimensional except in degenerate settings (Pokorný et al., 2022). In $\mathbb{R}^2$, the median set is either a polygon (full-dimensional) or a singleton coinciding with a sample point.
If the data are not in general position, depth regions are still finite intersections of “irrotatable” halfspaces; non-uniqueness of the median is possible, but the average of deepest points provides a canonical center estimate (Liu et al., 2016).
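Averaging the deepest points is straightforward once depth values are available. The sketch below is illustrative only: the helper name `deepest_point_average` and the tolerance are assumptions, and the average is taken over candidate points with maximal depth rather than over the vertices of the exact median polytope. The univariate toy sample shows that the construction reproduces the usual median of an even-sized sample.

```python
import numpy as np

def deepest_point_average(candidates, depths, tol=1e-12):
    """Average of the candidate points whose depth equals the maximum (up to tol)."""
    candidates = np.asarray(candidates, dtype=float)
    depths = np.asarray(depths, dtype=float)
    deepest = depths >= depths.max() - tol
    return candidates[deepest].mean(axis=0)

# Univariate toy sample: depth of x is min(#{X_i <= x}, #{X_i >= x}) / n.
sample = np.array([1.0, 2.0, 3.0, 4.0])
n = len(sample)
depths = np.array([min((sample <= x).sum(), (sample >= x).sum()) / n for x in sample])

center = deepest_point_average(sample.reshape(-1, 1), depths)
```

Here the two middle observations share the maximal depth, and their average coincides with the conventional sample median.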
In infinite-dimensional Banach or Hilbert spaces, halfspace depth degenerates: for most points the depth is zero, while the median at the center remains at depth $1/2$ (Dutta et al., 2012).
7. Practical Recommendations and Applications
- For robust multivariate location, the Tukey median provides a nonparametric, affine-equivariant center with breakdown point optimal among affine-equivariant estimators (Mosler, 2012, Dutta et al., 2012).
- In high dimensions, employ approximate algorithms (e.g., ABCDepth) that scale linearly with the dimension $d$ and quadratically with the sample size $n$ (Bogićević et al., 2018).
- In $\mathbb{R}^2$, exploit the fact that the TukeyRegion package computes exact medians in all empirical cases, owing to the dimensionality results (Pokorný et al., 2022).
- For robust inference under contamination, utilize depth-based central regions; in $\alpha$-symmetric models, finite-sample bounds follow the minimax rate (Bočinec et al., 8 Dec 2025).
- Outlier detection, depth-based classification, symmetry diagnostics, and nonparametric confidence regions are all supported foundationally by the halfspace median and depth central regions (Dutta et al., 2012).
Summary Table: Tukey Median Core Properties
| Property | Mathematical Formulation | Reference |
|---|---|---|
| Definition | $D(x; P) = \inf_{H \ni x} P(H)$ over closed halfspaces $H$ | (Mosler, 2012, Dutta et al., 2012) |
| Median region | $M(P) = \arg\max_{x} D(x; P)$ | (Mosler, 2012) |
| Breakdown point | $1/(d+1)$, up to $1/3$ asymptotically under symmetry | (Liu et al., 2016, Mosler, 2012) |
| Approximate algorithm | Ball intersection (ABCDepth); complexity linear in $d$, quadratic in $n$ | (Bogićević et al., 2018) |
| Statistical consistency | uniform convergence under VC theory | (Mosler, 2012, Bočinec et al., 8 Dec 2025) |
In contemporary multivariate analysis, the location halfspace median stands as a foundational tool for robust, affine-invariant estimation, undergirded by rigorous depth theory, convex geometry, and scalable computation.