Geometric Median-of-Means Estimation

Updated 3 February 2026
  • Geometric Median-of-Means is a robust multivariate location estimator that combines the geometric median's resilience with the bias-variance balance of the median-of-means method.
  • It computes block means and uses their geometric median to achieve exponential concentration and sub-Gaussian error rates under minimal moment or tail assumptions.
  • The estimator naturally extends to high-dimensional, Banach, and metric spaces, ensuring robust performance even in heavy-tailed and adversarial noise environments.

The geometric median-of-means (GMOM) estimator is a robust multivariate location estimator that generalizes the classical univariate median-of-means principle to metric and Banach space valued data. It combines the resilience of the geometric median with the bias-variance trade-offs of the median-of-means, achieving exponentially concentrated error bounds under minimal moment or tail assumptions. The GMOM framework extends naturally to high-dimensional, Banach, and metric spaces, including settings with heavy-tailed or adversarial noise, and is accompanied by non-asymptotic statistical guarantees, efficient computation algorithms, and inference tools.

1. Construction of the Geometric Median-of-Means

Given independent observations $Y_1, \ldots, Y_N$ in $\mathbb{R}^d$ (or in a general Banach or metric space), the GMOM estimator is formed by partitioning the sample indices into $k \leq N/2$ disjoint blocks $G_1, \ldots, G_k$ of size $n = \lfloor N/k \rfloor$ and computing the block means:

$$\bar{Y}_j = \frac{1}{|G_j|} \sum_{i\in G_j} Y_i, \quad j=1,\ldots,k.$$

The GMOM estimator of the mean $\mu = \mathbb{E}Y$ is the geometric median of the block means:

$$\mu_N := \operatorname{med}( \bar Y_1, \ldots, \bar Y_k ) = \arg\min_{z\in\mathbb{R}^d} \frac{1}{k} \sum_{j=1}^k \|z - \bar{Y}_j\|.$$

This extends to general metric spaces $(\mathcal{M}, d)$, where the estimator minimizes the maximum distance required to "defeat" more than half the block means, formalized through a tournament characterization (Minsker et al., 2023, Yun et al., 2022, Minsker, 2013).
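
The construction above can be sketched in a few lines of Python. This is a minimal illustration for the Euclidean case, not a reference implementation: the function names `block_means`, `geometric_median`, and `gmom` are hypothetical, blocks are taken as consecutive slices (which assumes the observations arrive in exchangeable order), leftover observations beyond $k \lfloor N/k \rfloor$ are dropped, and the geometric median is approximated with a plain Weiszfeld iteration guarded by a small `eps`.

```python
import math

def block_means(samples, k):
    """Average k disjoint blocks of size n = floor(N / k); leftovers are dropped."""
    if k < 1 or len(samples) < k:
        raise ValueError("need 1 <= k <= N")
    n = len(samples) // k
    dim = len(samples[0])
    return [
        [sum(y[c] for y in samples[j * n:(j + 1) * n]) / n for c in range(dim)]
        for j in range(k)
    ]

def geometric_median(points, iters=500, tol=1e-10, eps=1e-12):
    """Weiszfeld iteration started from the coordinatewise average of the points."""
    dim = len(points[0])
    z = [sum(p[c] for p in points) / len(points) for c in range(dim)]
    for _ in range(iters):
        # eps is a crude guard against division by zero when z hits a data point
        weights = [1.0 / max(math.dist(p, z), eps) for p in points]
        total = sum(weights)
        z_new = [sum(w * p[c] for w, p in zip(weights, points)) / total
                 for c in range(dim)]
        if math.dist(z_new, z) < tol:
            return z_new
        z = z_new
    return z

def gmom(samples, k):
    """Geometric median of the k block means."""
    return geometric_median(block_means(samples, k))

# Toy data: five identical clean blocks, then one grossly corrupted observation.
clean = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
data = clean * 5
data[-1] = [1000.0, 1000.0]
sample_mean = [sum(y[c] for y in data) / len(data) for c in range(2)]
print("sample mean:", sample_mean)    # dragged toward the outlier
print("GMOM (k=5): ", gmom(data, 5))  # stays near the origin
```

In this toy data set the single corrupted observation moves the sample mean to roughly $(50, 50.05)$, while four of the five block means remain at the origin, so the GMOM estimate stays essentially at the origin.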

2. Statistical Guarantees and Deviation Inequalities

GMOM achieves sub-Gaussian-type error rates and exponential tail bounds under weak moment or curvature assumptions. In Euclidean settings, with covariance $\Sigma_Y$, the following deterministic and probabilistic guarantees hold:

  • Bias Bound: For absolutely continuous $Y$ with mean $\mathbb{E}Y$ and geometric median $m(P_Y)$,

$$\|m(P_Y) - \mathbb{E}Y\| \leq \min\left\{ \sqrt{\operatorname{tr}(\Sigma_Y)}, \; O\left(\sqrt{\|\Sigma_Y\|}\right) \right\}$$

Under log-concave or Gaussian distributions, the bias is $O(\sqrt{\|\Sigma_Y\|})$, independent of $d$ (Minsker et al., 2023).

  • Finite-Sample Error Rate: For suitable $k$ and $n$, with probability at least $1 - 4e^{-\sqrt{k}}$,

$$\|\mu_N - \mu\| \leq C \left( \sqrt{\frac{\operatorname{tr}(\Sigma_Y)}{N}} + \sqrt{\|\Sigma_Y\|} \sqrt{\frac{k}{N}} \right)$$

Choosing $k \asymp \operatorname{tr}(\Sigma_Y)/\|\Sigma_Y\|$ yields nearly sub-Gaussian $O(\sqrt{\operatorname{tr}(\Sigma_Y)/N})$ rates even under heavy tails (Minsker et al., 2023).

  • Banach Space Deviation: If $k$ weakly concentrated estimators $\hat \mu_j$ exist with $\Pr\{\|\hat \mu_j-\mu\| > \varepsilon\} \leq p < 1/2$, the GMOM satisfies

$$\Pr\{\|\hat \mu_{\mathrm{GMOM}} - \mu\| > C_\alpha \varepsilon\} \leq \exp(-k \psi(\alpha; p))$$

for any $\alpha \in (p, 1/2)$, with explicit constants $C_\alpha$ (Minsker, 2013).

  • Metric Space Extension: In non-positively curved (NPC/CAT(0)) spaces and under entropy conditions, the GMOM achieves

$$d(x_{MM}, x^*) \leq C_q \sigma_X \sqrt{\frac{\log(1/\delta)}{n}}$$

with explicit $C_q$, depending only on the geometry and VC entropy of the metric space (Yun et al., 2022).
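
To see why the choice $k \asymp \operatorname{tr}(\Sigma_Y)/\|\Sigma_Y\|$ in the finite-sample error rate balances the two terms, one can substitute it directly. The short derivation below assumes that $k$ equals the ratio exactly (ignoring rounding to an integer and the constraint $k \leq N/2$) and writes $r(\Sigma_Y)$ for that ratio; this shorthand, often called the effective rank, is introduced here only for illustration.

```latex
\[
r(\Sigma_Y) := \frac{\operatorname{tr}(\Sigma_Y)}{\|\Sigma_Y\|}, \qquad
k = r(\Sigma_Y)
\;\Longrightarrow\;
\sqrt{\|\Sigma_Y\|}\,\sqrt{\frac{k}{N}}
= \sqrt{\frac{\|\Sigma_Y\|\,\operatorname{tr}(\Sigma_Y)}{\|\Sigma_Y\|\,N}}
= \sqrt{\frac{\operatorname{tr}(\Sigma_Y)}{N}},
\]
\[
\|\mu_N - \mu\| \;\leq\; 2C\,\sqrt{\frac{\operatorname{tr}(\Sigma_Y)}{N}}
\quad \text{with probability at least } 1 - 4e^{-\sqrt{r(\Sigma_Y)}} .
\]
```

For instance, with $\Sigma_Y = I_d$ the ratio is $r(\Sigma_Y) = d$, so taking about $d$ blocks gives a bound of order $\sqrt{d/N}$.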

3. Algorithmic Considerations

  • Euclidean/Hilbert Space: The geometric median is computed via Weiszfeld’s algorithm, defined iteratively by

$$z^{(t+1)} = \left( \sum_{j=1}^k \frac{x_j}{\|x_j - z^{(t)}\|} \right) \Big/ \left( \sum_{j=1}^k \frac{1}{\|x_j - z^{(t)}\|} \right)$$

with linear convergence under non-degeneracy (Minsker, 2013).

  • Quadratic Growth: The median objective

$$F(z) = \frac{1}{k} \sum_{i=1}^k \|z - y_i\|$$

satisfies a local quadratic growth condition. Specifically, for $z$ near the median $m$,

$$F(z) - F(m) \gtrsim a \, \frac{r^2}{b^3}$$

for explicit constants $a$ and $b$, inducing sharp convergence criteria for first-order optimization methods (Minsker et al., 2023).

  • Stopping Rule: A practical criterion for halting gradient-based algorithms is

$$\|\nabla F(z)\| < \frac{a\varepsilon}{2 b^2(\varepsilon + b)} \implies \|z-m\| < \varepsilon$$

guaranteeing an $\varepsilon$-approximation of the geometric median (Minsker et al., 2023).

  • Banach/Metric Space: Subgradient or smoothing algorithms are applicable, and in general metric spaces, one may use minimum enclosing ball (centerpoint) and “median tournament” algorithms, but generic polynomial-time solvers are only available in Euclidean settings (Yun et al., 2022, Minsker, 2013).
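
The Weiszfeld update and the gradient-based stopping rule in this section are two views of the same first-order condition. As a brief worked derivation, assuming the Euclidean norm and a point $z$ that does not coincide with any of the points $x_j$, setting the gradient of the median objective to zero and solving for $z$ gives the fixed-point equation that Weiszfeld's algorithm iterates:

```latex
\[
\nabla F(z) = \frac{1}{k}\sum_{j=1}^{k} \frac{z - x_j}{\|z - x_j\|} = 0
\;\iff\;
z \sum_{j=1}^{k} \frac{1}{\|x_j - z\|} = \sum_{j=1}^{k} \frac{x_j}{\|x_j - z\|}
\;\iff\;
z = \frac{\sum_{j=1}^{k} x_j / \|x_j - z\|}{\sum_{j=1}^{k} 1 / \|x_j - z\|} .
\]
```

Replacing $z$ by $z^{(t)}$ on the right-hand side and by $z^{(t+1)}$ on the left recovers the iteration, and the norm $\|\nabla F(z^{(t)})\|$ is the quantity that the stopping rule monitors.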

4. High-Dimensional and Inference Framework

  • Bahadur Representation: For ultrahigh dimension $p$ with $p \leq \exp(c n)$, under sub-exponential tails and moment regularity, the spatial median (and thus GMOM of block means) admits an expansion

$$\sqrt{n} (\hat m_n - \theta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi(X_i; \theta_0) + R_n$$

with $\sup_{A\in\mathcal{A}^{\mathrm{re}}} | \Pr\{ \sqrt{n} (\hat m_n - \theta_0) \in A \} - \Pr\{Z\in A\}| \to 0$ for rectangles $A$, facilitating simultaneous confidence intervals and global testing (Cheng et al., 2023).

  • Multiplier Bootstrap: Introduces random sign multipliers $Z_i$ for empirical spatial medians, enabling law approximation and valid inference in high dimensions with no explicit estimation of variance structures (Cheng et al., 2023).
  • Multiple Testing: FDR-controlling procedures and global tests using coordinatewise statistics derived from the spatial or GMOM estimators are established under weak dependence and high-dimensional scaling (Cheng et al., 2023).

5. Extensions, Robustness, and Non-Euclidean Settings

GMOM applies to problems in Banach spaces and infinite-dimensional settings with only the requirement of weak block-level concentration. The construction admits finite adversarial contamination, maintaining exponential concentration by adjusting the effective $k$ (Minsker, 2013). In NPC metric spaces (e.g., manifolds, tree spaces), GMOM is defined as the center of the smallest ball defeating more than half the blocks, and achieves exponential deviation bounds and $O(\sqrt{\log(1/\delta)/n})$ error rates, in contrast to polynomial rates for empirical means (Yun et al., 2022).

The table below summarizes the generalization scope:

| Space | GMOM Definition | Statistical Guarantee |
|---|---|---|
| $\mathbb{R}^d$ | Geometric median of means | Dimension-free, sub-Gaussian tails |
| Banach/Hilbert space | Geometric median in $X$ | Exponential deviation, $O(1/\sqrt{n})$ |
| NPC/CAT(0) metric space | Median-of-means tournament | Exponential tail, explicit constants |

Outlier robustness follows from the geometric median’s resistance properties; adversarial contamination in up to a fraction $\tau < (\alpha - p)/(1 - p)$ of the blocks can be tolerated at bounded cost (Minsker, 2013). Parallelizability arises naturally from block autonomy.
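
As a small worked example of the contamination threshold, the arithmetic below plugs in hypothetical values $p = 0.1$, $\alpha = 0.4$, and $k = 30$; these numbers are chosen only for illustration and do not come from the cited papers.

```latex
\[
p = 0.1,\quad \alpha = 0.4
\;\Longrightarrow\;
\tau < \frac{\alpha - p}{1 - p} = \frac{0.3}{0.9} = \frac{1}{3},
\qquad
k = 30 \;\Longrightarrow\; \tau k < 10 .
\]
```

Under these assumed values, fewer than 10 of the 30 blocks may be corrupted. Because the blocks are disjoint, each corrupted observation affects exactly one block, so any 9 corrupted observations spoil at most 9 blocks, whatever their magnitude.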

6. Applications and Empirical Behavior

Practical applications include mean and location estimation under heavy-tailed distributions, sparse linear regression, low-rank matrix recovery, and empirical studies with real datasets such as log-returns of financial assets. GMOM achieves tighter risk bounds than the coordinatewise median or empirical mean in these regimes:

  • Empirical Performance: Synthetic high-dimensional data confirms the predicted dimension dependence of local median curvature ($O(d^{-1/2})$) (Minsker et al., 2023).
  • Financial Data: On S&P 500 daily log-returns, GMOM outperforms coordinatewise median and sample mean in both small-sample and heavy-tailed scenarios, with error bounds validated empirically (Minsker et al., 2023).
  • Statistical Inference: Large-scale genomic studies employ GMOM with multiplier bootstrap for simultaneous confidence interval construction and FDR-controlled multiple testing, demonstrating validity even as $p$ grows exponentially with $n$ (Cheng et al., 2023).

7. Limitations and Open Problems

Known computational limitations arise primarily in general metric spaces, where no fully polynomial-time implementation exists outside finite-dimensional Euclidean settings (Yun et al., 2022). Fast algorithms for the “median tournament” step in arbitrary NPC spaces remain an open problem. In Banach space extensions, controlling the constants in deviation inequalities and addressing non-unique medians for non-strictly convex norms are ongoing challenges (Minsker, 2013).

GMOM requires solving convex but non-smooth optimization problems; smoothed relaxations (e.g., Charbonnier loss) and tailored stopping rules mitigate the computational difficulties (Minsker et al., 2023). In practice, the number and size of the blocks must be chosen, taking the variance of the weakest block into account, so that the moment or concentration properties required by the guarantees actually hold.
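
To illustrate what a smoothed relaxation can look like, the sketch below runs gradient descent on a Charbonnier-type surrogate $F_\delta(z) = \frac{1}{k}\sum_i \sqrt{\|z - y_i\|^2 + \delta^2}$. This is one common form of the Charbonnier loss and is an assumption here; the exact relaxation, step size, and stopping threshold used in the cited work may differ. The function name `smoothed_median` and the parameters `delta`, `z0`, and `grad_tol` are hypothetical.

```python
import math

def smoothed_median(points, delta=0.05, z0=None, iters=2000, grad_tol=1e-8):
    """Gradient descent on F_delta(z) = (1/k) * sum_i sqrt(||z - y_i||^2 + delta^2)."""
    k, dim = len(points), len(points[0])
    # Default start: coordinatewise average of the points.
    z = list(z0) if z0 is not None else [
        sum(p[c] for p in points) / k for c in range(dim)
    ]
    for _ in range(iters):
        grad = [0.0] * dim
        for p in points:
            s = math.sqrt(sum((z[c] - p[c]) ** 2 for c in range(dim)) + delta ** 2)
            for c in range(dim):
                grad[c] += (z[c] - p[c]) / (s * k)
        # Stop once the gradient norm of the smoothed objective is small.
        if math.sqrt(sum(g * g for g in grad)) < grad_tol:
            break
        # The gradient of F_delta is (1/delta)-Lipschitz, so a step of size delta is safe.
        z = [z[c] - delta * grad[c] for c in range(dim)]
    return z

# Four symmetric points: by symmetry the smoothed minimizer is the origin.
pts = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(smoothed_median(pts, z0=[0.5, 0.3]))
```

The smoothing parameter trades accuracy for conditioning: a smaller `delta` brings the surrogate closer to the true median objective but forces smaller steps, since the step size used here equals `delta`.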


Geometric median-of-means estimation offers a unifying framework for robust, high-dimensional, and distribution-free mean estimation, with broad applicability and theoretically grounded performance guarantees in both Euclidean and non-Euclidean domains (Minsker et al., 2023, Cheng et al., 2023, Minsker, 2013, Yun et al., 2022).
