Median-Of-Means Tournaments

Updated 12 December 2025
  • Median-of-means tournaments are robust statistical procedures that partition data into blocks and use pairwise tournaments to select optimal estimators.
  • They extend classical methods to heavy-tailed, high-dimensional, and non-Euclidean settings by decoupling estimation and concentration.
  • These methods achieve exponential tail bounds and attain minimax rates, offering high-confidence performance even with significant outlier contamination.

Median-of-means tournaments are a class of robust statistical learning procedures that achieve optimal accuracy–confidence tradeoffs under minimal moment assumptions. They operate by partitioning data into blocks, constructing blockwise estimators (means or empirical losses), and resolving pairwise comparisons or "tournaments" via majority or median rules. This framework decouples estimation and concentration, attaining exponential deviation bounds even in heavy-tailed or non-Euclidean settings, and extends naturally to regularized estimation and various loss function structures, including U-statistics and metric spaces beyond Euclidean geometry (Lugosi et al., 2016, Lugosi et al., 2017, Yun et al., 2022, Lugosi et al., 2017, Laforgue et al., 2022).

1. Core Principles and Classical Framework

The classical median-of-means tournament is defined as follows. Given $n$ i.i.d. observations $X_1, \ldots, X_n$ in $\mathbb{R}^D$, the data are split into $k$ disjoint blocks $\mathcal{B}_1, \ldots, \mathcal{B}_k$, each of size $m = \lfloor n/k \rfloor$, and the block means are computed:

$$Z_j = \frac{1}{m} \sum_{X_i \in \mathcal{B}_j} X_i, \quad j = 1, \ldots, k.$$

For candidate estimators $a, b \in \mathbb{R}^D$, $a$ is said to defeat $b$ if it is closer to $Z_j$ than $b$ in a majority of blocks; i.e., $\#\{j : \|a - Z_j\| \leq \|b - Z_j\|\} > k/2$. The "defeating region" $S_x$ of $x$ consists of those $a$ that defeat $x$; the radius $r_x$ of the smallest ball centered at $x$ containing $S_x$ is the "defeating radius." The tournament estimator is

$$\hat{x}_{MM} \in \arg\min_{x \in \mathbb{R}^D} r_x,$$

which, in effect, implements a geometric median of the block means via pairwise tournaments (Yun et al., 2022).
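Below is a minimal Python sketch of this construction, assuming the conventions above. The helper names (`block_means`, `defeats`, `mom_tournament`) are hypothetical, and restricting the search to a finite candidate set (here, the block means themselves) is an illustrative simplification: the exact estimator minimizes $r_x$ over all of $\mathbb{R}^D$.

```python
import numpy as np

def block_means(X, k):
    """Split n samples into k disjoint blocks of size m = n // k
    and return the block means Z_1, ..., Z_k."""
    n = X.shape[0]
    m = n // k
    return X[:m * k].reshape(k, m, -1).mean(axis=1)   # shape (k, D)

def defeats(a, b, Z):
    """a defeats b if a is at least as close to Z_j as b in a majority of blocks."""
    closer = np.linalg.norm(Z - a, axis=1) <= np.linalg.norm(Z - b, axis=1)
    return closer.sum() > len(Z) / 2

def mom_tournament(X, k, candidates=None):
    """Pick the candidate with the smallest defeating radius, searching
    (for illustration) over a finite candidate set only."""
    Z = block_means(X, k)
    if candidates is None:
        candidates = Z            # illustrative choice: the block means themselves
    best, best_r = None, np.inf
    for x in candidates:
        # radius of the smallest ball centered at x that contains the
        # candidates defeating x (a finite proxy for S_x)
        r = max((np.linalg.norm(a - x) for a in candidates
                 if not np.allclose(a, x) and defeats(a, x, Z)), default=0.0)
        if r < best_r:
            best, best_r = x, r
    return best

# Usage: a heavy-tailed two-dimensional sample
rng = np.random.default_rng(0)
X = rng.standard_t(df=2.5, size=(1000, 2))
estimate = mom_tournament(X, k=31)
```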

This approach generalizes to function estimation and risk minimization in $L_2$ spaces and convex or hierarchical function classes, underpinning a suite of procedures for statistical learning with only second-moment assumptions (Lugosi et al., 2016, Lugosi et al., 2017).

2. Median-of-Means Tournaments in Risk Minimization

In regression and machine learning, median-of-means tournaments provide a mechanism for selecting a predictor $\hat{f}$ from a class $\mathcal{F} \subset L_2(\mu)$ so that the excess risk $R(\hat{f}) - R(f^*)$ is minimized with high confidence. Here, $R(f) = \mathbb{E}[(f(X) - Y)^2]$ and $f^*$ denotes the risk minimizer over $\mathcal{F}$.

The procedure typically involves three phases:

  1. Distance Oracle: On a subsample, estimate the $L_1$ or $L_2$ distance between candidate functions robustly via median-of-means, ensuring only sufficiently well-separated pairs are compared.
  2. Preliminary Round (Elimination): On disjoint data, conduct blockwise tournaments comparing empirical risks. "Matches" are decided by majority, and only predictors unbeaten in all allowed duels advance.
  3. Champions League (Final Selection): Another independent split is used; blockwise comparisons further restrict admissible predictors, producing the final estimator (Lugosi et al., 2016, Laforgue et al., 2022).

This design ensures that, with exponentially small failure probability, the returned $\hat{f}$ is nearly optimal. The approach is robust to heavy tails and outliers: median blockwise aggregation ensures that a constant fraction of contaminated blocks does not affect the final outcome (Lugosi et al., 2016, Lugosi et al., 2017, Lugosi et al., 2017).
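As a hedged illustration of a single "match" in the elimination phase, the sketch below decides between two predictors by majority vote over blockwise empirical quadratic risks. The function name `match_winner` and the plain contiguous block partition are assumptions of this sketch, not the papers' exact calibrated procedure.

```python
import numpy as np

def match_winner(f, g, X, Y, k):
    """Decide one tournament match: compare the empirical quadratic risks of
    predictors f and g on each of k disjoint blocks; the match goes to the
    predictor that wins a majority of blocks."""
    n = len(Y)
    m = n // k
    wins_f = 0
    for j in range(k):
        idx = slice(j * m, (j + 1) * m)
        risk_f = np.mean((f(X[idx]) - Y[idx]) ** 2)
        risk_g = np.mean((g(X[idx]) - Y[idx]) ** 2)
        wins_f += risk_f <= risk_g
    return f if wins_f > k / 2 else g
```

Since the winner is determined by a majority of blocks, fewer than $k/2$ corrupted blocks cannot flip a match, which is the mechanism behind the robustness described above.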

3. Extensions: Regularization and High-Dimensional Problems

Median-of-means tournaments extend naturally to incorporate regularization and structural penalties:

  • Tournament LASSO and SLOPE: For high-dimensional settings, the procedure is applied hierarchically over classes defined by $\ell_1$- or sorted-$\ell_1$-balls (SLOPE). At each level, a regularization term is built into the blockwise match comparisons, with the penalty parameter chosen to balance blockwise variance and regularization bias. The procedure selects the most complex class in which the target function survives all rounds.
  • Guarantees: Under only fourth-moment (or sometimes only second-moment) assumptions, the regularized tournament estimators match the minimax rates known from sub-Gaussian theory but with high-confidence exponential deviation:

P(t^t02>Cσ(slog(ed/s))/N)2exp(cNmin{1,(slog(ed/s))/N})P\left( \|\hat{t} - t_0\|_2 > C \sigma \sqrt{(s \log (ed/s))/N} \right) \leq 2 \exp( -c N \min\{1, (s \log (ed/s))/N\} )

where $t_0$ is the true parameter and $s$ its (approximate) sparsity (Lugosi et al., 2017).

A four-phase adaptation for regularization incorporates a "distance oracle," "elimination," "champions league," and a final selection step across subset hierarchies (Lugosi et al., 2017).
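The sketch below shows one way a penalty can enter a blockwise match, in the spirit of the tournament LASSO. The linear model, the $\ell_1$ penalty, and the fixed weight `lam` are simplifying assumptions of this illustration; in the cited papers the penalty level is calibrated level by level across the hierarchy rather than fixed in advance.

```python
import numpy as np

def regularized_match(t_a, t_b, X, Y, k, lam):
    """Blockwise match between parameter vectors of a linear model, with the
    difference of ell_1 penalties added to each blockwise loss difference."""
    n = len(Y)
    m = n // k
    penalty_gap = lam * (np.linalg.norm(t_a, 1) - np.linalg.norm(t_b, 1))
    wins_a = 0
    for j in range(k):
        idx = slice(j * m, (j + 1) * m)
        loss_a = np.mean((X[idx] @ t_a - Y[idx]) ** 2)
        loss_b = np.mean((X[idx] @ t_b - Y[idx]) ** 2)
        wins_a += (loss_a - loss_b + penalty_gap) <= 0
    return t_a if wins_a > k / 2 else t_b
```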

4. Generalizations: Metrics, U-Statistics, and Randomized Blocks

The median-of-means tournament framework generalizes beyond Euclidean spaces and single-sample losses:

  • General Metric Spaces and Non-Euclidean Geometry: The tournament estimator is defined on general Polish metric spaces $(\mathcal{M}, d)$. Instead of means, empirical Fréchet means and metric-based losses $\eta(x, y) = d(x, y)^2$ are used. Exponential deviation inequalities are established under mild "quadruple" and "variance" inequalities linked to the space's curvature. The framework applies notably to non-positive curvature (NPC) spaces (Yun et al., 2022).
  • Pairwise and U-Statistic Losses: For ranking, metric learning, or clustering, the risk is $R(f) = \mathbb{E}[\ell(f; X_i, X_j)]$. The empirical estimate is a $U$-statistic; tournaments and blockwise medians of $U$-statistics, including over randomized blocks (sampling without replacement), retain the concentration and robustness properties. Key deviation bounds are provided for both median-of-means variants and their extensions to $U$-statistics (Laforgue et al., 2022).
  • Randomization in Block Formation: Classical MoM requires fixed-size, partitioned blocks. Randomized blocks formed via SRSWoR (simple random sampling without replacement) decouple the block count from the block size, preserving concentration inequalities even when blocks are created by reshuffling, enabling practical parallelization and stochastic optimization (Laforgue et al., 2022); a minimal sketch follows this list.
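The following is a minimal sketch of a median of randomized-block $U$-statistics under the SRSWoR construction, assuming a symmetric pairwise kernel `h`; the function name and the free choice of `n_blocks` and `block_size` are assumptions of this sketch.

```python
import numpy as np

def mom_u_statistic(h, X, n_blocks, block_size, rng=None):
    """Median of randomized-block U-statistics: each block is drawn by simple
    random sampling without replacement (SRSWoR), and the pairwise kernel h
    is averaged over all pairs inside the block."""
    rng = rng or np.random.default_rng()
    n = len(X)
    estimates = []
    for _ in range(n_blocks):
        idx = rng.choice(n, size=block_size, replace=False)   # SRSWoR block
        vals = [h(X[i], X[j]) for a, i in enumerate(idx) for j in idx[a + 1:]]
        estimates.append(np.mean(vals))
    return np.median(estimates)

# Example: the kernel h(x, y) = (x - y)**2 / 2 yields a robust variance estimate.
rng = np.random.default_rng(0)
X = rng.standard_t(df=4.0, size=500)
var_hat = mom_u_statistic(lambda x, y: (x - y) ** 2 / 2, X,
                          n_blocks=25, block_size=20, rng=rng)
```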

5. Statistical Risk, Robustness, and Accuracy–Confidence Tradeoffs

A definitive property of median-of-means tournaments is their attainment of optimal accuracy–confidence tradeoffs under minimal tail assumptions:

  • Exponential Tail Bounds: For heavy-tailed data, MoM tournament estimators satisfy

P(Estimation error>r)exp(cNmin{1,(r/σ)2})P \left( \text{Estimation error} > r \right ) \leq \exp\left( -c N \min\{1, (r/\sigma)^2 \} \right)

significantly outperforming the Chebyshev-type polynomial bounds available for the empirical mean or empirical risk minimization (ERM) (Lugosi et al., 2016, Yun et al., 2022).

  • Sub-Gaussian Concentration: Even under only second-moment (finite variance) conditions, the deviation rates can scale as $O(\sqrt{\log(1/\delta)/n})$ with probability $1 - \delta$, matching sub-Gaussian estimators at high confidence levels (Laforgue et al., 2022, Yun et al., 2022).
  • Robustness to Outliers: As the decision rules are based on medians over blocks, a constant fraction of corrupted or arbitrarily contaminated blocks (e.g., up to 25%) does not affect the selection, guaranteeing tolerance to adversarial contamination (Lugosi et al., 2017, Lugosi et al., 2016); a small simulation after this list illustrates the effect.
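The snippet below is an illustrative simulation of the robustness claim, not an experiment from the cited papers: the sample size, block count, contamination level, and outlier magnitude are arbitrary choices. At most 10 of the 40 blocks can be contaminated, so the median over blocks is unaffected, while the empirical mean is shifted in every trial.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, trials = 2000, 40, 200
n_bad = 10                      # gross outliers: at most 10 of 40 blocks corrupted
m = n // k
errs_mean, errs_mom = [], []
for _ in range(trials):
    x = rng.standard_t(df=3.0, size=n)   # heavy-tailed inliers, true mean 0
    x[:n_bad] = 1e4                      # adversarial contamination
    rng.shuffle(x)
    errs_mean.append(abs(x.mean()))
    blocks = x[:m * k].reshape(k, m).mean(axis=1)
    errs_mom.append(abs(np.median(blocks)))

# The empirical mean is shifted by n_bad * 1e4 / n = 50 in every trial, while
# the median of block means ignores the minority of contaminated blocks.
print(np.median(errs_mean), np.median(errs_mom))
```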

6. Computational and Practical Considerations

The main practical limitation is algorithmic: exact median-of-means tournaments involve $|\mathcal{F}|^2$ pairwise matches per block, which is infeasible for large or infinite function classes. Although the "max–median" or "minimax" reduction yields a convex–concave saddlepoint problem for convex $\mathcal{F}$, the existence of efficient algorithms guaranteeing the same statistical risk–confidence optimality remains an open problem (Lugosi et al., 2017).

For large-scale applications, randomized or stochastic block selection and use of incomplete or batchwise U-statistics can ameliorate computational cost at the expense of minimal increases in variance, while the theoretical guarantees for robustness and confidence remain largely intact (Laforgue et al., 2022).
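A hedged sketch of an incomplete $U$-statistic follows: pairs are subsampled instead of enumerating all $O(n^2)$ pairs, trading a small variance increase for a large computational saving. The pair-sampling scheme here (independent index draws with self-pairs discarded) is one simple variant among several, and the function name is hypothetical.

```python
import numpy as np

def incomplete_u_statistic(h, X, n_pairs, rng=None):
    """Incomplete U-statistic: average the pairwise kernel h over a random
    subsample of index pairs rather than all O(n^2) pairs."""
    rng = rng or np.random.default_rng()
    n = len(X)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j                          # drop self-pairs
    return np.mean([h(X[a], X[b]) for a, b in zip(i[keep], j[keep])])
```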

7. Connections, Limitations, and Summary Table

Median-of-means tournaments extend classical robust estimation (median-of-means, geometric median) to high-dimensional and structured learning, outperforming standard ERM under heavy tails and delivering minimax rates with high confidence. Their universality across metric geometries and pairwise/$U$-statistic losses broadens their applicability in modern statistical settings.

Key properties by method class:

| Method Class | Moment Assumptions | Confidence/Tail | Outlier Robustness |
|---|---|---|---|
| ERM (mean, least-squares) | Sub-Gaussian required | Polynomial (weak) | None |
| MoM mean/tournament | 2nd moment (finite) | Exponential (sharp) | Up to 25% contaminated blocks |
| MoM + regularization | 2nd/4th moment | Exponential | Similar |
| U-statistics / pairwise | $\mathbb{E}[h^2] < \infty$ | Same (slightly larger) | Same |
| Metric/geometric median | 2nd moment (metric) | Exponential | Tolerates heavy tails |

All rates and claims are a direct synthesis of the cited arXiv papers (Lugosi et al., 2016, Lugosi et al., 2017, Lugosi et al., 2017, Laforgue et al., 2022, Yun et al., 2022).
