Median-Of-Means Tournaments

Updated 12 December 2025
  • Median-of-means tournaments are robust statistical procedures that partition data into blocks and use pairwise tournaments to select optimal estimators.
  • They extend classical methods to heavy-tailed, high-dimensional, and non-Euclidean settings by decoupling estimation and concentration.
  • These methods achieve exponential tail bounds and attain minimax rates, offering high-confidence performance even with significant outlier contamination.

Median-of-means tournaments are a class of robust statistical learning procedures that achieve optimal accuracy–confidence tradeoffs under minimal moment assumptions. They operate by partitioning data into blocks, constructing blockwise estimators (means or empirical losses), and resolving pairwise comparisons or "tournaments" via majority or median rules. This framework decouples estimation and concentration, attaining exponential deviation bounds even in heavy-tailed or non-Euclidean settings, and extends naturally to regularized estimation and various loss function structures, including U-statistics and metric spaces beyond Euclidean geometry (Lugosi et al., 2016, Lugosi et al., 2017, Yun et al., 2022, Lugosi et al., 2017, Laforgue et al., 2022).

1. Core Principles and Classical Framework

The classical median-of-means tournament is defined as follows. Given $n$ i.i.d. observations $X_1, \ldots, X_n$ in $\mathbb{R}^D$, the data are split into $k$ disjoint blocks $\mathcal{B}_1, \ldots, \mathcal{B}_k$, each of size $m = \lfloor n/k \rfloor$, and the block means are computed:

$$Z_j = \frac{1}{m} \sum_{X_i \in \mathcal{B}_j} X_i, \quad j = 1, \ldots, k.$$

For candidate estimators $a, b \in \mathbb{R}^D$, $a$ is said to defeat $b$ if it is closer to $Z_j$ than $b$ in a majority of blocks; i.e., $\#\{j : \|a - Z_j\| \leq \|b - Z_j\|\} > k/2$. The "defeating region" $S_x$ of $x$ consists of those $a$ that defeat $x$; the radius $r_x$ of the smallest ball centered at $x$ containing $S_x$ is the "defeating radius." The tournament estimator is

$$\hat{x}_{MM} \in \arg\min_{x \in \mathbb{R}^D} r_x,$$

which, in effect, implements a geometric median of the block means via pairwise tournaments (Yun et al., 2022).
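Below is a minimal Python sketch of this construction, assuming the conventions above. The helper names (`block_means`, `defeats`, `mom_tournament`) are hypothetical, and restricting the search to a finite candidate set (here, the block means themselves) is an illustrative simplification: the exact estimator minimizes $r_x$ over all of $\mathbb{R}^D$.

```python
import numpy as np

def block_means(X, k):
    """Split n samples into k disjoint blocks of size m = n // k
    and return the block means Z_1, ..., Z_k."""
    n = X.shape[0]
    m = n // k
    return X[:m * k].reshape(k, m, -1).mean(axis=1)   # shape (k, D)

def defeats(a, b, Z):
    """a defeats b if a is at least as close to Z_j as b in a majority of blocks."""
    closer = np.linalg.norm(Z - a, axis=1) <= np.linalg.norm(Z - b, axis=1)
    return closer.sum() > len(Z) / 2

def mom_tournament(X, k, candidates=None):
    """Pick the candidate with the smallest defeating radius, searching
    (for illustration) over a finite candidate set only."""
    Z = block_means(X, k)
    if candidates is None:
        candidates = Z            # illustrative choice: the block means themselves
    best, best_r = None, np.inf
    for x in candidates:
        # radius of the smallest ball centered at x that contains the
        # candidates defeating x (a finite proxy for S_x)
        r = max((np.linalg.norm(a - x) for a in candidates
                 if not np.allclose(a, x) and defeats(a, x, Z)), default=0.0)
        if r < best_r:
            best, best_r = x, r
    return best

# Usage: a heavy-tailed two-dimensional sample
rng = np.random.default_rng(0)
X = rng.standard_t(df=2.5, size=(1000, 2))
estimate = mom_tournament(X, k=31)
```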

This approach generalizes to function estimation and risk minimization in $L_2$ spaces and convex or hierarchical function classes, underpinning a suite of procedures for statistical learning with only second-moment assumptions (Lugosi et al., 2016, Lugosi et al., 2017).

2. Median-of-Means Tournaments in Risk Minimization

In regression and machine learning, median-of-means tournaments provide a mechanism for selecting a predictor $\hat{f}$ from a class $\mathcal{F} \subset L_2(\mu)$ so that the excess risk $R(\hat{f}) - R(f^*)$ is minimized with high confidence. Here, $R(f) = \mathbb{E}[(f(X) - Y)^2]$ and $f^*$ denotes the risk minimizer over $\mathcal{F}$.

The procedure typically involves three phases:

  1. Distance Oracle: On a subsample, estimate the $L_1$ or $L_2$ distance between candidate functions robustly via median-of-means, ensuring only sufficiently well-separated pairs are compared.
  2. Preliminary Round (Elimination): On disjoint data, conduct blockwise tournaments comparing empirical risks. "Matches" are decided by majority, and only predictors unbeaten in all allowed duels advance.
  3. Champions League (Final Selection): Another independent split is used; blockwise comparisons further restrict admissible predictors, producing the final estimator (Lugosi et al., 2016, Laforgue et al., 2022).

This design ensures that, with exponentially small failure probability, the returned $\hat{f}$ is nearly optimal. The approach is robust to heavy tails and outliers: median blockwise aggregation ensures that a constant fraction of contaminated blocks does not affect the final outcome (Lugosi et al., 2016, Lugosi et al., 2017, Lugosi et al., 2017).
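As a hedged illustration of a single "match" in the elimination phase, the sketch below decides between two predictors by majority vote over blockwise empirical quadratic risks. The function name `match_winner` and the plain contiguous block partition are assumptions of this sketch, not the papers' exact calibrated procedure.

```python
import numpy as np

def match_winner(f, g, X, Y, k):
    """Decide one tournament match: compare the empirical quadratic risks of
    predictors f and g on each of k disjoint blocks; the match goes to the
    predictor that wins a majority of blocks."""
    n = len(Y)
    m = n // k
    wins_f = 0
    for j in range(k):
        idx = slice(j * m, (j + 1) * m)
        risk_f = np.mean((f(X[idx]) - Y[idx]) ** 2)
        risk_g = np.mean((g(X[idx]) - Y[idx]) ** 2)
        wins_f += risk_f <= risk_g
    return f if wins_f > k / 2 else g
```

Since the winner is determined by a majority of blocks, fewer than $k/2$ corrupted blocks cannot flip a match, which is the mechanism behind the robustness described above.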

3. Extensions: Regularization and High-Dimensional Problems

Median-of-means tournaments extend naturally to incorporate regularization and structural penalties:

  • Tournament LASSO and SLOPE: For high-dimensional settings, the procedure is applied hierarchically over classes defined by $\ell_1$- or sorted-$\ell_1$-balls (SLOPE). At each level, a regularization term is built into the blockwise match comparisons, with the penalty parameter chosen to balance blockwise variance and regularization bias. The procedure selects the most complex class in which the target function survives all rounds.
  • Guarantees: Under only fourth-moment (or sometimes only second-moment) assumptions, the regularized tournament estimators match the minimax rates known from sub-Gaussian theory but with high-confidence exponential deviation:

P(t^t02>Cσ(slog(ed/s))/N)2exp(cNmin{1,(slog(ed/s))/N})P\left( \|\hat{t} - t_0\|_2 > C \sigma \sqrt{(s \log (ed/s))/N} \right) \leq 2 \exp( -c N \min\{1, (s \log (ed/s))/N\} )

where $t_0$ is the true parameter and $s$ its (approximate) sparsity (Lugosi et al., 2017).

A four-phase adaptation for regularization incorporates a "distance oracle," "elimination," "champions league," and a final selection step across subset hierarchies (Lugosi et al., 2017).
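The sketch below shows one way a penalty can enter a blockwise match, in the spirit of the tournament LASSO. The linear model, the $\ell_1$ penalty, and the fixed weight `lam` are simplifying assumptions of this illustration; in the cited papers the penalty level is calibrated level by level across the hierarchy rather than fixed in advance.

```python
import numpy as np

def regularized_match(t_a, t_b, X, Y, k, lam):
    """Blockwise match between parameter vectors of a linear model, with the
    difference of ell_1 penalties added to each blockwise loss difference."""
    n = len(Y)
    m = n // k
    penalty_gap = lam * (np.linalg.norm(t_a, 1) - np.linalg.norm(t_b, 1))
    wins_a = 0
    for j in range(k):
        idx = slice(j * m, (j + 1) * m)
        loss_a = np.mean((X[idx] @ t_a - Y[idx]) ** 2)
        loss_b = np.mean((X[idx] @ t_b - Y[idx]) ** 2)
        wins_a += (loss_a - loss_b + penalty_gap) <= 0
    return t_a if wins_a > k / 2 else t_b
```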

4. Generalizations: Metrics, U-Statistics, and Randomized Blocks

The median-of-means tournament framework generalizes beyond Euclidean spaces and single-sample losses:

  • General Metric Spaces and Non-Euclidean Geometry: The tournament estimator is defined on general Polish metric spaces $(\mathcal{M}, d)$. Instead of means, empirical Fréchet means and metric-based losses $\eta(x, y) = d(x, y)^2$ are used. Exponential deviation inequalities are established under mild "quadruple" and "variance" inequalities linked to the space's curvature. The framework applies notably to non-positive curvature (NPC) spaces (Yun et al., 2022).
  • Pairwise and U-Statistic Losses: For ranking, metric learning, or clustering, the risk is $R(f) = \mathbb{E}[\ell(f; X_i, X_j)]$. The empirical estimate is a $U$-statistic; tournaments and blockwise medians of $U$-statistics, including over randomized blocks (sampling without replacement), retain the concentration and robustness properties. Key deviation bounds are provided for both median-of-means variants and their extensions to $U$-statistics (Laforgue et al., 2022).
  • Randomization in Block Formation: Classical MoM requires fixed-size, partitioned blocks. Randomized blocks formed via SRSWoR (simple random sampling without replacement) decouple the block count from the block size, preserving concentration inequalities even when blocks are created by reshuffling, enabling practical parallelization and stochastic optimization (Laforgue et al., 2022); a minimal sketch follows this list.
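The following is a minimal sketch of a median of randomized-block $U$-statistics under the SRSWoR construction, assuming a symmetric pairwise kernel `h`; the function name and the free choice of `n_blocks` and `block_size` are assumptions of this sketch.

```python
import numpy as np

def mom_u_statistic(h, X, n_blocks, block_size, rng=None):
    """Median of randomized-block U-statistics: each block is drawn by simple
    random sampling without replacement (SRSWoR), and the pairwise kernel h
    is averaged over all pairs inside the block."""
    rng = rng or np.random.default_rng()
    n = len(X)
    estimates = []
    for _ in range(n_blocks):
        idx = rng.choice(n, size=block_size, replace=False)   # SRSWoR block
        vals = [h(X[i], X[j]) for a, i in enumerate(idx) for j in idx[a + 1:]]
        estimates.append(np.mean(vals))
    return np.median(estimates)

# Example: the kernel h(x, y) = (x - y)**2 / 2 yields a robust variance estimate.
rng = np.random.default_rng(0)
X = rng.standard_t(df=4.0, size=500)
var_hat = mom_u_statistic(lambda x, y: (x - y) ** 2 / 2, X,
                          n_blocks=25, block_size=20, rng=rng)
```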

5. Statistical Risk, Robustness, and Accuracy–Confidence Tradeoffs

A definitive property of median-of-means tournaments is their attainment of optimal accuracy–confidence tradeoffs under minimal tail assumptions:

  • Exponential Tail Bounds: For heavy-tailed data, MoM tournament estimators satisfy

P(Estimation error>r)exp(cNmin{1,(r/σ)2})P \left( \text{Estimation error} > r \right ) \leq \exp\left( -c N \min\{1, (r/\sigma)^2 \} \right)

significantly outperforming the Chebyshev-type polynomial bounds available for the empirical mean or empirical risk minimization (ERM) (Lugosi et al., 2016, Yun et al., 2022).

  • Sub-Gaussian Concentration: Even under only second-moment (finite variance) conditions, the deviation rates can scale as $O(\sqrt{\log(1/\delta)/n})$ with probability $1 - \delta$, matching sub-Gaussian estimators at high confidence levels (Laforgue et al., 2022, Yun et al., 2022).
  • Robustness to Outliers: As the decision rules are based on medians over blocks, a constant fraction of corrupted or arbitrarily contaminated blocks (e.g., up to 25%) does not affect the selection, guaranteeing tolerance to adversarial contamination (Lugosi et al., 2017, Lugosi et al., 2016); a small simulation after this list illustrates the effect.
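The snippet below is an illustrative simulation of the robustness claim, not an experiment from the cited papers: the sample size, block count, contamination level, and outlier magnitude are arbitrary choices. At most 10 of the 40 blocks can be contaminated, so the median over blocks is unaffected, while the empirical mean is shifted in every trial.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, trials = 2000, 40, 200
n_bad = 10                      # gross outliers: at most 10 of 40 blocks corrupted
m = n // k
errs_mean, errs_mom = [], []
for _ in range(trials):
    x = rng.standard_t(df=3.0, size=n)   # heavy-tailed inliers, true mean 0
    x[:n_bad] = 1e4                      # adversarial contamination
    rng.shuffle(x)
    errs_mean.append(abs(x.mean()))
    blocks = x[:m * k].reshape(k, m).mean(axis=1)
    errs_mom.append(abs(np.median(blocks)))

# The empirical mean is shifted by n_bad * 1e4 / n = 50 in every trial, while
# the median of block means ignores the minority of contaminated blocks.
print(np.median(errs_mean), np.median(errs_mom))
```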

6. Computational and Practical Considerations

The main practical limitation is algorithmic: exact median-of-means tournaments involve $|\mathcal{F}|^2$ pairwise matches per block, which is infeasible for large or infinite function classes. Although the "max–median" or "minimax" reduction yields a convex–concave saddlepoint problem for convex $\mathcal{F}$, the existence of efficient algorithms guaranteeing the same statistical risk–confidence optimality remains an open problem (Lugosi et al., 2017).

For large-scale applications, randomized or stochastic block selection and use of incomplete or batchwise U-statistics can ameliorate computational cost at the expense of minimal increases in variance, while the theoretical guarantees for robustness and confidence remain largely intact (Laforgue et al., 2022).
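A hedged sketch of an incomplete $U$-statistic follows: pairs are subsampled instead of enumerating all $O(n^2)$ pairs, trading a small variance increase for a large computational saving. The pair-sampling scheme here (independent index draws with self-pairs discarded) is one simple variant among several, and the function name is hypothetical.

```python
import numpy as np

def incomplete_u_statistic(h, X, n_pairs, rng=None):
    """Incomplete U-statistic: average the pairwise kernel h over a random
    subsample of index pairs rather than all O(n^2) pairs."""
    rng = rng or np.random.default_rng()
    n = len(X)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j                          # drop self-pairs
    return np.mean([h(X[a], X[b]) for a, b in zip(i[keep], j[keep])])
```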

7. Connections, Limitations, and Summary Table

Median-of-means tournaments extend classical robust estimation (median-of-means, geometric median) to high-dimensional and structured learning, outperforming standard ERM under heavy tails and delivering minimax rates with high confidence. Their universality across metric geometries and pairwise/$U$-statistic losses broadens their applicability in modern statistical settings.

Key properties by method class:

| Method Class | Moment Assumptions | Confidence/Tail | Outlier Robustness |
|---|---|---|---|
| ERM (mean, least-squares) | Sub-Gaussian required | Polynomial (weak) | None |
| MoM mean/tournament | 2nd moment (finite) | Exponential (sharp) | Up to 25% contaminated blocks |
| MoM + regularization | 2nd/4th moment | Exponential | Similar |
| U-statistics / pairwise | $\mathbb{E}[h^2] < \infty$ | Same (slightly larger) | Same |
| Metric/geometric median | 2nd moment (metric) | Exponential | Tolerates heavy tails |

All rates and claims are a direct synthesis of the cited arXiv papers (Lugosi et al., 2016, Lugosi et al., 2017, Lugosi et al., 2017, Laforgue et al., 2022, Yun et al., 2022).
