Min-Max Median Formulation: Robust VB

Updated 21 December 2025

Min-Max Median Formulation is a robust aggregation method in scalable variational Bayes that replaces mean-based KL aggregation with a median-based saddle-point objective.
It employs a game-theoretic optimization and a two-stage aggregate-and-rescale strategy to ensure statistical optimality and resistance to up to ⌊m/2⌋ adversarial block corruptions.
The approach achieves improved finite-sample guarantees and near-optimal concentration rates, making it practical for inference in contaminated or heterogeneous environments.

The min-max median formulation is an advanced robust aggregation principle introduced for scalable variational Bayes (VB) inference over partitioned data. Designed to handle contamination and outliers, it replaces traditional mean-based Kullback-Leibler (KL) aggregation with a robust, saddle-point objective involving the median of per-block divergences. This approach has theoretical guarantees for robustness and statistical optimality, especially when combined with a two-stage aggregate-and-rescale strategy in the presence of local latent variables (Yan et al., 14 Dec 2025).

1. Mathematical Definition and Motivation

Consider a dataset $\mathcal{D}$ partitioned into $m$ disjoint subsets $\mathcal{D}_1, \ldots, \mathcal{D}_m$. For each subset, define the $m$-powered local variational "posterior": $\tilde{P}^m_j(\theta) \propto \pi(\theta) [p(\mathcal{D}_j \mid \theta)]^m$, where $\pi$ is the prior. The classical (non-robust) aggregation in distributed VB computes

$\hat{Q} = \arg\min_{Q\in\mathcal{F}} \frac{1}{m} \sum_{j=1}^m KL(Q \| \tilde{P}^m_j).$

To robustify against dataset contamination, the min-max median (M-VB) formulation introduces a game-theoretic saddle-point structure: $\hat{F} = \arg\min_{F\in\mathcal{F}} \max_{G\in\mathcal{F}} \mathrm{Med}_{1\leq j \leq m}\left\{ KL(F \| \tilde{P}^m_j) - KL(G \| \tilde{P}^m_j) \right\}$, or, by exploiting the relationship $KL(Q \| \tilde{P}^m_j) = \text{const}_j - ELBO_j(Q)$,

$\hat{F} = \arg\min_{F\in\mathcal{F}} \max_{G\in\mathcal{F}} \mathrm{Med}_{1\leq j \leq m}\left\{ ELBO_j(G) - ELBO_j(F) \right\},$

where $ELBO_j$ denotes the evidence lower bound for the $j$th block. This shift from mean to median enables resistance to outlier blocks (insensitivity to up to $\lfloor m/2 \rfloor$ arbitrary corruptions).
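As a minimal sketch of how the median objective can be evaluated, consider a toy conjugate model that is not taken from the source: a one-dimensional Gaussian mean with unit observation variance, a $N(0, \tau^2)$ prior, and Gaussian candidates $F$ and $G$ given as `(mean, variance)` pairs. The function names `elbo` and `median_objective` and the value of `tau2` are illustrative assumptions.

```python
import numpy as np

def elbo(q, x, m, tau2=100.0):
    """ELBO of Q = N(mu, s2) against the m-powered block target
    pi(theta) * prod_i N(x_i | theta, 1)^m, with prior pi = N(0, tau2).
    Toy model assumed for illustration only."""
    mu, s2 = q
    exp_loglik = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * ((x - mu) ** 2 + s2))
    exp_logprior = -0.5 * np.log(2 * np.pi * tau2) - (mu ** 2 + s2) / (2 * tau2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
    return exp_logprior + m * exp_loglik + entropy

def median_objective(F, G, blocks, tau2=100.0):
    """Med_j { ELBO_j(G) - ELBO_j(F) } over the data blocks."""
    m = len(blocks)
    diffs = [elbo(G, x, m, tau2) - elbo(F, x, m, tau2) for x in blocks]
    return float(np.median(diffs))

rng = np.random.default_rng(0)
blocks = [rng.normal(1.0, 1.0, size=20) for _ in range(5)]  # m = 5 blocks
F = (1.0, 0.01)   # candidate for the outer minimization
G = (0.5, 0.01)   # candidate for the inner maximization
value = median_objective(F, G, blocks)
```

Two sanity properties follow directly from the definition: the objective is zero when $F = G$, and swapping $F$ and $G$ flips its sign.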

2. Equivalence to ELBO Maximization and Aggregation Consistency

For mean-based aggregation, KL minimization is equivalent to maximization of the average ELBO: $\arg\min_{Q\in\mathcal{F}} \frac{1}{m} \sum_{j=1}^m KL(Q \| \tilde{P}^m_j) = \arg\max_{Q\in\mathcal{F}} \frac{1}{m} \sum_{j=1}^m ELBO_j(Q).$ Directly replacing the mean with a median fails to ensure this equivalence, due to block-specific normalization constants. The min-max median fix enforces normalization cancellation by maximizing (in $G$) over the difference, thereby restoring the consistent mapping between divergence aggregation and median ELBO optimization.
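The cancellation can be written out explicitly. In the derivation below, $Z_j$ is notation assumed here for the normalizing constant of the $j$th $m$-powered local posterior, so that $\text{const}_j = \log Z_j$.

```latex
\begin{align*}
\tilde{P}^m_j(\theta) &= \frac{\pi(\theta)\,[p(\mathcal{D}_j \mid \theta)]^m}{Z_j},
\qquad Z_j = \int \pi(\theta)\,[p(\mathcal{D}_j \mid \theta)]^m \, d\theta, \\
ELBO_j(Q) &= \mathbb{E}_Q\!\left[\log \pi(\theta) + m \log p(\mathcal{D}_j \mid \theta)\right]
             - \mathbb{E}_Q\!\left[\log Q(\theta)\right], \\
KL(Q \,\|\, \tilde{P}^m_j) &= \log Z_j - ELBO_j(Q), \\
KL(F \,\|\, \tilde{P}^m_j) - KL(G \,\|\, \tilde{P}^m_j) &= ELBO_j(G) - ELBO_j(F).
\end{align*}
```

Because $\log Z_j$ varies with $j$, $\mathrm{Med}_j\{\log Z_j - ELBO_j(Q)\}$ is in general not a constant minus $\mathrm{Med}_j\, ELBO_j(Q)$, whereas the per-block difference in the last line no longer involves $Z_j$.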

3. Optimization Formulation and Robustness

The optimization takes the form

$\min_{F\in\mathcal{F}} \max_{G\in\mathcal{F}} \mathrm{Med}_{1\leq j \leq m}\left\{ ELBO_j(G) - ELBO_j(F) \right\},$

with the estimator $\hat{F}$ given by the minimizing component of its saddle point.

Here, both $F$ and $G$ belong to a tractable variational family $\mathcal{F}$ (e.g., mean-field Gaussians). The median operator ensures the resulting aggregate is robust to up to $\lfloor m/2 \rfloor$ adversarial block corruptions. Outer minimization in $F$ seeks a distribution minimizing the "worst-half" ELBO relative to $G$, while inner maximization in $G$ aligns constants by targeting the block at which $F$ is least favorable.
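The robustness of the median operator itself can be illustrated numerically. The sketch below uses synthetic numbers as stand-ins for per-block ELBO differences (they are not computed from any model) and replaces $\lfloor m/2 \rfloor$ of them with an arbitrarily large value; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 11
clean = rng.normal(0.0, 1.0, size=m)   # stand-ins for per-block ELBO differences

k = m // 2                             # number of corrupted blocks (5 of 11)
corrupted = clean.copy()
corrupted[:k] = 1e6                    # adversarial replacement of k blocks
untouched = clean[k:]                  # values from the uncorrupted blocks

mean_shift = abs(corrupted.mean() - clean.mean())
med = np.median(corrupted)

print(f"mean moved by {mean_shift:.1f}")
print(f"median after corruption: {med:.3f}")
```

With fewer than half of the blocks corrupted, the median stays inside the range of the uncorrupted values, while the mean is dragged arbitrarily far by the same corruption.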

4. Algorithmic Implementation

The min-max median objective can be solved via a coordinate-ascent-style alternating update. At each iteration:

Median Block Selection: Choose the block at which the per-block difference $ELBO_j(G) - ELBO_j(F)$, evaluated at the current iterates, attains its median over $1 \leq j \leq m$.

CAVI Update for $F$: Within the selected block, apply the coordinate-ascent variational inference (CAVI) update to $F$ against that block's $m$-powered local posterior, separately for each coordinate of the factorization.

Update for $G$: Select the median block analogously, then update $G$.
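The alternating scheme above can be sketched on a toy problem. This is an illustration under stated assumptions, not the authors' implementation: a one-dimensional Gaussian mean model with unit observation variance and a $N(0, \tau^2)$ prior (so the per-block update is available in closed form), an odd number of equally sized blocks, and a stopping rule (stop once $F$ and $G$ coincide) chosen here for simplicity. The names `elbo`, `block_update`, `median_block`, and `alternate` are hypothetical.

```python
import numpy as np

def elbo(q, x, m, tau2):
    """ELBO of Q = N(mu, s2) against pi(theta) * prod_i N(x_i | theta, 1)^m."""
    mu, s2 = q
    exp_loglik = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * ((x - mu) ** 2 + s2))
    exp_logprior = -0.5 * np.log(2 * np.pi * tau2) - (mu ** 2 + s2) / (2 * tau2)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
    return exp_logprior + m * exp_loglik + entropy

def block_update(x, m, tau2):
    """Closed-form ELBO maximizer over Gaussians for one m-powered block
    (conjugate toy case; stands in for a CAVI sweep)."""
    precision = m * len(x) + 1.0 / tau2
    return (m * np.sum(x) / precision, 1.0 / precision)

def median_block(F, G, blocks, tau2):
    """Index of the block attaining the median of ELBO_j(G) - ELBO_j(F) (odd m)."""
    m = len(blocks)
    diffs = np.array([elbo(G, x, m, tau2) - elbo(F, x, m, tau2) for x in blocks])
    return int(np.argsort(diffs)[m // 2])

def alternate(blocks, F, G, tau2=100.0, max_iter=50):
    m = len(blocks)
    for _ in range(max_iter):
        if np.allclose(F, G):          # objective is zero: stop (assumed rule)
            break
        F = block_update(blocks[median_block(F, G, blocks, tau2)], m, tau2)
        if np.allclose(F, G):
            break
        G = block_update(blocks[median_block(F, G, blocks, tau2)], m, tau2)
    return F, G

rng = np.random.default_rng(2)
blocks = [rng.normal(1.0, 1.0, size=20) for _ in range(5)]
blocks[0][:] = 1e3                     # two of five blocks corrupted
blocks[1][:] = 1e3
block_means = np.array([x.mean() for x in blocks])

F_hat, G_hat = alternate(blocks, F=(0.0, 1.0), G=(1.0, 1.0))
```

In this toy setting the per-block difference is monotone in the block sample mean, so the selected block is the one with the median sample mean and the iterates settle on that block's local posterior; the corrupted blocks never drive the update, whereas the plain average of block means is pulled far from the truth.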

For models with local latent variables, the procedure first solves the min-max median problem over the unnormalized joint density of the global parameter and the local latent variables (without the $m$-power), obtaining a joint variational approximation. The global marginal of that approximation is then rescaled so that its covariance aligns with the full-data posterior covariance.

5. Theoretical Properties and Rates

Under regularity conditions (smoothness, sub-exponential tails, bounded parameter space), the local VB posteriors satisfy a non-asymptotic Bernstein–von Mises theorem that bounds their distance to limiting distributions. If these limits are Gaussian, the M-VB aggregate $\hat{F}$ remains Gaussian, with mean given by the solution to a min-max quadratic program across the local means.

In the general case, the aggregate satisfies a finite-sample bound that includes a term quantifying the empirical gap between upper and lower quantiles of the ELBO loss differences, and the posterior mean achieves a near-optimal concentration rate. The explicit statements are given in Yan et al. (14 Dec 2025).

6. Two-Stage Aggregate-and-Rescale Versus Direct Aggregation

Two primary aggregation strategies are analyzed:

Approach	Limiting Behavior	Concentration Rate
One-stage $m$-powered M-VB	Limit minimizes a different population objective and generally differs from the true mode	Extra factor in the error upper bound
Two-stage aggregate-and-rescale	Minimizer matches the true mode	No extra factor; improved rate

Direct application of the $m$-powered min-max median to the full likelihood does not yield the true posterior mode unless the model lacks local latent variables. The two-stage procedure (first solving the min-max median problem on unpowered joint likelihoods, then rescaling the global marginal) ensures the population-level estimator recovers the correct mode and concentration rate. This yields improved finite-sample KL bounds and mean concentration compared to direct aggregation or median-of-means VB (Yan et al., 14 Dec 2025).

7. Robustness, Practical Impact, and Statistical Significance

The min-max median formulation robustifies VB aggregation under partitioned data, offering insensitivity to adversarial contamination and finite-sample guarantees. The saddle-point median-based aggregation, together with the aggregate-and-rescale technique in the presence of local latent variables, supports both theoretical optimality and computational scalability. The improved rates and robustness position the min-max median approach as a substantial advance in distributed inference within contaminated or heterogeneous environments (Yan et al., 14 Dec 2025).


References

Yan et al. (2025). Robust Variational Bayes by Min-Max Median Aggregation.
