Min-Max Median Formulation: Robust VB
- Min-Max Median Formulation is a robust aggregation method in scalable variational Bayes that replaces mean-based KL aggregation with a median-based saddle-point objective.
- It employs a game-theoretic optimization and a two-stage aggregate-and-rescale strategy to ensure statistical optimality and resistance to up to ⌊m/2⌋ adversarial block corruptions.
- The approach achieves improved finite-sample guarantees and near-optimal concentration rates, making it practical for inference in contaminated or heterogeneous environments.
The min-max median formulation is an advanced robust aggregation principle introduced for scalable variational Bayes (VB) inference over partitioned data. Designed to handle contamination and outliers, it replaces traditional mean-based Kullback-Leibler (KL) aggregation with a robust, saddle-point objective involving the median of per-block divergences. This approach has theoretical guarantees for robustness and statistical optimality, especially when combined with a two-stage aggregate-and-rescale strategy in the presence of local latent variables (Yan et al., 14 Dec 2025).
1. Mathematical Definition and Motivation
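Consider a dataset $X$ partitioned into $m$ disjoint subsets $X_1, \dots, X_m$. For each subset, define the $m$-powered local variational "posterior":

$$\pi_j(\theta) \propto \pi(\theta)\, p(X_j \mid \theta)^m, \qquad j = 1, \dots, m,$$

where $\pi$ is the prior. The classical (non-robust) aggregation in distributed VB computes

$$\hat q = \arg\min_{q \in \mathcal{Q}} \frac{1}{m} \sum_{j=1}^{m} \mathrm{KL}(q \,\|\, \pi_j).$$

To robustify against dataset contamination, the min-max median (M-VB) formulation introduces a game-theoretic saddle-point structure:

$$\hat q = \arg\min_{q \in \mathcal{Q}} \max_{q' \in \mathcal{Q}} \operatorname{med}_{1 \le j \le m} \big[ \mathrm{KL}(q \,\|\, \pi_j) - \mathrm{KL}(q' \,\|\, \pi_j) \big],$$

or, by exploiting the relationship $\mathrm{KL}(q \,\|\, \pi_j) = \log Z_j - \mathrm{ELBO}_j(q)$, with $Z_j$ the normalizing constant of $\pi_j$,

$$\hat q = \arg\min_{q \in \mathcal{Q}} \max_{q' \in \mathcal{Q}} \operatorname{med}_{1 \le j \le m} \big[ \mathrm{ELBO}_j(q') - \mathrm{ELBO}_j(q) \big],$$

where $\mathrm{ELBO}_j$ denotes the evidence lower bound for the $j$th block. This shift from mean to median enables resistance to outlier blocks (insensitivity to up to $\lfloor m/2 \rfloor$ arbitrary block corruptions).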
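The robustness of a median over blocks can be illustrated with a small numerical sketch. The per-block values below are hypothetical numbers chosen for illustration, not results from the source, and `aggregate` is an illustrative helper. The example uses an odd number of blocks; with an even number and exactly half of the blocks corrupted, NumPy's averaging median would include a corrupted value.

```python
import numpy as np

def aggregate(block_values):
    """Return (mean, median) of per-block scalar values."""
    values = np.asarray(block_values, dtype=float)
    return values.mean(), np.median(values)

m = 7  # number of blocks (odd, so the median is a single block's value)
clean = np.array([1.00, 1.10, 0.90, 1.05, 0.95, 1.20, 0.80])

# Corrupt floor(m/2) = 3 blocks with an arbitrarily large value.
corrupted = clean.copy()
corrupted[: m // 2] = 1e6

mean_clean, med_clean = aggregate(clean)
mean_bad, med_bad = aggregate(corrupted)

print("clean    : mean=%.3f median=%.3f" % (mean_clean, med_clean))
print("corrupted: mean=%.3f median=%.3f" % (mean_bad, med_bad))
```

The mean is dragged toward the corrupted values, while the median remains one of the uncorrupted block values.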
2. Equivalence to ELBO Maximization and Aggregation Consistency
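For mean-based aggregation, KL minimization is equivalent to maximization of the average ELBO:

$$\arg\min_{q} \frac{1}{m} \sum_{j=1}^{m} \mathrm{KL}(q \,\|\, \pi_j) = \arg\max_{q} \frac{1}{m} \sum_{j=1}^{m} \mathrm{ELBO}_j(q),$$

where $\pi_j$ is the powered local posterior of block $j$ and $\mathrm{ELBO}_j$ is the corresponding evidence lower bound. Directly replacing the mean with a median fails to ensure this equivalence, due to block-specific normalization constants. The min-max median fix enforces normalization cancellation by maximizing (in a second distribution $q'$) over the difference, thereby restoring the consistent mapping between divergence aggregation and median ELBO optimization.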
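The cancellation argument can be spelled out with the standard identity between the KL divergence and the ELBO. The derivation below is a sketch in assumed notation: $\tilde\pi_j$ for the unnormalized block posterior, $Z_j$ for its normalizing constant, and $q'$ for the reference distribution are illustrative symbols, not necessarily those of the source.

```latex
\begin{align*}
\mathrm{ELBO}_j(q) &= \mathbb{E}_q\!\left[\log \frac{\tilde\pi_j(\theta)}{q(\theta)}\right],
\qquad \pi_j(\theta) = \frac{\tilde\pi_j(\theta)}{Z_j}, \\
\mathrm{KL}(q \,\|\, \pi_j) &= \mathbb{E}_q\!\left[\log \frac{q(\theta)\, Z_j}{\tilde\pi_j(\theta)}\right]
  = \log Z_j - \mathrm{ELBO}_j(q), \\
\frac{1}{m}\sum_{j=1}^{m} \mathrm{KL}(q \,\|\, \pi_j)
  &= \underbrace{\frac{1}{m}\sum_{j=1}^{m} \log Z_j}_{\text{does not depend on } q}
   - \frac{1}{m}\sum_{j=1}^{m} \mathrm{ELBO}_j(q), \\
\operatorname{med}_j \mathrm{KL}(q \,\|\, \pi_j)
  &= \operatorname{med}_j \big[\log Z_j - \mathrm{ELBO}_j(q)\big]
  \quad \text{(the } \log Z_j \text{ cannot be pulled out of the median)}, \\
\mathrm{KL}(q \,\|\, \pi_j) - \mathrm{KL}(q' \,\|\, \pi_j)
  &= \mathrm{ELBO}_j(q') - \mathrm{ELBO}_j(q)
  \quad \text{(the } \log Z_j \text{ cancel block by block)}.
\end{align*}
```

Because the last line holds for every block separately, taking the median of the differences removes the unknown constants without requiring them to be equal across blocks.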
3. Optimization Formulation and Robustness
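The optimization takes the form

$$\min_{q \in \mathcal{Q}} \max_{q' \in \mathcal{Q}} M(q, q'), \qquad M(q, q') = \operatorname{med}_{1 \le j \le m} \big[ \mathrm{ELBO}_j(q') - \mathrm{ELBO}_j(q) \big],$$

with the saddle point $(\hat q, \hat q')$ given by

$$\hat q = \arg\min_{q \in \mathcal{Q}} \max_{q' \in \mathcal{Q}} M(q, q'), \qquad \hat q' = \arg\max_{q' \in \mathcal{Q}} M(\hat q, q'),$$

where $\mathrm{ELBO}_j$ is the evidence lower bound of block $j$ and $m$ is the number of blocks.
Here, both $q$ and $q'$ belong to a tractable variational family $\mathcal{Q}$ (e.g., mean-field Gaussians). The median operator ensures the resulting aggregate is robust to up to $\lfloor m/2 \rfloor$ adversarial block corruptions. The outer minimization in $q$ seeks a distribution minimizing the "worst-half" ELBO relative to $q'$, while the inner maximization in $q'$ aligns constants by targeting the block at which $q$ is least favorable.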
4. Algorithmic Implementation
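The min-max median objective can be solved via a coordinate-ascent-style alternating update. At iteration $t$:
- Median Block Selection: $j_t = \arg\operatorname{med}_{1 \le j \le m} \big[ \mathrm{ELBO}_j(q'^{(t)}) - \mathrm{ELBO}_j(q^{(t)}) \big]$, the block attaining the median.
- CAVI Update for $q$: Within block $j_t$, $q_k^{(t+1)}(\theta_k) \propto \exp\big\{ \mathbb{E}_{q_{-k}^{(t)}} [\log \tilde\pi_{j_t}(\theta)] \big\}$ separately for each coordinate $k$, where $\tilde\pi_{j_t}$ is the unnormalized posterior of block $j_t$.
- Update for $q'$: Select $j'_t$ analogously, then update $q'$.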
For models with local latent variables $z$, the procedure first solves the min-max median problem over the unnormalized joint density (without the $m$-power), obtaining an intermediate aggregate $\tilde q$. The global marginal $\tilde q(\theta)$ is then rescaled so that its covariance shrinks by a factor of $m$, which aligns it with the full-data posterior covariance.
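For a Gaussian global marginal, the rescaling step can be sketched as follows. This assumes the rescaling amounts to dividing the covariance by the number of blocks `m` while keeping the mean fixed; for a Gaussian this is the same as raising the density to the power `m`, or as contracting draws toward the mean by a factor of `sqrt(m)`. The function names and the numbers are hypothetical and are not taken from the source.

```python
import numpy as np

def rescale_gaussian(mean, cov, m):
    """Rescale N(mean, cov) to N(mean, cov / m); the mean is unchanged."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return mean, cov / m

def rescale_samples(samples, mean, m):
    """Sample-level version: contract draws toward the mean by sqrt(m)."""
    return mean + (samples - mean) / np.sqrt(m)

m = 4                                   # number of blocks (illustrative)
mean = np.array([1.0, -2.0])            # mean of the stage-one global marginal
cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])            # its covariance (block-level scale)

new_mean, new_cov = rescale_gaussian(mean, cov, m)

# Check the two views agree: empirical covariance of contracted draws.
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(mean, cov, size=20000)
rescaled = rescale_samples(draws, mean, m)
empirical_cov = np.cov(rescaled, rowvar=False)
print(new_cov)
print(empirical_cov)
```

The closed-form and sample-level rescalings agree up to Monte Carlo error, which is why either view can be used once the stage-one aggregate is available.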
5. Theoretical Properties and Rates
Under regularity conditions (smoothness, sub-exponential tails, and a bounded parameter space), the local VB posteriors satisfy a non-asymptotic Bernstein–von Mises theorem. If these limits are Gaussian, the M-VB aggregate remains Gaussian, with mean given by the solution to a min-max quadratic program across the local means $\mu_1, \dots, \mu_m$. For general local posteriors, the aggregate satisfies a finite-sample bound with an additional term that quantifies the empirical gap between upper and lower quantiles of the ELBO loss differences. The posterior mean achieves a near-optimal concentration rate.
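The min-max quadratic program can be explored numerically in a toy setting. The sketch below assumes one-dimensional Gaussian local posteriors with a common variance, so that the per-block KL difference between candidate means `a` and `b` is proportional to `(a - mu_j)**2 - (b - mu_j)**2`, and it solves the saddle-point problem by brute-force grid search. The function name, the grid, and the local means are hypothetical. In this special case the median of the differences equals `(a - med)**2 - (b - med)**2`, with `med` the median of the local means, so the solution should land on that median.

```python
import numpy as np

def minmax_median_mean(local_means, grid):
    """Grid-search min over a, max over b, of the median over blocks of
    (a - mu_j)^2 - (b - mu_j)^2 (equal-variance 1-D Gaussian case)."""
    a = grid[:, None, None]            # candidate aggregate mean (outer min)
    b = grid[None, :, None]            # reference mean (inner max)
    mu = local_means[None, None, :]    # local block means
    # Per-block KL difference, up to the common factor 1 / (2 * sigma^2).
    diff = (a - mu) ** 2 - (b - mu) ** 2
    objective = np.median(diff, axis=2)    # median over blocks
    worst_case = objective.max(axis=1)     # inner maximization over b
    return grid[np.argmin(worst_case)]     # outer minimization over a

# Five well-behaved blocks and two corrupted ones (illustrative values).
mus = np.array([0.1, -0.2, 0.0, 0.3, -0.1, 50.0, 60.0])
grid = np.linspace(-2.0, 62.0, 641)        # step 0.1, covers the outliers

a_star = minmax_median_mean(mus, grid)
print("min-max median:", a_star)
print("median of means:", np.median(mus))
print("mean of means:", mus.mean())
```

Grid search is used only to keep the example transparent; it does not scale beyond very low dimensions and is not the alternating scheme described in Section 4.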
6. Two-Stage Aggregate-and-Rescale Versus Direct Aggregation
Two primary aggregation strategies are analyzed:
| Approach | Limiting Behavior | Concentration Rate |
|---|---|---|
| One-stage $m$-powered M-VB | Limit minimizes a population objective whose minimizer generally differs from the true mode | Extra factor in error upper bound |
| Two-stage aggregate-and-rescale | Minimizer matches true mode | No extra factor; improved rate |
Direct application of the $m$-powered min-max median to the full likelihood does not yield the true posterior mode unless the model lacks local latent variables. The two-stage procedure, which first solves the min-max median problem on unpowered joint likelihoods and then rescales the global marginal, ensures that the population-level estimator recovers the correct mode and concentration rate. This yields improved finite-sample KL bounds and mean concentration compared to direct aggregation or median-of-means VB (Yan et al., 14 Dec 2025).
7. Robustness, Practical Impact, and Statistical Significance
The min-max median formulation robustifies VB aggregation under partitioned data, offering insensitivity to adversarial contamination and finite-sample guarantees. The saddle-point median-based aggregation, together with the aggregate-and-rescale technique in the presence of local latent variables, supports both theoretical optimality and computational scalability. The improved rates and robustness position the min-max median approach as a substantial advance in distributed inference within contaminated or heterogeneous environments (Yan et al., 14 Dec 2025).