
Min-Max Median Formulation: Robust VB

Updated 21 December 2025
  • Min-Max Median Formulation is a robust aggregation method in scalable variational Bayes that replaces mean-based KL aggregation with a median-based saddle-point objective.
  • It employs a game-theoretic optimization and a two-stage aggregate-and-rescale strategy to ensure statistical optimality and resistance to up to ⌊m/2⌋ adversarial block corruptions.
  • The approach achieves improved finite-sample guarantees and near-optimal concentration rates, making it practical for inference in contaminated or heterogeneous environments.

The min-max median formulation is an advanced robust aggregation principle introduced for scalable variational Bayes (VB) inference over partitioned data. Designed to handle contamination and outliers, it replaces traditional mean-based Kullback-Leibler (KL) aggregation with a robust, saddle-point objective involving the median of per-block divergences. This approach has theoretical guarantees for robustness and statistical optimality, especially when combined with a two-stage aggregate-and-rescale strategy in the presence of local latent variables (Yan et al., 14 Dec 2025).

1. Mathematical Definition and Motivation

Consider a dataset $\mathcal{D}$ partitioned into $m$ disjoint subsets $\mathcal{D}_1, \ldots, \mathcal{D}_m$. For each subset, define the $m$-powered local variational "posterior" $\tilde{P}^m_j(\theta) \propto \pi(\theta)\,[p(\mathcal{D}_j \mid \theta)]^m$, where $\pi$ is the prior. The classical (non-robust) aggregation in distributed VB computes

$$\hat{Q} = \arg\min_{Q\in\mathcal{F}} \frac{1}{m} \sum_{j=1}^m \mathrm{KL}(Q \,\|\, \tilde{P}^m_j).$$

To robustify against dataset contamination, the min-max median (M-VB) formulation introduces a game-theoretic saddle-point structure:

$$\hat{F} = \arg\min_{F\in\mathcal{F}} \max_{G\in\mathcal{F}} \mathrm{Med}_{1\leq j \leq m}\left\{ \mathrm{KL}(F \,\|\, \tilde{P}^m_j) - \mathrm{KL}(G \,\|\, \tilde{P}^m_j) \right\},$$

or, by exploiting the relationship $\mathrm{KL}(Q \,\|\, \tilde{P}^m_j) = \text{const}_j - \mathrm{ELBO}_j(Q)$,

$$\hat{F} = \arg\min_{F\in\mathcal{F}} \max_{G\in\mathcal{F}} \mathrm{Med}_{1\leq j \leq m}\left\{ \mathrm{ELBO}_j(G) - \mathrm{ELBO}_j(F) \right\},$$

where $\mathrm{ELBO}_j(Q)$ denotes the evidence lower bound for the $j$th block. This shift from mean to median enables resistance to outlier blocks (insensitivity to up to $\lfloor m/2 \rfloor$ arbitrary corruptions).
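To make the contrast between mean and median aggregation concrete, here is a minimal sketch that evaluates the median of the per-block ELBO differences on hypothetical numbers. The function name `median_gap` and all ELBO values are illustrative assumptions, not taken from the source. Corrupting two of the five blocks shifts the mean of the differences by several orders of magnitude, while the median stays at a clean-block value.

```python
import numpy as np

def median_gap(elbo_f, elbo_g):
    """Median over blocks of ELBO_j(G) - ELBO_j(F) (hypothetical helper)."""
    diffs = np.asarray(elbo_g, dtype=float) - np.asarray(elbo_f, dtype=float)
    return float(np.median(diffs))

# Made-up per-block ELBO values for two candidates F and G (m = 5 blocks).
elbo_f = np.array([-10.2, -10.0, -9.9, -10.1, -10.3])
elbo_g = np.array([-10.0, -10.1, -10.0, -10.0, -10.2])

clean_median = median_gap(elbo_f, elbo_g)

# Corrupt two blocks (fewer than half) so that F looks arbitrarily bad there.
elbo_f_bad = elbo_f.copy()
elbo_f_bad[:2] = -1e6

robust_value = median_gap(elbo_f_bad, elbo_g)      # median-based aggregate
naive_value = float(np.mean(elbo_g - elbo_f_bad))  # mean-based aggregate

print("median (clean):    ", clean_median)
print("median (corrupted):", robust_value)
print("mean   (corrupted):", naive_value)
```

The example only illustrates the aggregation step; it does not optimize over $F$ or $G$.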

2. Equivalence to ELBO Maximization and Aggregation Consistency

For mean-based aggregation, KL minimization is equivalent to maximization of the average ELBO:

$$\arg\min_Q \frac{1}{m}\sum_j \mathrm{KL}(Q \,\|\, \tilde{P}^m_j) = \arg\max_Q \frac{1}{m}\sum_j \mathrm{ELBO}_j(Q).$$

Directly replacing the mean with a median fails to ensure this equivalence, due to block-specific normalization constants. The min-max median fix enforces normalization cancellation by maximizing (in $G$) over the difference, thereby restoring the consistent mapping between divergence aggregation and median ELBO optimization.
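The cancellation can be written out in one line. Here $c_j$ is a label introduced for illustration: it stands for the block-specific constant in the identity relating KL and ELBO, and it does not depend on the variational distribution.

$$
\begin{aligned}
\mathrm{KL}(F \,\|\, \tilde{P}^m_j) - \mathrm{KL}(G \,\|\, \tilde{P}^m_j)
&= \bigl(c_j - \mathrm{ELBO}_j(F)\bigr) - \bigl(c_j - \mathrm{ELBO}_j(G)\bigr) \\
&= \mathrm{ELBO}_j(G) - \mathrm{ELBO}_j(F).
\end{aligned}
$$

By contrast, a single-argument median keeps the constants inside the operator: $\mathrm{Med}_j\{c_j - \mathrm{ELBO}_j(Q)\}$ is in general not equal to a constant minus $\mathrm{Med}_j\{\mathrm{ELBO}_j(Q)\}$, because $c_j$ varies with $j$ and the median is not additive.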

3. Optimization Formulation and Robustness

The optimization takes the form

$$\Phi(F,G) = \mathrm{Med}_{1\leq j \leq m} \left\{ \mathrm{ELBO}_j(G) - \mathrm{ELBO}_j(F) \right\},$$

with the saddle point given by

$$\hat{F} = \arg\min_F\max_G \Phi(F,G).$$

Here, both $F$ and $G$ belong to a tractable variational family $\mathcal{F}$ (e.g., mean-field Gaussians). The median operator ensures the resulting aggregate is robust to up to $\lfloor m/2 \rfloor$ adversarial block corruptions. The outer minimization in $F$ seeks a distribution minimizing the "worst-half" ELBO relative to $G$, while the inner maximization in $G$ aligns constants by targeting the block at which $F$ is least favorable.

4. Algorithmic Implementation

The min-max median objective can be solved via a coordinate-ascent-style alternating update. At iteration tt:

  • Median Block Selection: choose the block whose ELBO difference attains the median,

$$j_t \leftarrow \arg\operatorname{med}_{1 \leq j \leq m}\left\{ \mathrm{ELBO}_j(G^{(t)}) - \mathrm{ELBO}_j(F^{(t)}) \right\}$$

  • CAVI Update for $F$: Within block $j_t$,

$$f^{(t+1)}_l(\theta_l) \propto \exp\left\{ \mathbb{E}_{F^{(t)}_{-l}}\left[\log\left( \pi(\theta)\, p(\mathcal{D}_{j_t} \mid \theta)^m \right)\right] \right\},$$

separately for each coordinate $\theta_l$.

  • Update for $G$: Select $j'_t$ analogously, then update $g^{(t+1)}$.
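The block-selection step in the list above can be sketched as follows, taking the step's name at face value: the selected block is the one whose ELBO difference attains the median. The helper name `median_block_index` and the choice of the lower median for even $m$ are assumptions made for this illustration; the source does not specify a tie-breaking rule.

```python
import numpy as np

def median_block_index(elbo_f, elbo_g):
    """Index of the block whose ELBO_j(G) - ELBO_j(F) is the median.

    Hypothetical helper: for even m it returns the lower of the two
    middle blocks, which is an assumption of this sketch.
    """
    diffs = np.asarray(elbo_g, dtype=float) - np.asarray(elbo_f, dtype=float)
    order = np.argsort(diffs, kind="stable")  # block indices sorted by difference
    return int(order[(len(diffs) - 1) // 2])

# Made-up ELBO values for m = 5 blocks; block 4 is an obvious outlier.
elbo_f = np.zeros(5)
elbo_g = np.array([3.0, -1.0, 7.0, 2.0, 100.0])

j_t = median_block_index(elbo_f, elbo_g)
print("selected block:", j_t)
```

In a full implementation the returned index would determine which block's powered likelihood enters the coordinate update for $F$; that update depends on the model and variational family and is not shown here.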

For models with local latent variables $S$, the procedure first solves the min-max median problem over the unnormalized joint density $p(\mathcal{D}_j, S_j \mid \theta)$ (without the $m$-power), obtaining $\tilde{F}$. The global marginal $\tilde{f}(\theta)$ is then rescaled:

$$\hat{f}(\theta) = m^{p/2}\, \tilde{f}\bigl(\mu_{\tilde{F}} + \sqrt{m}\,(\theta - \mu_{\tilde{F}})\bigr),$$

ensuring $\mathrm{Cov}(\hat{F}) = \mathrm{Cov}(\tilde{F})/m$, which aligns with the full-data posterior covariance.
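The rescaling step has a simple sample-level counterpart: if $\tilde{\theta}$ is drawn from $\tilde{f}$, then $\mu_{\tilde{F}} + (\tilde{\theta} - \mu_{\tilde{F}})/\sqrt{m}$ has density $\hat{f}$. The sketch below checks the covariance identity numerically. The function name `rescale_samples`, the Gaussian stand-in for $\tilde{F}$, and the use of the sample mean in place of $\mu_{\tilde{F}}$ are all assumptions made for illustration.

```python
import numpy as np

def rescale_samples(samples, m):
    """Shrink draws toward their mean by 1/sqrt(m) (hypothetical helper).

    Sample-level analogue of f_hat(theta) = m^{p/2} f_tilde(mu + sqrt(m)(theta - mu)),
    with the sample mean standing in for mu.
    """
    samples = np.asarray(samples, dtype=float)
    mu = samples.mean(axis=0)
    return mu + (samples - mu) / np.sqrt(m)

rng = np.random.default_rng(0)
m = 8  # number of blocks (illustrative value)

# Stand-in for draws from the first-stage aggregate F_tilde (p = 2).
cov_tilde = np.array([[2.0, 0.5], [0.5, 1.0]])
draws_tilde = rng.multivariate_normal([1.0, -1.0], cov_tilde, size=5000)

draws_hat = rescale_samples(draws_tilde, m)

print(np.cov(draws_tilde, rowvar=False) / m)
print(np.cov(draws_hat, rowvar=False))
```

The two printed matrices agree up to floating-point error, and the mean is unchanged, matching the covariance identity stated above.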

5. Theoretical Properties and Rates

Under regularity conditions (smoothness, sub-exponential tails, and a bounded parameter space), the local VB posteriors satisfy a non-asymptotic Bernstein–von Mises theorem:

$$\mathrm{KL}\bigl(\hat{Q}_j \,\|\, N(\hat{\theta}_j, \tfrac{1}{mn}H^{-1})\bigr) = O_p\left( \frac{m(\log n)^{3/2}}{\sqrt{n}} \right).$$

If these limits are Gaussian, the M-VB aggregate $\bar{F}$ remains Gaussian with mean given by the solution to a min-max quadratic program across the local means $\hat{\theta}_j$:

$$\mu_{\bar{F}} = \arg\min_{\theta_f}\max_{\theta_g} \mathrm{Med}_{1\leq j\leq m} \left\{ (\theta_f - \hat{\theta}_j)^\top\Omega(\theta_f - \hat{\theta}_j) - (\theta_g - \hat{\theta}_j)^\top\Omega(\theta_g - \hat{\theta}_j) \right\}.$$

For general $\mathcal{F}$, the aggregate satisfies

$$\mathrm{KL}\bigl(\hat{F} \,\|\, N(\mu_{\hat{F}}, \tfrac{1}{mn}H^{-1})\bigr) = O_p\left( \frac{m(\log n)^{3/2}}{\sqrt{n}} + n\Delta_{\tau} \right),$$

where $\Delta_{\tau}$ quantifies the empirical gap between upper and lower quantiles of the ELBO loss differences; for $\alpha_n = O(1/\sqrt{m})$, $\Delta_{\tau} = O(1/(mn))$. The posterior mean achieves the near-optimal rate

$$\|\mu_{\hat{F}} - \theta^*\|_2 = O \left( \frac{\alpha_n}{\sqrt{n}} + \sqrt{\frac{\log n}{mn}} + \frac{(\log n)^{3/4}}{n^{3/4}} \right).$$
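For intuition about the min-max quadratic program in the Gaussian case, the following sketch solves it by brute-force grid search in one dimension. The function name `minmax_median_mean`, the scalar weight `omega`, the grid, and the local means are all hypothetical choices for illustration; a grid search is a stand-in here and is not the solver used in the source. In one dimension with an odd number of blocks, each difference factorizes as $\Omega(\theta_f - \theta_g)(\theta_f + \theta_g - 2\hat{\theta}_j)$, which is monotone in $\hat{\theta}_j$, so the saddle point coincides with the median of the local means.

```python
import numpy as np

def minmax_median_mean(local_means, grid, omega=1.0):
    """Grid-search solution of the 1-D min-max median quadratic program.

    Hypothetical helper: theta_f and theta_g both range over `grid`,
    and `omega` is a positive scalar playing the role of Omega.
    """
    t = np.asarray(local_means, dtype=float)      # shape (m,)
    f = grid[:, None, None]                       # candidate theta_f values
    g = grid[None, :, None]                       # candidate theta_g values
    diffs = omega * ((f - t) ** 2 - (g - t) ** 2)  # shape (len(grid), len(grid), m)
    phi = np.median(diffs, axis=2)                # median over blocks
    worst_case = phi.max(axis=1)                  # inner maximization over theta_g
    return float(grid[np.argmin(worst_case)])     # outer minimization over theta_f

# Five well-behaved local means near 1.0 plus two corrupted blocks (made-up values).
local_means = np.array([0.9, 1.0, 1.1, 1.05, 0.95, 50.0, -40.0])
grid = np.linspace(-1.0, 3.0, 401)

mu_bar = minmax_median_mean(local_means, grid)
print("min-max median mean:", mu_bar)
print("plain average:      ", local_means.mean())
```

The grid search costs memory proportional to the squared grid size times the number of blocks, so it is only suitable as a low-dimensional illustration.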

6. Two-Stage Aggregate-and-Rescale Versus Direct Aggregation

Two primary aggregation strategies are analyzed:

| Approach | Limiting Behavior | Concentration Rate |
|---|---|---|
| One-stage $m$-powered M-VB | Limit is the minimizer of $\log \int p(\mathcal{D}, S \mid \theta)^m \, dS$, generally $\neq \theta^*$ | Extra factor $m$ in error upper bound |
| Two-stage aggregate-and-rescale | Minimizer matches true mode $\theta^*$ | No extra $m$ factor; improved rate |

Direct application of the $m$-powered min-max median to the full likelihood does not yield the true posterior mode unless the model lacks local latent variables. The two-stage procedure (first solving the min-max median problem on unpowered joint likelihoods, then rescaling the global marginal) ensures the population-level estimator recovers the correct mode and concentration rate. This yields improved finite-sample KL bounds and mean concentration compared to direct aggregation or median-of-means VB (Yan et al., 14 Dec 2025).

7. Robustness, Practical Impact, and Statistical Significance

The min-max median formulation robustifies VB aggregation under partitioned data, offering insensitivity to adversarial contamination together with finite-sample guarantees. The saddle-point median-based aggregation, combined with the aggregate-and-rescale technique in the presence of local latent variables, supports both theoretical optimality and computational scalability. The improved rates and robustness position the min-max median approach as a substantial advance in distributed inference within contaminated or heterogeneous environments (Yan et al., 14 Dec 2025).
