
MedBN: Robust Median Batch Normalization

Updated 7 March 2026
  • MedBN is a robust normalization technique that uses the median instead of the mean to compute per-batch statistics, ensuring bounded deviations under adversarial conditions.
  • It integrates seamlessly with existing test-time adaptation pipelines, reducing attack success rates by mitigating the influence of malicious samples on batch statistics.
  • Empirical results on benchmarks like CIFAR and ImageNet demonstrate that MedBN significantly improves robustness compared to traditional Batch Normalization, even under severe corruption.

Median Batch Normalization (MedBN) is a normalization technique designed to robustify test-time adaptation (TTA) in neural networks against malicious test samples, specifically addressing vulnerabilities of standard Batch Normalization (BN) to adversarial manipulation in the estimation of per-batch statistics. By replacing the mean with the median in the computation of normalization parameters, MedBN provides provably bounded robustness to substantial fractions of corrupted data within test batches, enabling reliable adaptation even in adversarial settings. The method is algorithm-agnostic and integrates as a drop-in substitute for conventional BN computations in any TTA pipeline (Park et al., 2024).

1. Motivation and Problem Context

Test-time adaptation methods exploit batch-level statistics—typically the arithmetic mean and variance computed from incoming test samples at inference—to mitigate performance degradation caused by distribution shifts between training and testing. However, reliance on these statistics introduces susceptibility to adversarial contamination. A modest fraction of "poisoned" inputs (10–20%) can be sufficient to distort the computed mean and variance, drastically altering feature normalizations throughout the network and thereby facilitating high attack success rates (ASR), often surpassing 70–80% on standard benchmarks under the Distribution Invading Attack paradigm (Park et al., 2024). The core cause is the unbounded influence of the mean: a single extreme value can drive the mean arbitrarily far from its benign-centered value.

The statistical median, in contrast, offers a 50% breakdown point: unless the adversary controls at least half of the batch, the median remains within the convex hull of the unpoisoned (benign) samples. Theorem 1 (Park et al., 2024) formalizes this property, showing that with fewer than half the batch poisoned, the deviation of the batch median is always bounded by the spread of the benign data, while the mean can be shifted without bound.
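This contrast is easy to check numerically. The following minimal sketch (plain-Python, with illustrative values chosen here, not taken from the paper) contaminates 20% of a batch with extreme values and compares the two statistics:

```python
import random
import statistics

random.seed(0)
benign = [random.gauss(0.0, 1.0) for _ in range(160)]  # 80% benign activations
malicious = [1e6] * 40                                 # 20% extreme poisoned values
batch = benign + malicious

# The mean is dragged arbitrarily far by the poisoned samples...
assert abs(statistics.fmean(batch)) > 1e5
# ...while the median stays inside the benign range (50% breakdown point).
assert min(benign) <= statistics.median(batch) <= max(benign)
```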

2. Formal Definition and Mathematical Structure

Let $\{x_{i,c}\}_{i=1}^{B}$ denote the activations along channel $c$ in a BN layer for a test batch of size $B$. MedBN replaces the computation of the per-channel mean $\mu_c$ and variance $\sigma_c^2$ with the following:

  • Batch Median: For each channel $c$,

$$\eta_c := \operatorname{med}_{i=1,\ldots,B} \, x_{i,c},$$

where the median is the univariate 50th percentile.

  • Scale Parameter: Two alternatives are considered, but MedBN adopts the mean squared deviation from the median, which balances robustness and accuracy:

$$\rho_c^2 := \frac{1}{B} \sum_{i=1}^{B} (x_{i,c} - \eta_c)^2.$$

The Median Absolute Deviation (MAD),

$$\mathrm{MAD}_c := \operatorname{med}_{i=1,\ldots,B} \, |x_{i,c} - \eta_c|,$$

is optionally considered, but in practice the mean squared deviation is preferred for its stability.

  • Normalization Transform: For each sample $i$ and channel $c$,

$$y_{i,c} = \gamma_c \cdot \frac{x_{i,c} - \eta_c}{\sqrt{\rho_c^2 + \epsilon}} + \beta_c,$$

with affine parameters $\gamma_c$, $\beta_c$ inherited from the pretrained model, and $\epsilon$ a small positive constant (e.g., $10^{-5}$) for numerical stability.
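For concreteness, the three steps can be traced on a single channel with a toy batch (values chosen by hand here for illustration, with $\gamma_c = 1$ and $\beta_c = 0$):

```python
import math
import statistics

x = [0.9, 1.1, 1.0, 50.0, 1.2]                     # one channel, B = 5, one outlier
eta = statistics.median(x)                         # batch median: 1.1
rho_sq = sum((v - eta) ** 2 for v in x) / len(x)   # mean squared deviation from eta
eps = 1e-5
y = [(v - eta) / math.sqrt(rho_sq + eps) for v in x]

# The outlier does not shift the centre: eta = 1.1, whereas the mean is 10.84.
assert eta == 1.1
```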

3. Algorithmic Integration and Implementation

MedBN substitutes the test-time mean and variance calculation in each BN layer with its median-centered counterparts, leaving the TTA workflow otherwise unchanged. The adaptation loop proceeds as follows:

  1. For each test batch of size $B$:
    • Perform a forward pass up to each BN layer.
    • Compute the per-channel batch median $\eta_c$ and scale $\rho_c^2$ using all activations in the batch.
    • Normalize activations using these robust statistics.
    • Continue with TTA-specific adaptation objectives (e.g., entropy minimization).
    • Update the affine parameters $\gamma_c$, $\beta_c$ and any other adaptively learned variables.

Per-layer pseudocode:

```python
import torch

def MedBN_Normalize(x_in, gamma, beta, eps=1e-5):
    # x_in: [B, C, H, W]; gamma, beta: per-channel affine parameters of shape [C]
    B, C, H, W = x_in.shape
    # Per-channel batch median over the batch and spatial dimensions.
    eta = x_in.permute(1, 0, 2, 3).reshape(C, -1).median(dim=1).values  # [C]
    delta = x_in - eta.view(1, C, 1, 1)
    # Scale: mean squared deviation from the median.
    rho_sq = (delta ** 2).mean(dim=(0, 2, 3), keepdim=True)  # [1, C, 1, 1]
    x_out = gamma.view(1, C, 1, 1) * delta / (rho_sq + eps).sqrt() \
        + beta.view(1, C, 1, 1)
    return x_out
```

No moving median or momentum is required: statistics are recomputed for each batch. Empirically, the computational overhead of median calculation is negligible for typical batch sizes ($B = 32$–$200$) on modern accelerators.
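To show how the robust statistics slot into an otherwise unchanged TTA loop, the following sketch pairs a model whose BN layers are assumed to already compute median-centred statistics with TENT-style entropy minimization (a hypothetical minimal setup; `model` and the optimizer configuration are illustrative assumptions, not the paper's released code):

```python
import torch

def tta_step(model, batch, optimizer):
    """One test-time adaptation step: forward pass (BN layers use the
    robust per-batch statistics), then minimize prediction entropy."""
    logits = model(batch)
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()  # typically updates only the affine gamma/beta parameters
    return logits.detach()
```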

4. Theoretical Guarantees and Robustness

The median, as a summary statistic, is maximally robust against contamination: an adversary controlling fewer than half of a batch of size $B$ cannot move the batch median outside the range of the benign points. Theorem 1 rigorously establishes that, for $m < B/2$ poisoned points, $|\operatorname{med}(\text{benign} \cup \text{malicious}) - \operatorname{med}(\text{benign})|$ is always bounded by $\max(\text{benign}) - \min(\text{benign})$, whereas the mean can suffer unbounded deviation from even a single adversarial value (e.g., one driven toward $\pm\infty$) (Park et al., 2024).

For multivariate activations, coordinate-wise medians or geometric medians can be used. The implementation in (Park et al., 2024) adopts coordinate-wise medians for efficiency.

The computational cost is dominated by the selection/sorting step in median computation, $O(B \log B)$ per channel, but fast selection algorithms offer $O(B)$ average complexity, typically negligible for batch sizes used in practice.
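The selection route can be sketched with `torch.kthvalue`, which extracts the $k$-th order statistic without fully sorting (an illustrative sketch; the `[B, C, H, W]` activation layout is an assumption carried over from the earlier discussion):

```python
import torch

def channel_median_via_selection(x):
    """Per-channel median of activations x with shape [B, C, H, W],
    computed via k-th order statistic selection instead of a full sort."""
    B, C, H, W = x.shape
    flat = x.permute(1, 0, 2, 3).reshape(C, -1)  # [C, B*H*W]
    k = (flat.shape[1] + 1) // 2                 # rank of the (lower) median
    return flat.kthvalue(k, dim=1).values        # [C]

x = torch.randn(8, 3, 4, 4)
# Matches torch.median, which also returns the lower median for even counts.
reference = x.permute(1, 0, 2, 3).reshape(3, -1).median(dim=1).values
assert torch.allclose(channel_median_via_selection(x), reference)
```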

5. Empirical Performance and Sensitivity Analyses

MedBN was evaluated on CIFAR-10-C, CIFAR-100-C, and ImageNet-C, each featuring 15 corruption types at five severity levels, with principal emphasis on the most challenging (severity level 5). Attack scenarios include:

  • Instant Attack: The adversary injects $m$ malicious samples into the current batch only.
  • Cumulative Attack: Adversarial samples are injected into every batch.

Two attack objectives are considered:

  • Targeted (ASR): Coerce a given image to be classified as a specific target label.
  • Indiscriminate (ER): Maximize error rate on benign inputs.

A representative summary of results (20% poisoned, $B = 200$):

| Dataset | Baseline (BN-TENT) ASR | MedBN-TENT ASR | Baseline ER | MedBN ER |
|---|---|---|---|---|
| CIFAR-10-C | 72% | 18% | 28% | 20% |
| CIFAR-100-C | 79% | 4% | — | — |
| ImageNet-C | 91% | 0.4% | — | — |

Results indicate that MedBN reduces ASR/ER by half or more relative to standard BN-based methods, often driving ASR to nearly zero even against coordinated attacks. Robustness persists up to $m \approx B/2$ corrupted samples, with graceful degradation beyond this threshold. MedBN remains effective even with small batch sizes, although the statistical stability of the median diminishes for $B < 4$.

MedBN is contrasted with algorithms such as mDIA (source/test statistic interpolation), SoTTA (low-confidence sample filtering), and sEMA (momentum statistic smoothing). These alternative defenses achieve partial reduction of ASR (typically to the 20–40% range for $m = 40$ out of $B = 200$), but MedBN consistently outperforms them and can be composed with such techniques (e.g., SoTTA+MedBN achieves the lowest recorded ASRs). MedBN is algorithm-agnostic and integrates with any BN-based TTA workflow.

Limitations include the loss of robustness when adversarial contamination exceeds 50% of the batch, and reduced statistical reliability of the median for very small batches ($B < 4$). The cost of median computation may also become significant for extremely large batch sizes unless optimized selection routines are employed.

6. Practical Usage and Extensions

MedBN is readily implemented in deep learning frameworks by subclassing existing BN layers. In PyTorch, `nn.BatchNorm2d` can be extended to use the per-batch median and mean squared deviation instead of the mean and variance in evaluation mode. Analogous modifications apply in TensorFlow by overriding `tf.keras.layers.BatchNormalization`.
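A minimal PyTorch sketch of such a subclass (an illustrative assumption about how one might wire this up, not the authors' released code; the class name `MedBN2d` is hypothetical):

```python
import torch
import torch.nn as nn

class MedBN2d(nn.BatchNorm2d):
    """Drop-in BN replacement using the batch median and the
    mean squared deviation from the median at evaluation time."""
    def forward(self, x):
        if self.training:
            return super().forward(x)  # standard BN behaviour while training
        B, C, H, W = x.shape
        # Robust per-channel statistics over batch and spatial dimensions.
        eta = x.permute(1, 0, 2, 3).reshape(C, -1).median(dim=1).values
        delta = x - eta.view(1, C, 1, 1)
        rho_sq = (delta ** 2).mean(dim=(0, 2, 3), keepdim=True)
        x_hat = delta / (rho_sq + self.eps).sqrt()
        return self.weight.view(1, C, 1, 1) * x_hat + self.bias.view(1, C, 1, 1)
```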

Recommended settings mirror standard BN: $\epsilon = 10^{-5}$ or $10^{-3}$ for stability, and no extra moving-window or exponential averaging is required. For large batch sizes, fast selection algorithms such as `nth_element` are advised, although empirical benchmarks confirm that GPU-based sorting is sufficiently performant for $B \leq 256$.

MedBN extends to full adversarial variance robustness if the scale parameter $\rho_c^2$ is replaced by $\mathrm{MAD}_c^2$, though this may marginally increase clean-data error. Possible future expansions include employing geometric medians for vector activation robustness, integrating MedBN with sample filtering or other TTA defenses, and adapting the median-based approach to alternative normalization strategies such as LayerNorm or GroupNorm, especially in non-i.i.d. and federated learning settings (Park et al., 2024).
