Extended MixStyle: Enhancing Cross-Domain Robustness
- Extended MixStyle (EM) is a feature-space augmentation framework that mixes mean, variance, skewness, and kurtosis to mimic non-Gaussian distribution shifts in sMRI data.
- It integrates a stochastic layer in a 3D U-Net, using Beta-distributed mixing coefficients to perturb feature statistics during training.
- Empirical results show that EM₁ improves macro-F1 scores across datasets, highlighting its effectiveness in handling cross-domain variability in Alzheimer's classification.
Extended MixStyle (EM) is a feature-space augmentation framework designed to improve cross-domain generalization. Where the original MixStyle technique mixes only the mean and variance of feature maps, EM also mixes higher-order moments, specifically skewness (third moment) and kurtosis (fourth moment). It was developed to address the non-Gaussian, asymmetric, and heavy-tailed domain shifts encountered in structural magnetic resonance imaging (sMRI) for Alzheimer’s disease (AD) classification. The goal is a model that is trained on a single domain yet stays robust to unseen scanner protocols, demographics, and acquisition differences (Batool et al., 4 Jan 2026).
1. Mathematical Foundations
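EM extends MixStyle’s approach of per-sample, per-channel statistical mixing. Given a mini-batch of feature maps $x \in \mathbb{R}^{B \times C \times N}$ (batch size $B$, channels $C$, spatial positions $N$), the original method computes, for each sample $b$ and channel $c$:
- Mean: $\mu_{b,c} = \frac{1}{N} \sum_{n=1}^{N} x_{b,c,n}$
- Std: $\sigma_{b,c} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(x_{b,c,n} - \mu_{b,c}\right)^2 + \epsilon}$
These statistics are mixed across samples in the batch via a random permutation $\pi$ of the batch indices and a Beta-distributed mixing coefficient ($\lambda \sim \mathrm{Beta}(\alpha, \alpha)$):
$$\mu^{\mathrm{mix}}_{b,c} = \lambda\,\mu_{b,c} + (1-\lambda)\,\mu_{\pi(b),c}, \qquad \sigma^{\mathrm{mix}}_{b,c} = \lambda\,\sigma_{b,c} + (1-\lambda)\,\sigma_{\pi(b),c}$$
and mixed feature normalization follows as:
$$\hat{x}_{b,c,n} = \sigma^{\mathrm{mix}}_{b,c}\,\frac{x_{b,c,n} - \mu_{b,c}}{\sigma_{b,c}} + \mu^{\mathrm{mix}}_{b,c}$$
EM introduces two higher-order moments:
- Skewness: $s_{b,c} = \frac{1}{N} \sum_{n=1}^{N} \left(\frac{x_{b,c,n} - \mu_{b,c}}{\sigma_{b,c}}\right)^3$
- Kurtosis: $k_{b,c} = \frac{1}{N} \sum_{n=1}^{N} \left(\frac{x_{b,c,n} - \mu_{b,c}}{\sigma_{b,c}}\right)^4$
A matching permutation and mixing operation perturbs these:
$$s^{\mathrm{mix}}_{b,c} = \lambda\,s_{b,c} + (1-\lambda)\,s_{\pi(b),c}, \qquad k^{\mathrm{mix}}_{b,c} = \lambda\,k_{b,c} + (1-\lambda)\,k_{\pi(b),c}$$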
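Two variants are defined:
- EM₁: adds a skewness perturbation on top of the mean and standard deviation mixing,
- EM₂: additionally adds a kurtosis perturbation,
where scalar coefficients control the strength of each higher-order perturbation.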
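A minimal NumPy sketch of the statistics and mixing step described in this section, assuming features are flattened to shape `(B, C, N)` (a 3D feature map of shape `(B, C, D, H, W)` would be reshaped first). The function names, the `alpha=0.1` default, the `eps` value, and the choice to share one mixing coefficient across all four moments are illustrative assumptions, not details taken from the source. Only the mean/std renormalization is shown; how EM₁ and EM₂ inject the mixed skewness and kurtosis into the features is not reproduced here.

```python
import numpy as np


def channel_moments(x, eps=1e-6):
    """Per-sample, per-channel mean, std, skewness and kurtosis of x with shape (B, C, N)."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = np.sqrt(x.var(axis=-1, keepdims=True) + eps)
    z = (x - mu) / sigma                              # standardized features
    skew = (z ** 3).mean(axis=-1, keepdims=True)      # third standardized moment
    kurt = (z ** 4).mean(axis=-1, keepdims=True)      # fourth standardized moment
    return mu, sigma, skew, kurt


def mix_moments(x, alpha=0.1, rng=None, eps=1e-6):
    """Mix each statistic with that of a randomly permuted batch partner."""
    rng = np.random.default_rng() if rng is None else rng
    stats = channel_moments(x, eps)
    perm = rng.permutation(x.shape[0])                # random pairing within the batch
    # Assumption: one Beta(alpha, alpha) coefficient per sample, shared by all moments.
    lam = rng.beta(alpha, alpha, size=(x.shape[0], 1, 1))
    mixed = tuple(lam * s + (1.0 - lam) * s[perm] for s in stats)
    return stats, mixed


def renormalize(x, stats, mixed):
    """MixStyle-style renormalization using only the mixed mean and std."""
    mu, sigma = stats[0], stats[1]
    mu_mix, sigma_mix = mixed[0], mixed[1]
    return sigma_mix * (x - mu) / sigma + mu_mix
```

After `renormalize`, each channel's mean equals the mixed mean and its standard deviation is, up to the `eps` term, the mixed standard deviation, while the spatial pattern of the standardized features is unchanged.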
2. Algorithmic Integration
EM is implemented as a stochastic layer during model training:
- Forward the input through the network encoder up to the specified layer (empirically, the second block in a four-block 3D U-Net).
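- With probability $p$, apply the EM module:
  a. Compute the per-channel mean and standard deviation (and the skewness and kurtosis for EM₁/EM₂).
  b. Permute and mix the statistics across the batch using the Beta-distributed mixing coefficient.
  c. Normalize and reparameterize features with EM₁ or EM₂ as per the formulations above.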
- Continue propagation to the classification head.
- Compute a weighted cross-entropy loss, with class weights inversely proportional to the NC, MCI, and AD class frequencies, then backpropagate with the mixed statistics detached from the gradient computation.
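The stochastic nature of the layer can be sketched as a small wrapper that applies a feature transform with some probability during training and acts as the identity otherwise. The class name `StochasticGate`, the `p=0.5` default, and the assumption that the layer is inactive outside training are illustrative; the source's probability value is not reproduced here.

```python
import random


class StochasticGate:
    """Apply `transform` with probability `p` during training; identity otherwise."""

    def __init__(self, transform, p=0.5, seed=None):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must lie in [0, 1]")
        self.transform = transform
        self.p = p                      # placeholder value, not from the source
        self.rng = random.Random(seed)

    def __call__(self, features, training=True):
        # random() is in [0, 1), so p=1.0 always applies and p=0.0 never does.
        if not training or self.rng.random() >= self.p:
            return features
        return self.transform(features)
```

In a real network, `transform` would be the statistic-mixing operation, and the wrapper would sit after the chosen encoder block.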
Application of the EM module is empirically most effective at the second encoder block. Hyperparameters include the mixing strength for EM₁/EM₂, the application probability ($p$), a batch size of 16, and SGD with learning rate 0.01, momentum 0.9, and a decay of 5% per epoch.
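One plausible reading of the optimizer schedule, assuming the 5% per-epoch decay is a multiplicative decay of the learning rate (the source does not spell this out), is sketched below. The function name `learning_rate` is hypothetical.

```python
def learning_rate(epoch, base_lr=0.01, decay=0.05):
    """Learning rate after `epoch` epochs under multiplicative per-epoch decay.

    Assumption: "decay 5%/epoch" means lr_e = base_lr * (1 - decay) ** e.
    """
    return base_lr * (1.0 - decay) ** epoch


# Learning rate for the first few epochs under this assumption.
schedule = [learning_rate(e) for e in range(5)]
```

Under this reading the learning rate falls to roughly 60% of its initial value after ten epochs.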
3. Experimental Design and Datasets
Experiments were conducted using:
- Training: NACC (n=4,647; NC=2,524, MCI=1,175, AD=948)
- Test cohorts (unseen domain shift):
  - ADNI (n=1,821; NC=481, MCI=971, AD=369)
  - AIBL (n=661; NC=480, MCI=102, AD=79)
  - OASIS (n=644; NC=424, MCI=27, AD=193)
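The class weights for the weighted cross-entropy loss can be derived from the NACC training counts listed above. The sketch below assumes weights proportional to inverse class frequency and normalized as `total / (num_classes * count)`; the normalization is an assumption, since the source only states that the weights are inverse to the class frequencies.

```python
# NACC training counts as listed in this section.
train_counts = {"NC": 2524, "MCI": 1175, "AD": 948}


def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * count) (assumed normalization)."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


class_weights = inverse_frequency_weights(train_counts)
```

With this normalization every class contributes the same total weight to the loss, so the rarest class (AD) receives the largest per-sample weight and the most common class (NC) the smallest.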
Domain shifts included scanner manufacturers, acquisition protocols, and population demographics. The network keeps the 3D U-Net encoder (four convolution-ReLU-BN blocks, with EM at block 2) and removes the decoder; global pooling, two fully connected layers, and a softmax then produce the classification. Weights are initialized from CT pretraining.
4. Comparative Performance
Table: Macro-F1 Results (NC/MCI/AD mean, external cohorts)
| Method | ADNI | AIBL | OASIS |
|---|---|---|---|
| Baseline | 0.508 | 0.575 | 0.534 |
| MixStyle | 0.476 | — | — |
| MixUp | — | 0.595 | — |
| EFDM | — | 0.582 | 0.540 |
| CCSDG | 0.488 | — | — |
| EM₁ | 0.519 | 0.629 | 0.540 |
| EM₂ | 0.508 | 0.386 | 0.538 |
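
For reference, macro-F1 is the unweighted mean of the per-class F1 scores, so the small MCI and AD classes count as much as NC. A minimal sketch computing it from a confusion matrix follows; the function name and the toy matrix in the usage line are hypothetical and are not data from the study.

```python
def macro_f1(confusion):
    """Macro-F1 from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(confusion)
    scores = []
    for c in range(k):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(k)) - tp   # predicted c, truly other
        fn = sum(confusion[c]) - tp                        # truly c, predicted other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)    # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / k


# Toy two-class example (hypothetical counts).
toy_score = macro_f1([[2, 0], [1, 1]])
```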
EM₁ achieves the largest gains: +3.1 points (ADNI), +3.4 (AIBL), and +0.0 (OASIS, where it ties EFDM) over the best competing SDG method listed for each cohort. Relative to the baseline, the gains are +1.1, +5.4, and +0.6 points, an average increase of about 2.4 macro-F1 points across the three test cohorts. Notably, EM₂ (which includes kurtosis) was less stable, especially in AIBL testing, where it falls to 0.386 (Batool et al., 4 Jan 2026).
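The gains can be re-derived from the table entries. The sketch below uses only the values reported in the table (missing cells are omitted), and the variable names are illustrative.

```python
# Macro-F1 values copied from the table; unreported cells are left out.
reported = {
    "Baseline": {"ADNI": 0.508, "AIBL": 0.575, "OASIS": 0.534},
    "MixStyle": {"ADNI": 0.476},
    "MixUp": {"AIBL": 0.595},
    "EFDM": {"AIBL": 0.582, "OASIS": 0.540},
    "CCSDG": {"ADNI": 0.488},
    "EM1": {"ADNI": 0.519, "AIBL": 0.629, "OASIS": 0.540},
}
competitors = ["MixStyle", "MixUp", "EFDM", "CCSDG"]
cohorts = ["ADNI", "AIBL", "OASIS"]


def gain_points(score, reference):
    """Difference in macro-F1 expressed in percentage points."""
    return round((score - reference) * 100, 1)


gain_vs_best_sdg = {}
gain_vs_baseline = {}
for cohort in cohorts:
    # Best listed competitor for this cohort among the reported entries.
    best = max(reported[m][cohort] for m in competitors if cohort in reported[m])
    gain_vs_best_sdg[cohort] = gain_points(reported["EM1"][cohort], best)
    gain_vs_baseline[cohort] = gain_points(reported["EM1"][cohort],
                                           reported["Baseline"][cohort])

mean_gain_vs_baseline = round(sum(gain_vs_baseline.values()) / len(cohorts), 1)
```

The per-cohort gains over the best listed competitor come out as 3.1, 3.4, and 0.0 points, while the mean gain over the baseline is about 2.4 points.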
5. Ablation and Empirical Analysis
Ablations show that placing EM at encoder Layer 2 works best; applying it at Layer 1, at Layer 3, or at multiple layers produced inferior results. EM₁, which perturbs mean, variance, and skewness, offered consistent improvement. EM₂, which further adds kurtosis, occasionally led to instability, particularly on cohorts with smaller sample sizes or greater distributional discrepancy, such as AIBL.
Both the mixing coefficient and the application probability ($p$) affect performance: EM₁ and EM₂ benefit from different settings, with EM₂ preferring a moderate value.
6. Interpretation and Domain Generalization Implications
Analysis of feature-space statistics by cohort demonstrated significant variation in skewness and kurtosis, confirming the presence of non-Gaussian and heavy-tailed distributional shifts in sMRI across sites and scanners. By simulating these higher-order moment variations via mixing, EM pushes the model to learn representations that are robust to asymmetry and tail behavior. t-SNE embeddings support this: cohort-specific clustering weakens progressively with standard MixStyle, then EFDM, and most of all with EM.
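To illustrate why matching mean and variance alone can miss such shifts, the following sketch compares two synthetic samples with the same mean and variance but different higher-order moments. The data are simulated (a Gaussian and a shifted exponential), not sMRI features, and the helper name is hypothetical.

```python
import numpy as np


def standardized_moments(values):
    """Return (skewness, kurtosis) as third and fourth standardized moments."""
    z = (values - values.mean()) / values.std()
    return float((z ** 3).mean()), float((z ** 4).mean())


rng = np.random.default_rng(0)
n = 200_000
gaussian = rng.normal(0.0, 1.0, size=n)            # symmetric, light-tailed
skewed = rng.exponential(1.0, size=n) - 1.0        # shifted to mean 0; variance 1

gauss_skew, gauss_kurt = standardized_moments(gaussian)
exp_skew, exp_kurt = standardized_moments(skewed)
```

Both samples have mean close to 0 and standard deviation close to 1, yet the Gaussian sample has skewness near 0 and kurtosis near 3, whereas the shifted exponential has skewness near 2 and kurtosis near 9. A mean/variance-only augmentation would treat the two as equivalent.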
Grad-CAM inspection highlights that EM₁’s augmentations direct model attention more consistently to cortical and subcortical regions implicated in AD, supporting the premise that higher-order augmentation aligns with neuroanatomical pathology.
Blending skewness (EM₁) confers the largest and most stable gains in macro-F1 and cross-domain sensitivity, indicating its efficacy for real-world scanner and protocol variability. This suggests that higher-order moment augmentation is a critical lever in simulating realistic distributional shifts in medical imaging applications facing substantial domain fragmentation.
7. Significance and Extensions
Extended MixStyle demonstrates that feature-space augmentation leveraging skewness and kurtosis, in addition to mean and variance, more faithfully emulates real-world cohorts’ statistical heterogeneity, leading to superior single-domain generalization in clinical sMRI-based classification of Alzheimer’s disease. No additional domain-regularization or adversarial objectives are required; EM operates as a standalone feature augmentation. The method is applicable in scenarios typified by non-Gaussian, asymmetric, and heavy-tailed domain shifts. Future research may explore automated hyperparameter selection or integration of further higher-order statistics for domains exhibiting distinct heterogeneity profiles (Batool et al., 4 Jan 2026).