Effect of NOBLE Under Mixup/CutMix in ViT Training

Determine whether augmenting ViT-S/16 with the NOBLE nonlinear low-rank branch provides any consistent benefit, in terms of training loss and validation accuracy, when training on ImageNet-1k with Mixup and CutMix augmentation enabled.
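For concreteness, the two augmentations in question can be sketched as follows. This is a minimal NumPy sketch of standard Mixup and CutMix (convex image/label mixing, and box pasting with area-weighted labels), not code from the NOBLE paper; function names and the `rng` seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Mixup: convex combination of two images and their one-hot labels."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def cutmix(x1, y1, x2, y2, alpha=1.0):
    """CutMix: paste a random box from x2 into x1, mixing labels by area."""
    h, w = x1.shape[-2:]
    lam = rng.beta(alpha, alpha)
    # Box with area roughly (1 - lam) * h * w, centered uniformly at random.
    rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    top, bot = max(cy - rh // 2, 0), min(cy + rh // 2, h)
    left, right = max(cx - rw // 2, 0), min(cx + rw // 2, w)
    out = x1.copy()
    out[..., top:bot, left:right] = x2[..., top:bot, left:right]
    # Re-weight labels by the fraction of x1 actually kept (box may be clipped).
    lam_eff = 1.0 - (bot - top) * (right - left) / (h * w)
    return out, lam_eff * y1 + (1 - lam_eff) * y2
```

Both augmentations produce soft targets, which changes the training-loss landscape relative to hard labels; that is the regime in which NOBLE's benefit is being questioned.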

Background

In ViT-S ImageNet experiments, the authors observe that NOBLE improves training loss when Mixup/CutMix is disabled, but the benefit is unclear when these augmentations are enabled.

The figure caption explicitly states this uncertainty, leaving NOBLE's effectiveness under aggressive augmentation regimes an open question.
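The paper's exact parameterization of the branch is not reproduced here; a minimal sketch, assuming the nonlinear low-rank branch adds U·σ(V·x) to an existing dense projection y = W·x with rank r much smaller than the hidden dimension (the names `U`, `V`, and the choice of GELU are assumptions):

```python
import numpy as np

def noble_linear(x, W, U, V):
    """Hypothetical NOBLE-style layer: dense path plus a nonlinear low-rank branch.

    Shapes: x (d_in,), W (d_out, d_in), V (r, d_in), U (d_out, r), with r << d_in.
    The branch form U @ gelu(V @ x) is an assumption about the paper's design,
    chosen to match the phrase "nonlinear low-rank branch".
    """
    def gelu(z):
        # tanh approximation of GELU
        return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))
    return W @ x + U @ gelu(V @ x)
```

With U initialized to zero, the layer reduces exactly to the baseline projection, so the branch can be added without perturbing a pretrained or freshly initialized model at step zero.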

References

Meanwhile when training with Mixup/CutMix, it is not clear that NOBLE provides any benefit.

NOBLE: Accelerating Transformers with Nonlinear Low-Rank Branches  (2603.06492 - Smith, 6 Mar 2026) in Image Model Experiments → ViT-S ImageNet Classification, Figure 'vit_results' caption