Effect of NOBLE Under Mixup/CutMix in ViT Training
Determine whether adding the NOBLE nonlinear low-rank branch to ViT-S/16 yields a consistent benefit, measured by training loss and validation accuracy, when the model is trained on ImageNet-1k with Mixup and CutMix augmentation enabled.
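For readers unfamiliar with the two augmentations named above, the following is a minimal NumPy sketch of batch-level Mixup and CutMix as they are commonly formulated. It is not taken from the NOBLE paper: the function names, the `alpha` defaults, and the NCHW array layout are all assumptions chosen for illustration, and the paper's actual augmentation settings may differ. Both augmentations replace hard labels with soft, mixed labels. The cross-entropy against such targets cannot reach zero, which is one reason training-loss comparisons in this regime may be harder to interpret than comparisons on unmixed labels.

```python
import numpy as np


def mixup_batch(images, labels_onehot, alpha=0.2, rng=None):
    """Blend each image (and its label) with a randomly chosen partner.

    images: float array of shape (N, C, H, W); labels_onehot: shape (N, K).
    alpha is the Beta-distribution parameter (illustrative default).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)               # mixing weight in [0, 1]
    perm = rng.permutation(images.shape[0])    # partner index for each sample
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_images, mixed_labels, lam


def cutmix_batch(images, labels_onehot, alpha=1.0, rng=None):
    """Paste one rectangular patch from a partner image into each image.

    The label weight is recomputed from the actual pasted area, since the
    box is clipped at the image border.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, _, h, w = images.shape
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(n)

    # Box side lengths scale with sqrt(1 - lam) so that area ~ (1 - lam).
    cut = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut), int(w * cut)
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    mixed_images = images.copy()
    mixed_images[:, :, y0:y1, x0:x1] = images[perm][:, :, y0:y1, x0:x1]

    # Fraction of each image that still comes from the original sample.
    lam_adj = 1.0 - (y1 - y0) * (x1 - x0) / (h * w)
    mixed_labels = lam_adj * labels_onehot + (1.0 - lam_adj) * labels_onehot[perm]
    return mixed_images, mixed_labels, lam_adj
```

In both functions each row of the returned label array still sums to 1, so it can be fed to a soft-target cross-entropy loss unchanged.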
References
"Meanwhile when training with Mixup/CutMix, it is not clear that NOBLE provides any benefit."
— NOBLE: Accelerating Transformers with Nonlinear Low-Rank Branches
(2603.06492 - Smith, 6 Mar 2026) in Image Model Experiments → ViT-S ImageNet Classification, Figure 'vit_results' caption