SVSL can improve end-of-training test performance

Ascertain whether, with proper hyperparameter tuning, the Stochastic Variability-Simplification Loss (SVSL) improves End-of-Training (EOT) test metrics relative to vanilla cross-entropy on the image datasets MNIST, Fashion-MNIST, STL-10, CIFAR-10, and CIFAR-100 and on the GLUE tasks CoLA (Matthews correlation), RTE, MRPC, and SST-2.

Background

Beyond geometric effects, the authors posit that SVSL can also enhance generalization performance. SVSL augments cross-entropy with a layerwise clustering penalty designed to reduce within-class variability in intermediate representations, potentially leading to better test metrics.

The conjecture asserts that, with appropriately tuned hyperparameters, SVSL improves End-of-Training test metrics on every dataset in a suite of standard vision and NLP sequence-classification benchmarks.
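
To make the idea of a layerwise clustering penalty concrete, here is a minimal NumPy sketch of a loss of this general shape: cross-entropy plus a term that penalizes the spread of each class around its mean in every intermediate layer. It is an illustration under stated assumptions, not the paper's exact formulation: the function names, the use of mini-batch class means, the squared-distance penalty, the uniform sum over layers, and the weight `gamma` are all hypothetical choices made for this example.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch."""
    labels = np.asarray(labels)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def within_class_variability(features, labels):
    """Mean squared distance of each sample to its class mean within the batch."""
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        center = class_feats.mean(axis=0)  # batch estimate of the class center
        total += ((class_feats - center) ** 2).sum()
    return total / len(labels)

def svsl_like_loss(logits, labels, layer_features, gamma=0.1):
    """Cross-entropy plus gamma times the summed per-layer variability penalty.

    layer_features: list of arrays, one per intermediate layer, each of
    shape (batch, ...). gamma is a hypothetical penalty weight.
    """
    penalty = 0.0
    for feats in layer_features:
        flat = feats.reshape(len(labels), -1)  # flatten each sample's features
        penalty += within_class_variability(flat, labels)
    return cross_entropy(logits, labels) + gamma * penalty
```

When every sample already sits on its class mean in each layer, the penalty vanishes and the loss reduces to plain cross-entropy; `gamma` is the kind of hyperparameter the conjecture's "proper tuning" would have to cover.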

References

Conjecture (SVSL can improve test performance). The EOT test metrics are improved for all datasets using the SVSL and proper hyperparameter tuning.

Ben-Shaul et al. (2022), Nearest Class-Center Simplification through Intermediate Layers, arXiv:2201.08924, Section 4.2 (Decreasing NCC Mismatch using Stochastic Variability-Simplification Loss)