Presence of volume–data scaling structure in language modeling
Investigate whether two phenomena observed in image classification, a strong bias toward flat minima and a power-law relationship between minima basin volume and dataset size, also appear in language modeling, and characterize any similarities or differences across domains.
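One way to compare the volume–data relationship across domains is to estimate the power-law exponent separately for each domain and compare the results. The sketch below assumes the relationship takes the form V(n) ≈ c · n^α, where n is dataset size and V is basin volume, and fits α by least squares in log-log space. The function name `fit_power_law` and the choice to pass natural-log volumes (to avoid floating-point underflow, since basin volumes in high-dimensional parameter spaces can be extremely small) are illustrative assumptions, not the method used in the cited work. How basin volume itself is estimated is left open here.

```python
import numpy as np


def fit_power_law(dataset_sizes, log_volumes):
    """Hypothetical helper: fit V(n) ~= c * n**alpha given natural-log volumes ln V(n).

    Assumes the power-law form above. Taking ln V as input avoids underflow
    for very small volumes. Returns (ln_c, alpha) from an ordinary
    least-squares line fitted to (ln n, ln V).
    """
    n = np.asarray(dataset_sizes, dtype=float)
    log_v = np.asarray(log_volumes, dtype=float)
    if n.shape != log_v.shape or n.size < 2:
        raise ValueError("need at least two matching (size, log-volume) pairs")
    if np.any(n <= 0):
        raise ValueError("dataset sizes must be positive")
    # ln V = ln c + alpha * ln n is linear in ln n; polyfit returns [slope, intercept].
    alpha, ln_c = np.polyfit(np.log(n), log_v, deg=1)
    return float(ln_c), float(alpha)


if __name__ == "__main__":
    # Synthetic, noise-free data from ln V = 3.0 - 0.5 * ln n.
    # Illustration only: these are not measured basin volumes.
    sizes = np.array([1e3, 1e4, 1e5, 1e6])
    log_vols = 3.0 - 0.5 * np.log(sizes)
    ln_c, alpha = fit_power_law(sizes, log_vols)
    print(f"ln c = {ln_c:.3f}, alpha = {alpha:.3f}")
```

Under this framing, running the fit once on image-classification measurements and once on language-modeling measurements would give two exponents to compare; a poor linear fit in log-log space for one domain would itself suggest the power-law structure does not carry over.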
References
It is unknown whether similar structure appears in domains such as language modeling.
— Sharp Minima Can Generalize: A Loss Landscape Perspective On Data
(2511.04808 - Fan et al., 6 Nov 2025) in Conclusion