Extent and origins of the power-law scaling between minima volume and dataset size

Determine whether the empirically observed power-law relationship between the basin volume of training-loss minima and dataset size for image classification models (e.g., MNIST, CIFAR10, SVHN, Fashion MNIST) persists beyond the three orders of magnitude studied, and ascertain whether this scaling is connected to neural scaling laws or to the manifold hypothesis. Use the same notion of basin volume, i.e., Monte Carlo star-convex estimation under a fixed training-loss threshold, to ensure comparability.
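
The sketch below illustrates the general kind of Monte Carlo star-convex volume estimator this refers to: sample random directions from the minimum, find the radius at which the training loss crosses the threshold along each ray, and combine the radii into the volume of the star-shaped sublevel region. The function and argument names (`loss_fn`, `theta_star`, `threshold`) and the doubling-plus-bisection boundary search are illustrative assumptions, not the paper's exact implementation.

```python
import math
import numpy as np

def log_basin_volume(loss_fn, theta_star, threshold,
                     n_rays=200, r_init=1.0, tol=1e-3, seed=0):
    """Log basin volume around theta_star via star-convex ray sampling.

    Along each random unit direction, find (by doubling plus bisection) the
    largest radius at which loss_fn stays below `threshold`, then combine the
    per-ray radii into the volume of the star-shaped region they bound.
    """
    rng = np.random.default_rng(seed)
    d = theta_star.size
    log_r = np.empty(n_rays)
    for i in range(n_rays):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        lo, hi = 0.0, r_init
        # Double the radius until the loss threshold is crossed (or a cap is hit).
        while loss_fn(theta_star + hi * u) < threshold and hi < 1e6:
            lo, hi = hi, 2.0 * hi
        # Bisect for the boundary radius along this ray.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if loss_fn(theta_star + mid * u) < threshold:
                lo = mid
            else:
                hi = mid
        log_r[i] = math.log(max(lo, 1e-12))
    # Volume of a star-convex set from its ray radii:
    #   Vol = Vol(unit d-ball) * E_u[r(u)^d]
    log_unit_ball = 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)
    x = d * log_r
    m = x.max()
    log_mean_rd = m + math.log(np.mean(np.exp(x - m)))  # stable log E[r^d]
    return log_unit_ball + log_mean_rd
```

Working in log-volume (rather than raw volume) is essential in high dimensions, since the per-ray terms r^d overflow or underflow for realistic parameter counts.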

Background

The paper reports that for several image classification tasks, the measured basin volume of minima found by training exhibits a power-law relationship with dataset size over three orders of magnitude. This observation is consistent across multiple architectures and optimizers, though the precise scaling coefficient varies.
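
A power law V ∝ N^α is a linear relation log V = α log N + c, so the scaling exponent can be read off with a least-squares fit in log-log space. A minimal sketch, assuming `dataset_sizes` and `log_volumes` hold the measured values (hypothetical names, not taken from the paper):

```python
import numpy as np

def fit_power_law(dataset_sizes, log_volumes):
    """Least-squares fit of log V = alpha * log N + c, i.e. V ∝ N^alpha."""
    log_n = np.log(np.asarray(dataset_sizes, dtype=float))
    A = np.column_stack([log_n, np.ones_like(log_n)])
    (alpha, c), *_ = np.linalg.lstsq(A, np.asarray(log_volumes, dtype=float),
                                     rcond=None)
    return alpha, c
```

Extending the test beyond three orders of magnitude amounts to repeating the volume measurement at larger dataset sizes and checking whether the fitted exponent remains stable.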

The authors note that the empirical trend might be related to known neural scaling laws or the manifold hypothesis, but the scope of their experiments is limited. They explicitly state uncertainty about whether the observed power law extends over a wider range of dataset sizes or has a principled connection to these theories.

References

It is unclear from our experiments if this trend holds across more than the 3 orders of magnitude observed and if it has any relation to neural scaling laws or the manifold hypothesis.

Sharp Minima Can Generalize: A Loss Landscape Perspective On Data (2511.04808 - Fan et al., 6 Nov 2025) in Minima Volume Results — Larger Datasets Are Problem-Dependent