Unresolved Questions on Neural Scaling Laws

Investigate the unresolved questions surrounding neural scaling laws: clarify which aspects of the observed performance scaling with training time, dataset size, and model size remain unestablished across architectures and tasks, so that their theoretical foundations and practical implications can be better understood.

Background

Neural scaling laws describe how performance metrics such as test loss improve predictably with increases in compute, model size, and dataset size. These laws are central to model and dataset design and to compute-optimal training strategies. Despite extensive empirical study, the introduction notes that many aspects of these laws are still not fully understood, motivating the development of theoretical models to explain and predict these phenomena.
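As an illustration of the kind of law at issue (a commonly used empirical form from the scaling-law literature, not an expression taken from this paper), the test loss is often fitted as a sum of power laws in model size $N$ and dataset size $D$:

\[
L(N, D) \;\approx\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}},
\]

where $E$ is an irreducible loss and $A$, $B$, $\alpha$, $\beta$ are empirically fitted constants; the open questions concern when and why such forms hold across architectures and tasks.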

The paper presents a solvable random feature model capturing several observed scaling behaviors, including differing exponents in time versus model size, finite-dataset and finite-width corrections, and the non-equivalence of ensembling and width scaling. Nonetheless, the authors explicitly acknowledge that broader questions about neural scaling laws remain open. An illustrative ansatz of this kind of behavior is sketched after this paragraph.
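A hedged sketch of the scaling structure the solvable model is meant to capture (an illustrative additive bottleneck ansatz with distinct exponents, not the paper's exact expression) is:

\[
L(t, N, D) \;\approx\; L_{\infty} \;+\; c_t\, t^{-r_t} \;+\; c_N\, N^{-r_N} \;+\; c_D\, D^{-r_D},
\qquad r_t \neq r_N \text{ in general},
\]

where $t$ is training time, $N$ is model size (width), and $D$ is dataset size; the finite-$N$ and finite-$D$ terms play the role of the corrections mentioned above.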

References

Yet, many questions about neural scaling laws remain open.

A Dynamical Model of Neural Scaling Laws (2402.01092 - Bordelon et al., 2 Feb 2024) in Introduction (Section 1), page 2