A Two-Phase Perspective on Deep Learning Dynamics (2504.12700v1)

Published 17 Apr 2025 in hep-th, cond-mat.dis-nn, and cs.LG

Abstract: We propose that learning in deep neural networks proceeds in two phases: a rapid curve-fitting phase followed by a slower compression or coarse-graining phase. This view is supported by the shared temporal structure of three phenomena: grokking, double descent, and the information bottleneck, all of which exhibit a delayed onset of generalization well after training error reaches zero. We empirically show that the associated timescales align in two rather different settings. Mutual information between hidden layers and input data emerges as a natural progress measure, complementing circuit-based metrics such as local complexity and the linear mapping number. We argue that the second phase is not actively optimized by standard training algorithms and may be unnecessarily prolonged. Drawing on an analogy with the renormalization group, we suggest that this compression phase reflects a principled form of forgetting, critical for generalization.
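
The abstract names mutual information between a hidden layer and the input as a progress measure for the compression phase. Below is a minimal sketch, not the authors' code, of how such a quantity is commonly tracked in information-bottleneck-style analyses: continuous activations are discretized into bins, and for a deterministic network evaluated on a finite sample the estimate I(T; X) reduces to the entropy H(T) of the binned hidden patterns. The bin count, layer shape, and synthetic stand-in data are illustrative assumptions.

```python
import numpy as np

def discretize(activations, n_bins=30):
    """Map continuous activations to per-unit bin indices (binning estimator)."""
    lo, hi = activations.min(), activations.max()
    edges = np.linspace(lo, hi, n_bins + 1)
    # Interior edges only, so values fall into bins 0 .. n_bins - 1.
    return np.digitize(activations, edges[1:-1])

def mi_layer_input(hidden, n_bins=30):
    """Estimate I(T; X) for a deterministic layer T = f(X).

    On a finite sample, each input maps to exactly one binned activation
    pattern, so I(T; X) equals the entropy H(T) of the empirical
    distribution over binned patterns (in bits).
    """
    binned = discretize(hidden, n_bins)
    # Treat each row (one sample's binned pattern) as a single symbol.
    _, counts = np.unique(binned, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Illustrative usage: after each training epoch, pass the training inputs
# through the network, collect one hidden layer's activations as an
# (n_samples, n_units) array, and log mi_layer_input(hidden). In the
# two-phase picture, this curve rises during the fitting phase and then
# slowly decays during the later compression phase.
rng = np.random.default_rng(0)
fake_hidden = rng.standard_normal((1000, 32))  # stand-in for real activations
print(mi_layer_input(fake_hidden))
```

This binning estimator is the simple one used in early information-bottleneck studies of training dynamics; it is sensitive to the bin count and saturates near log2(n_samples) for highly distinct patterns, so it should be read as a coarse progress signal rather than a precise mutual-information value.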
