When Neural Networks Stop Learning: The Plasticity Trap
This presentation explores a fundamental problem in deep learning: why do neural networks lose their ability to adapt when the world changes? We examine how the very mechanisms that help networks generalize in static environments—low-rank structures and redundant representations—become barriers that trap gradient dynamics and prevent learning in evolving tasks. Through a dynamical systems lens, we'll see how this 'loss of plasticity' emerges and what it means for building truly adaptive AI.

Script
Neural networks excel at learning static tasks, but something curious happens when the world changes around them. They freeze, unable to adapt, even though their learning algorithm keeps running. This phenomenon, called loss of plasticity, turns out to be a mathematical trap hidden in the very structure that made them successful.
The researchers frame plasticity loss through dynamical systems theory. During training, gradient descent doesn't explore the full parameter space freely. Instead, it gets trapped in what they call loss-of-plasticity manifolds: special subspaces that gradient updates cannot escape, because every gradient points along the manifold rather than out of it, like a train confined to its tracks.
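The train-on-tracks picture has a precise dynamical-systems statement. Here is a minimal version of the invariant-manifold condition for gradient flow, in notation of our own choosing rather than the paper's formalism:

```latex
% Invariant manifold under gradient flow (notation ours, not the paper's).
% M is a submanifold of parameter space; T_\theta M is its tangent space.
\dot{\theta}_t = -\nabla L(\theta_t), \qquad
\theta_0 \in M \ \text{and} \ \nabla L(\theta) \in T_\theta M \ \forall\, \theta \in M
\;\Longrightarrow\; \theta_t \in M \ \text{for all } t \ge 0.
```

Frozen and cloned units each carve out such a manifold: once training lands on it, every subsequent gradient step keeps it there.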
Two mechanisms create these traps. First, frozen units arise from activation saturation: neurons get stuck at extreme values and stop contributing meaningful gradients. Second, cloned-unit manifolds arise from representational redundancy: multiple neurons learn identical features, receive identical gradient updates, and so can never differentiate again. Both restrict where gradients can flow during optimization, as the sketch below illustrates.
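Both traps show up in a few lines of code. The following toy sketch (PyTorch assumed; the setup is ours, not the paper's) checks that a saturated tanh unit yields a vanishing gradient, and that two cloned hidden units receive identical gradients:

```python
# Minimal sketch of the two traps (PyTorch assumed; setup is ours, not the paper's).
import torch

# --- Trap 1: frozen unit. A saturated tanh contributes almost no gradient. ---
w = torch.tensor([8.0], requires_grad=True)  # large weight drives tanh into saturation
y = torch.tanh(w * torch.tensor([1.0]))
y.backward()
print(f"saturated-unit gradient: {w.grad.item():.1e}")  # ~4.5e-07: effectively frozen

# --- Trap 2: cloned units. Identical rows get identical gradients forever. ---
W = torch.tensor([[0.5, -0.3],
                  [0.5, -0.3]], requires_grad=True)   # two cloned hidden units
v = torch.tensor([[1.0], [1.0]], requires_grad=True)  # matching output weights
x = torch.tensor([[0.7, -0.2]])
loss = (torch.tanh(x @ W.t()) @ v - 1.0).pow(2).sum()
loss.backward()
print("cloned rows share a gradient:", torch.allclose(W.grad[0], W.grad[1]))  # True
```

Because the cloned rows' gradients are always equal, the symmetric subspace is invariant: the pair behaves as a single unit, and the network's effective rank stays reduced unless something breaks the symmetry.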
Here's the paradox. The same low-rank structures that help networks generalize efficiently on static tasks become barriers when tasks evolve. Through cloning experiments across multiple architectures, including vision transformers and residual networks, the authors demonstrate that mechanisms optimized for static performance systematically trap learning dynamics.
The empirical validation is striking. In bit-flipping benchmarks where the task continuously changes, networks without interventions progressively lose rank and accumulate dead units. The gradient flow becomes increasingly confined, not because of catastrophic forgetting of old tasks, but because the optimization geometry itself has closed off paths to new solutions.
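To make that concrete, here is a minimal reconstruction of a bit-flipping-style continual task (sizes, flip schedule, and diagnostics are our assumptions, not the paper's exact protocol). A small ReLU network chases a target that changes periodically, while we log the two plasticity proxies mentioned above, the dead-unit fraction and the feature-matrix rank:

```python
# Minimal reconstruction of a bit-flipping-style continual task (PyTorch assumed;
# sizes, flip schedule, and diagnostics are our assumptions, not the paper's protocol).
import torch

torch.manual_seed(0)
n_bits, hidden, steps, flip_every = 16, 64, 3000, 200

net = torch.nn.Sequential(
    torch.nn.Linear(n_bits, hidden), torch.nn.ReLU(),
    torch.nn.Linear(hidden, 1),
)
opt = torch.optim.SGD(net.parameters(), lr=0.05)
target_w = torch.randn(n_bits)  # hidden target; flipping a sign changes the task

for step in range(steps):
    if step > 0 and step % flip_every == 0:
        i = torch.randint(n_bits, (1,)).item()
        target_w[i] = -target_w[i]                      # the "bit flip"
    x = (torch.rand(32, n_bits) < 0.5).float()          # random binary inputs
    loss = (net(x) - x @ target_w.unsqueeze(1)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

    if step % flip_every == flip_every - 1:             # log once per task phase
        with torch.no_grad():
            probe = (torch.rand(512, n_bits) < 0.5).float()
            h = torch.relu(net[0](probe))               # hidden-layer features
            dead = (h.max(dim=0).values == 0).float().mean().item()
            rank = torch.linalg.matrix_rank(h).item()
        print(f"step {step:4d}  loss {loss.item():8.3f}  dead {dead:.2f}  rank {rank}")
```

Tracking those two numbers across flips is enough to check the reported trend: whether dead units accumulate and the feature rank drifts down as the task keeps changing.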
This work reveals that building adaptive AI requires rethinking the tension between static optimization and continual learning. Understanding plasticity loss as a geometric trap opens new paths for designing interventions that keep networks explorable. To dive deeper into how neural networks learn and adapt, visit EmergentMind.com where you can explore this research and create your own explanatory videos.