- The paper demonstrates that Soft Weight Rescaling effectively bounds weight growth to recover neural network plasticity.
- It introduces a minimal-cost, dynamic scaling approach that preserves learned features during continual and warm-start learning.
- Experimental evaluations on benchmarks like CIFAR-10 reveal significant improvements in generalization compared to traditional methods.
Recovering Plasticity of Neural Networks via Soft Weight Rescaling
Introduction
The paper "Recovering Plasticity of Neural Networks via Soft Weight Rescaling" addresses the challenge of plasticity loss in neural networks, where the model's capacity to learn new information diminishes as training progresses. Unbounded weight growth has been identified as a main contributor to this issue, adversely affecting optimization dynamics and generalization. The proposed solution, Soft Weight Rescaling (SWR), aims to mitigate unbounded weight growth without sacrificing the learned information, thereby restoring network plasticity.
Soft Weight Rescaling Methodology
Soft Weight Rescaling (SWR) operates by proportionally scaling down the weights of each layer throughout training. Unlike traditional regularization methods, which often add computational overhead and can conflict with the optimization objective, SWR is a minimal-cost procedure: it applies dynamic, per-layer scaling factors that keep weight magnitudes bounded. Because every weight in a layer is multiplied by the same factor, the proportional structure of the learned features is preserved, which is crucial for warm-start learning, continual learning, and conventional single-task learning.
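To make the mechanism concrete, here is a minimal PyTorch sketch of a per-layer rescaling step. The blending rule, the hyperparameter `lam`, and the helper `soft_weight_rescale` are illustrative assumptions rather than the paper's exact formulation; the point is only that each layer is multiplied by a single soft factor that pulls its norm back toward a reference value.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def soft_weight_rescale(model: nn.Module, init_norms: dict, lam: float = 0.01) -> None:
    """Softly pull each layer's weight norm back toward its value at initialization.

    Illustrative sketch only: `lam` and the blending rule below are assumptions,
    not the paper's exact parameterization. Every weight in a layer is scaled by
    the same positive factor, so the learned feature directions are untouched.
    """
    for name, module in model.named_modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            current = module.weight.norm()   # current Frobenius norm
            target = init_norms[name]        # norm recorded at initialization
            # Convex blend: after rescaling, the norm becomes
            # (1 - lam) * current + lam * target, i.e. a soft step toward the target.
            factor = (1.0 - lam) + lam * (target / current)
            module.weight.mul_(factor)
            # A full implementation would also rescale biases consistently so the
            # network's function is preserved up to an overall scale (see below).

# Hypothetical usage: record reference norms once, then rescale after each step.
# init_norms = {n: m.weight.norm().item() for n, m in model.named_modules()
#               if isinstance(m, (nn.Linear, nn.Conv2d))}
# optimizer.step()
# soft_weight_rescale(model, init_norms, lam=0.01)
```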
Theoretical Insights
The theoretical analysis shows that SWR bounds the magnitude of the weights and keeps the magnitudes of different layers in balance. The key observation is that scaling every weight in a layer by the same positive factor preserves the network's proportional structure, so growth can be curbed without the kind of weight divergence that erodes plasticity. The paper's theorem establishes that a network rescaled in this way retains its functional behavior while excessive parameter growth is kept in check.
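The proportionality idea can be checked numerically. The snippet below is an illustrative check, assuming a plain ReLU MLP (the positive homogeneity of ReLU is what makes it work): scaling each layer's weights by a factor and its bias by the cumulative product of factors changes the output only by the overall product, so the function is preserved up to a scale.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A plain ReLU MLP; positive homogeneity of ReLU makes the check work.
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(),
                    nn.Linear(16, 16), nn.ReLU(),
                    nn.Linear(16, 4))
x = torch.randn(5, 8)
y = net(x)

# Scale each linear layer's weights by c_l and its bias by the cumulative
# product c_1 * ... * c_l.  The output is then scaled by the product of all c_l.
factors = [0.7, 1.3, 0.9]
with torch.no_grad():
    cumulative = 1.0
    linears = [m for m in net if isinstance(m, nn.Linear)]
    for layer, c in zip(linears, factors):
        cumulative *= c
        layer.weight.mul_(c)
        layer.bias.mul_(cumulative)

y_scaled = net(x)
total = factors[0] * factors[1] * factors[2]
print(torch.allclose(y_scaled, total * y, atol=1e-5))  # True: same function up to scale
```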
Experimental Evaluation
Experiments on standard image classification benchmarks such as CIFAR-10 and TinyImageNet, using architectures like VGG-16, show that SWR consistently improves generalization and mitigates plasticity loss relative to L2 regularization and re-initialization-based baselines. The gains are most pronounced in warm-start and continual learning setups, where conventional methods struggle to retain previously learned information.
Implementation Considerations
The implementation of SWR requires calculating the change in the Frobenius norm of weight matrices and adjusting scaling factors via exponential moving averages to ensure bounded weight growth. Importantly, SWR is applicable across various network architectures and can integrate with prevalent normalization layers like batch normalization, preserving model proportionality without degrading performance. This makes SWR a pragmatic choice for deployment in real-world continual learning applications where robustness and adaptability are key.
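As a quick sanity check of the batch-normalization claim, the sketch below (an illustrative Conv → BatchNorm → ReLU block, not taken from the paper) shows that rescaling a layer feeding into batch normalization leaves the block's output essentially unchanged, since the normalization divides the scale back out.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Conv -> BatchNorm -> ReLU block, evaluated with batch statistics (train mode).
block = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
block.train()

x = torch.randn(4, 3, 16, 16)
with torch.no_grad():
    y = block(x)

    # Rescale the conv layer by an arbitrary positive factor, as a rescaling step would.
    c = 0.5
    block[0].weight.mul_(c)
    block[0].bias.mul_(c)

    y_rescaled = block(x)

# Batch normalization divides out the scale of its input, so the block's output
# is unchanged up to a tiny deviation caused by BatchNorm's epsilon term.
print(torch.allclose(y, y_rescaled, atol=1e-3))  # True
```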
Conclusion
Soft Weight Rescaling offers an effective strategy for recovering and maintaining the plasticity of neural networks without the drawbacks associated with weight re-initialization or heavy-handed regularization methods. By ensuring bounded weight growth and maintaining layer balance, SWR facilitates improved generalization and stability across diverse learning scenarios. Future avenues may explore scaling techniques proportional to learning progress for large-scale models or adaptively integrating SWR with dynamic weight adjustment strategies in deep learning frameworks.