Noradrenergic-inspired gain modulation attenuates the stability gap in joint training

Published 18 Jul 2025 in cs.LG, cs.AI, and q-bio.NC | (2507.14056v1)

Abstract: Recent studies in continual learning have identified a transient drop in performance on mastered tasks when assimilating new ones, known as the stability gap. Such dynamics contradict the objectives of continual learning, revealing a lack of robustness in mitigating forgetting, and notably, persisting even under an ideal joint-loss regime. Examining this gap within this idealized joint training context is critical to isolate it from other sources of forgetting. We argue that it reflects an imbalance between rapid adaptation and robust retention at task boundaries, underscoring the need to investigate mechanisms that reconcile plasticity and stability within continual learning frameworks. Biological brains navigate a similar dilemma by operating concurrently on multiple timescales, leveraging neuromodulatory signals to modulate synaptic plasticity. However, artificial networks lack native multitimescale dynamics, and although optimizers like momentum-SGD and Adam introduce implicit timescale regularization, they still exhibit stability gaps. Inspired by locus coeruleus-mediated noradrenergic bursts, which transiently enhance neuronal gain under uncertainty to facilitate sensory assimilation, we propose uncertainty-modulated gain dynamics - an adaptive mechanism that approximates a two-timescale optimizer and dynamically balances integration of knowledge with minimal interference on previously consolidated information. We evaluate our mechanism on domain-incremental and class-incremental variants of the MNIST and CIFAR benchmarks under joint training, demonstrating that uncertainty-modulated gain dynamics effectively attenuate the stability gap. Finally, our analysis elucidates how gain modulation replicates noradrenergic functions in cortical circuits, offering mechanistic insights into reducing stability gaps and enhancing performance in continual learning tasks.


Authors (3)

Summary

The paper introduces uncertainty-modulated gain dynamics to mitigate the transient stability gap in joint continual learning.
It decomposes weight updates into slow tonic and fast phasic components, dynamically scaling updates based on output entropy.
Empirical results on MNIST and CIFAR demonstrate reduced performance dips and improved final accuracy over standard optimizers.

Noradrenergic-Inspired Gain Modulation for Stability in Joint Continual Learning

This paper addresses a persistent challenge in continual learning (CL): the "stability gap," a transient drop in performance on previously mastered tasks when assimilating new ones, even under ideal joint training. The authors propose a biologically inspired mechanism—uncertainty-modulated gain dynamics, modeled after noradrenergic bursts in the brain—to attenuate this gap by dynamically balancing plasticity and stability during learning.

Theoretical Framework

The authors argue that the stability gap is intrinsic to sequential optimization, not merely an artifact of imperfect memory or regularization. Even with access to all previous data (joint training), standard optimizers such as momentum-SGD (MSGD) and Adam exhibit transient forgetting at task boundaries. The authors attribute this to the lack of explicit multi-timescale dynamics in artificial neural networks. Biological systems, by contrast, use neuromodulatory signals (notably noradrenaline from the locus coeruleus) to transiently boost neuronal gain in response to uncertainty, facilitating rapid adaptation without erasing consolidated memories.

The authors formalize gain modulation as a two-timescale optimization scheme. The effective synaptic weight is decomposed into a slow component (tonic gain) and a fast component (phasic gain), with the latter triggered by uncertainty (quantified as output entropy). This mechanism is implemented by modulating the gain parameter of neurons in response to the entropy of the network's output distribution, thereby scaling the magnitude of weight updates in a context-sensitive manner.
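The uncertainty signal described above is the entropy of the network's output distribution. A minimal sketch of that quantity follows, assuming a softmax classifier and a batch-mean reduction; the function name `output_entropy` and the choice of nats are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def output_entropy(logits):
    """Mean Shannon entropy (in nats) of softmax(logits) over a batch.

    Assumption: uncertainty is measured on softmax outputs and averaged
    over the batch; the paper's exact reduction may differ.
    """
    logits = np.asarray(logits, dtype=float)
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)
    # H = -sum_k p_k log p_k for each row, then the mean over rows.
    return float(-(probs * log_probs).sum(axis=1).mean())

# A confident prediction has low entropy; a uniform one has entropy log(K).
confident = output_entropy([[10.0, 0.0, 0.0]])
uncertain = output_entropy([[0.0, 0.0, 0.0]])
print(confident, uncertain)
```

Entropy is largest when the output is spread evenly over classes, which is the situation one would expect right after a task boundary introduces unfamiliar inputs.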

Algorithmic Implementation

The Noradrenergic Gain-Modulated SGD (NGM-SGD) algorithm augments standard SGD with a dynamic gain parameter $g$ for each neuron (or output unit in CNNs), updated as follows:

Weight update:

$w \leftarrow w - \alpha \nabla_w L$

Gain update:

$g \leftarrow \gamma g + (1-\gamma)g_0 + \eta H$

where $H$ is the entropy of the output distribution, $\gamma$ is the decay rate, $g_0$ is the baseline gain, and $\eta$ scales the phasic response.
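To make the gain update concrete, the sketch below iterates the rule on a made-up entropy trace. The values of `g0`, `gamma`, and `eta` and the shape of the trace are hypothetical choices for illustration, not the paper's settings. For constant entropy $H$, setting $g = \gamma g + (1-\gamma)g_0 + \eta H$ gives the fixed point $g^* = g_0 + \eta H / (1-\gamma)$, which the code also computes.

```python
def update_gain(g, entropy, g0=1.0, gamma=0.9, eta=0.05):
    """One step of g <- gamma*g + (1-gamma)*g0 + eta*H (hypothetical defaults)."""
    return gamma * g + (1.0 - gamma) * g0 + eta * entropy

def simulate_gain(entropy_trace, g0=1.0, gamma=0.9, eta=0.05):
    """Iterate the gain update over a sequence of entropy values, starting at g0."""
    g = g0
    history = []
    for h in entropy_trace:
        g = update_gain(g, h, g0, gamma, eta)
        history.append(g)
    return history

def fixed_point(entropy, g0=1.0, gamma=0.9, eta=0.05):
    """Steady-state gain under constant entropy: g0 + eta*H / (1 - gamma)."""
    return g0 + eta * entropy / (1.0 - gamma)

# Hypothetical trace: zero entropy on a mastered task, then a spike at a
# task boundary (step 20) that decays geometrically as learning proceeds.
trace = [0.0] * 20 + [2.0 * 0.8 ** k for k in range(40)]
gains = simulate_gain(trace)
print(gains[19], max(gains), gains[-1])
```

With zero entropy the gain stays at its baseline; the spike pushes it up for a few steps, after which the decay term pulls it back toward `g0`. This is the phasic burst and tonic return that the update rule encodes.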

In practice, gain modulation is applied to all neurons in MLPs and to the output layer in CNNs (e.g., ResNet-18), reflecting biological constraints and empirical findings that the output layer is a primary locus of the stability gap.
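The summary does not spell out where the gain enters the forward pass. One plausible reading, sketched below as an assumption, is a per-unit multiplicative gain on a layer's pre-activation, `g * (x @ W + b)`. Under that assumption the gradient with respect to the weights is scaled by `g` for a fixed upstream gradient, which is one way a gain parameter can scale the magnitude of weight updates. The function names are hypothetical.

```python
import numpy as np

def gain_linear_forward(x, W, b, g):
    """Pre-activation of a linear layer with per-unit gain g (assumed form)."""
    return g * (x @ W + b)

def gain_linear_weight_grad(x, grad_out, g):
    """dL/dW given dL/d(output), for output = g * (x @ W + b).

    Each output unit j contributes grad_out[:, j] * g[j], so the weight
    gradient for column j is scaled by that unit's gain.
    """
    return x.T @ (grad_out * g)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # batch of 4 inputs with 3 features
W = rng.normal(size=(3, 2))          # 3 inputs -> 2 output units
b = np.zeros(2)
grad_out = rng.normal(size=(4, 2))   # fixed upstream gradient dL/d(output)

baseline = gain_linear_weight_grad(x, grad_out, g=np.ones(2))
boosted = gain_linear_weight_grad(x, grad_out, g=np.full(2, 2.0))
print(np.allclose(boosted, 2.0 * baseline))
```

In a full network the upstream gradient also changes with the gain, so the exact scaling holds only for the fixed `grad_out` used in this isolated layer.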

Empirical Results

Experiments are conducted on both class-incremental and domain-incremental variants of MNIST and CIFAR benchmarks under joint training. The evaluation uses continual metrics: average final accuracy (avg-ACC), average minimum accuracy (avg-min-ACC), worst-case accuracy (WC-ACC), and average stability gap (avg-SG).
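The summary names these metrics without defining them. As a rough illustration only, the sketch below uses simplified, assumed definitions for a single previously learned task: the minimum accuracy observed after a task boundary, and a stability gap taken as the accuracy just before the boundary minus that minimum. The paper's exact formulas may differ, and the accuracy trace is made up.

```python
def min_accuracy_after(acc_trace, boundary):
    """Lowest accuracy on an old task at or after the boundary step."""
    return min(acc_trace[boundary:])

def stability_gap(acc_trace, boundary):
    """Assumed definition: accuracy just before the boundary minus the
    minimum accuracy reached afterwards (larger means a deeper dip)."""
    if boundary < 1 or boundary >= len(acc_trace):
        raise ValueError("boundary must have at least one step on each side")
    return acc_trace[boundary - 1] - min_accuracy_after(acc_trace, boundary)

def average(values):
    """Arithmetic mean, used to average a metric over tasks or boundaries."""
    values = list(values)
    return sum(values) / len(values)

# Hypothetical per-evaluation accuracy on task A; a new task starts at step 3.
task_a = [0.50, 0.90, 0.98, 0.70, 0.85, 0.95, 0.97]
# A second hypothetical trace with a much shallower dip.
task_b = [0.60, 0.92, 0.96, 0.94, 0.95, 0.96, 0.96]

gaps = [stability_gap(task_a, 3), stability_gap(task_b, 3)]
print(min_accuracy_after(task_a, 3), gaps, average(gaps))
```

Under these assumed definitions, two runs can end at similar final accuracy while differing sharply in how far accuracy falls in between. That difference is what the dip-based metrics are meant to capture.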

Key findings include:

Consistent reduction of the stability gap:

NGM-SGD achieves the lowest avg-SG across all benchmarks, with particularly strong effects in class-incremental settings (e.g., Split MNIST: avg-SG of 0.019 vs. 0.270 for MSGD and 0.155 for Adam).

Improved or comparable final accuracy:

NGM-SGD matches or exceeds the final accuracy of standard optimizers, especially in more complex settings (e.g., Split CIFAR-10: 90.1% for NGM-SGD vs. 87.5% for MSGD).

Faster recovery and smaller dips at task transitions:

Training curves show that NGM-SGD minimizes the depth and duration of performance drops at task boundaries.

Gain dynamics reflect task complexity:

The evolution of the gain parameter tracks the perceived complexity and uncertainty of tasks, rising at the onset of new tasks and decaying as learning stabilizes.

Implications

Practical Implications:

Robust continual learning:

The proposed gain modulation mechanism can be integrated into existing architectures with minimal overhead, requiring only the addition of a gain parameter and its update rule. This is especially relevant for safety-critical applications where transient failures are unacceptable.

Optimizer-agnostic enhancement:

The gain modulation mechanism can be layered atop standard optimizers, providing a complementary route to stability without modifying the base optimizer's update rule.

Scalability:

The method is computationally lightweight, as gain updates are local and require only the computation of output entropy per batch.

Theoretical Implications:

Biological plausibility:

The work strengthens the connection between neuromodulatory mechanisms in the brain and optimization strategies in artificial networks, suggesting that explicit multi-timescale dynamics are essential for robust lifelong learning.

Loss landscape geometry:

Transient gain boosts are shown to flatten the loss landscape at task boundaries, facilitating smoother transitions and reducing interference, challenging the assumption that monotonic loss minimization is optimal in CL.

Limitations and Future Directions

Benchmark scope:

Experiments are limited to MNIST and CIFAR variants with a small number of tasks. Extension to more complex, real-world continual learning scenarios is necessary.

Joint training idealization:

The joint training setup isolates the stability gap but does not capture the full complexity of online or task-incremental learning.

Extension to other neuromodulatory systems:

The authors suggest exploring cholinergic modulation for task segregation and gain gating as a regularization mechanism.

Speculation on Future Developments

The integration of biologically inspired gain modulation into deep learning optimizers may become a standard approach for mitigating transient forgetting in continual learning. Further research could explore adaptive gain mechanisms in more complex architectures (e.g., transformers, large-scale vision models) and in reinforcement learning settings. Additionally, combining gain modulation with other regularization and memory consolidation strategies may yield synergistic improvements in stability and plasticity trade-offs.

Conclusion

This work demonstrates that noradrenergic-inspired, uncertainty-modulated gain dynamics provide an effective and biologically grounded solution to the stability gap in joint continual learning. By approximating a two-timescale optimizer and dynamically reshaping the loss landscape, gain modulation enables rapid adaptation with minimal interference, offering both practical and theoretical advances in the design of robust lifelong learning systems.
