Stability–Plasticity Dilemma
- The stability–plasticity dilemma is the trade-off between acquiring new information (plasticity) and retaining established knowledge (stability) to prevent catastrophic forgetting.
- It is commonly formalized as a multi-objective optimization problem, with retention and forgetting quantified by metrics such as backward and forward transfer.
- Architectural and algorithmic strategies, including null-space methods and branch-tuning, are employed to balance these competing demands in continual learning systems.
The stability–plasticity dilemma describes the fundamental challenge in continual, lifelong, and incremental learning: balancing a system’s ability to rapidly acquire new information (plasticity) against the need to avoid catastrophically overwriting established knowledge (stability). The phenomenon arises in artificial neural networks, reinforcement learning agents, dense prediction pipelines, and even biological circuits. The dilemma manifests as a trade-off: unconstrained learning enables swift adaptation but induces catastrophic forgetting, while overly conservative update rules preserve history at the expense of ongoing learning capacity. It is a central constraint in the design and analysis of algorithms for continual, online, and class-incremental learning.
1. Formalization and Quantification of the Stability–Plasticity Dilemma
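Mathematically, the stability–plasticity dilemma is expressed through the evolution of a model’s loss or accuracy on a sequence of tasks $\mathcal{T}_1, \mathcal{T}_2, \dots, \mathcal{T}_T$. As each new task $\mathcal{T}_t$ arrives, plasticity is the ability to drive the loss $\mathcal{L}_t(\theta)$ low on task $\mathcal{T}_t$, while stability is the ability to keep the prior losses $\mathcal{L}_i(\theta)$, $i < t$, approximately unchanged.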
The dilemma is often captured through multi-objective or Pareto optimization:
$$\min_{\theta}\ \Big(\mathcal{L}_t(\theta),\ \sum_{i<t} F_{i,t}(\theta)\Big),$$
where $F_{i,t}(\theta)$ quantifies forgetting on old task $\mathcal{T}_i$ after learning task $\mathcal{T}_t$ (Spotorno et al., 29 Jan 2026, Liu et al., 2024). Scalarizations, such as $\lambda\,\mathcal{L}_t(\theta) + (1-\lambda)\sum_{i<t} F_{i,t}(\theta)$ with $\lambda \in [0,1]$, operationalize the trade-off for algorithm design (Lai et al., 30 Mar 2025). In self-supervised learning, representation-similarity metrics such as Centered Kernel Alignment (CKA) quantify how much extracted features are preserved (stability) or shift (plasticity) across incremental steps (Liu et al., 2024, Kim et al., 2023).
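A minimal NumPy sketch of the linear-kernel form of CKA, as one way to measure feature preservation across an incremental step. It assumes `feats_before` and `feats_after` are features extracted from the same probe inputs by the model before and after the step; the function name `linear_cka` is hypothetical, and kernel CKA variants differ.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices of shape (n_samples, dim).

    Returns a value in [0, 1]; 1 means the representations match up to an
    orthogonal transform and isotropic scaling.
    """
    # Center every feature dimension across the probe samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # With linear kernels, HSIC reduces to Frobenius norms of (cross-)covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical usage: features of the same probe set before and after learning a new task.
rng = np.random.default_rng(0)
feats_before = rng.normal(size=(200, 32))
feats_after = feats_before + 0.1 * rng.normal(size=(200, 32))  # small representational drift
print(round(linear_cka(feats_before, feats_after), 3))
```

Values near 1 indicate that the old features were preserved (stability); lower values indicate representational shift (plasticity), which may or may not hurt earlier tasks.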
Key metrics include:
| Property | Quantification (Typical) |
|---|---|
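| Stability | Backward transfer (BWT): $\frac{1}{T-1}\sum_{i=1}^{T-1}\left(a_{T,i} - a_{i,i}\right)$ |
| Plasticity | Forward transfer (FWT): $\frac{1}{T-1}\sum_{i=2}^{T}\left(a_{i,i} - \tilde{a}_i\right)$ |
| Joint | Average incremental/test accuracy; multi-objective accuracy and forgetting trade-off |
Here $a_{T,i}$ is the accuracy on task $\mathcal{T}_i$ after learning all $T$ tasks, $a_{i,i}$ is the accuracy on $\mathcal{T}_i$ just after it is learned, and $\tilde{a}_i$ is the accuracy on $\mathcal{T}_i$ when a model is trained on it from scratch.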
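These metrics can be computed from a task-by-task accuracy matrix. A minimal sketch, assuming `acc[t, i]` holds the accuracy on task `i` after training through task `t` (0-indexed) and `scratch[i]` holds the from-scratch accuracy on task `i`; the function name and the numbers in the example are hypothetical.

```python
import numpy as np


def continual_metrics(acc: np.ndarray, scratch: np.ndarray) -> dict:
    """Average accuracy, backward transfer (BWT), and forward transfer (FWT).

    acc:     (T, T) matrix, acc[t, i] = accuracy on task i after training through task t.
    scratch: (T,) vector, scratch[i] = accuracy on task i when trained from scratch.
    """
    T = acc.shape[0]
    final = acc[T - 1]           # accuracies after the last task
    just_learned = np.diag(acc)  # accuracy on each task right after it was learned
    # Stability: how much old-task accuracy changed by the end (negative = forgetting).
    bwt = float(np.mean(final[: T - 1] - just_learned[: T - 1]))
    # Plasticity: accuracy reached on tasks 2..T relative to from-scratch training.
    fwt = float(np.mean(just_learned[1:] - scratch[1:]))
    return {"ACC": float(np.mean(final)), "BWT": bwt, "FWT": fwt}


acc = np.array([
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.80, 0.88],
])
scratch = np.array([0.90, 0.90, 0.90])
print(continual_metrics(acc, scratch))
```

In this toy matrix BWT is -0.125, signalling forgetting on the first two tasks, while the slightly negative FWT (-0.035) shows the continually trained model falling a little short of from-scratch accuracy on each new task.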
In reinforcement learning, plasticity loss is quantified by the difference in return between a plastic (freshly initialized) agent and one that has undergone prolonged training: $\Delta_{\mathrm{plast}} = J(\theta_{\mathrm{fresh}}) - J(\theta_{\mathrm{trained}})$, where $J(\cdot)$ denotes expected return (Maheshwari et al., 30 Nov 2025).
2. Architectural and Algorithmic Manifestations
The capacity and dynamics of the underlying architecture determine the attainable stability–plasticity equilibrium.
- Depth vs. Width: Deeper networks are empirically more plastic, while wider networks yield greater stability when parameter counts are held equal (Lu et al., 4 Jun 2025).
- Branching and Adapters: Splitting the model into a shared backbone and task-specific adapters enables the backbone to accumulate invariant knowledge (stability) while adapters absorb task-specific shifts (plasticity) (Wang et al., 8 Mar 2025).
- Null-space and Subspace Methods: Advanced Null Space (AdNS) and related approaches project gradients into low-rank subspaces approximately orthogonal to the representations of previous-task data, explicitly controlling how much deviation is allowed for plasticity (Kong et al., 2022, Lin et al., 2021).
Several algorithmic paradigms exhibit distinctive stability–plasticity regimes:
| Method Paradigm | Plasticity Impact | Stability Impact | Notable Examples |
|---|---|---|---|
| Experience Replay | High (if buffer is sufficient) | Moderate–High | ParetoCL (Lai et al., 30 Mar 2025), SyReM (Lin et al., 27 Aug 2025) |
| Regularization (EWC, SI, etc.) | Moderate | High (may suppress plasticity) | EWC (Zou et al., 3 Feb 2025), AdNS (Kong et al., 2022) |
| Architectural Expansion/Branching | Very High (new params) | Very High (isolation per task) | Branch-Tuning (Liu et al., 2024), DER, AdaLL (Wang et al., 8 Mar 2025) |
| Modular/Library Sovereignty | Very High (plasticity via switching) | Absolute (frozen specialists) | HYDRA (Spotorno et al., 29 Jan 2026) |
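The regularization row can be made concrete with the quadratic penalty of Elastic Weight Consolidation (EWC), $\frac{\lambda}{2}\sum_j F_j(\theta_j - \theta^*_j)^2$, where $F_j$ is a diagonal Fisher estimate and $\theta^*$ the old-task solution. The sketch below is a toy under stated assumptions: the new-task loss is a hypothetical quadratic $\frac{1}{2}\|\theta - \theta_{\text{new}}\|^2$, the Fisher values are given rather than estimated, and the function names are illustrative.

```python
import numpy as np


def ewc_penalty(theta, theta_star, fisher, lam):
    """EWC penalty: (lam / 2) * sum_j F_j * (theta_j - theta_star_j)^2."""
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_star) ** 2))


def ewc_grad(theta, theta_star, fisher, lam):
    """Gradient of ewc_penalty with respect to theta."""
    return lam * fisher * (theta - theta_star)


def train_with_ewc(theta_star, theta_new, fisher, lam, steps=500):
    """Gradient descent on a toy new-task loss 0.5*||theta - theta_new||^2 plus the EWC penalty."""
    theta = theta_star.astype(float).copy()  # start from the old-task solution
    lr = 1.0 / (1.0 + lam * fisher.max())    # stable step size for this quadratic objective
    for _ in range(steps):
        grad = (theta - theta_new) + ewc_grad(theta, theta_star, fisher, lam)
        theta -= lr * grad
    return theta


theta_star = np.zeros(2)         # old-task optimum
theta_new = np.ones(2)           # new-task optimum (hypothetical)
fisher = np.array([1.0, 0.01])   # parameter 0 matters for the old task, parameter 1 barely
for lam in (0.0, 100.0):
    print(lam, train_with_ewc(theta_star, theta_new, fisher, lam))
```

With λ = 0 the parameters move fully to the new-task optimum (pure plasticity). With λ = 100 the high-Fisher parameter stays near its old value (about 1/101) while the low-Fisher one still moves halfway (0.5), showing how the penalty trades plasticity for stability parameter by parameter.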
3. Theoretical Analyses and Capacity Dynamics
Recent work formalizes stability–plasticity via effective model capacity, showing that whenever the data distribution is non-stationary, a neural network’s ability to represent both past and new tasks also changes over time (Chakraborty et al., 11 Aug 2025). The Continual Learning Effective Model Capacity (CLEMC), which tracks how well the network can jointly fit past and current tasks, drifts upward (i.e., effective capacity deteriorates) under continual distributional shift, regardless of architecture or optimizer. Weighted loss regularization and replay can slow but not halt this drift; model expansion or adaptation is required for long-term balance (Chakraborty et al., 11 Aug 2025). This offers a formal account of why replay, regularization, and fixed-parameter strategies never fully resolve the dilemma.
In null-space approaches, the rank (dimension) of the projected subspace directly modulates the trade-off:
- Larger null space: more plasticity, less stability.
- Smaller null space: less plasticity, more stability.
Similar trade-offs exist for the number of branching parameters or width of adapters (Kong et al., 2022, Wang et al., 8 Mar 2025).
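The rank–trade-off relationship can be illustrated with a simplified, GPM-style projection rather than the full AdNS procedure. The sketch assumes the protected subspace is spanned by the top right singular vectors of a matrix of previous-task features, with a hypothetical `energy` threshold choosing its rank: a higher threshold protects more directions (smaller null space, more stability), a lower one leaves a larger null space for plasticity.

```python
import numpy as np


def protected_basis(features: np.ndarray, energy: float) -> np.ndarray:
    """Orthonormal basis (d, k) of the subspace occupied by previous-task features.

    features: (n_samples, d) inputs or activations recorded on earlier tasks.
    energy:   fraction of squared singular-value mass to protect; sets the rank k.
    """
    _, s, vt = np.linalg.svd(features, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(ratio, energy)) + 1, s.size)
    return vt[:k].T


def project_to_null_space(grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the gradient component lying in the protected subspace."""
    return grad - basis @ (basis.T @ grad)


# Toy previous-task features: a strong direction e0 and a weaker direction e1 in R^5.
features = np.array([[3.0, 0.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0, 0.0]])
grad = np.ones(5)
for energy in (0.85, 0.95):
    basis = protected_basis(features, energy)
    print(energy, basis.shape[1], project_to_null_space(grad, basis))
```

At `energy=0.85` only e0 is protected (rank 1, null space of dimension 4); at `0.95` both e0 and e1 are protected (rank 2, null space of dimension 3), so less of the update survives. For a layer's weight gradient `G` of shape (out, d), the same projection is applied on the input side as `G - G @ basis @ basis.T`.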
4. Algorithmic Strategies: Modulating and Decoupling the Trade-off
Gradient Projection and Null Space Approaches
Advanced Null Space (AdNS) projects parameter updates into shared low-rank subspaces while tightening constraints as tasks accumulate, implementing a non-uniform interference bound to interpolate between full stability and unrestricted plasticity (Kong et al., 2022, Lin et al., 2021). Linear connectors blend stability-oriented and plasticity-oriented optima via explicit interpolation in parameter space, controlling the trade-off via a convex combination parameter (Lin et al., 2021).
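The linear-connector idea reduces to a convex combination in parameter space. A minimal sketch, assuming the stability-oriented and plasticity-oriented solutions `theta_stable` and `theta_plastic` are already trained; the toy quadratic losses used to trace the trade-off curve are hypothetical stand-ins for old-task and new-task losses.

```python
import numpy as np


def linear_connector(theta_stable: np.ndarray, theta_plastic: np.ndarray, alpha: float) -> np.ndarray:
    """Convex combination in parameter space: alpha=1 gives the stability-oriented
    solution, alpha=0 the plasticity-oriented one."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * theta_stable + (1.0 - alpha) * theta_plastic


def sweep(theta_stable, theta_plastic, old_loss, new_loss, alphas):
    """Return (alpha, old-task loss, new-task loss) for each interpolation weight."""
    rows = []
    for a in alphas:
        theta = linear_connector(theta_stable, theta_plastic, a)
        rows.append((a, old_loss(theta), new_loss(theta)))
    return rows


# Toy setting (hypothetical): each endpoint sits at the optimum of its own quadratic loss.
theta_stable = np.array([0.0, 0.0])
theta_plastic = np.array([2.0, 2.0])


def old_loss(t: np.ndarray) -> float:
    return float(np.sum((t - theta_stable) ** 2))


def new_loss(t: np.ndarray) -> float:
    return float(np.sum((t - theta_plastic) ** 2))


for row in sweep(theta_stable, theta_plastic, old_loss, new_loss, [0.0, 0.5, 1.0]):
    print(row)
```

Sweeping α traces a curve from the plasticity-oriented endpoint (α = 0) to the stability-oriented one (α = 1); in practice α would be chosen on held-out data from old and new tasks.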
Multi-Objective and Preference-Conditioned Optimization
Pareto Continual Learning (ParetoCL) recasts stability and plasticity as multi-objective criteria, learning a continuum of models parameterized by trade-off preferences (e.g., weighting for new vs. prior data). At inference, the most confident prediction under the learned Pareto front is selected per sample (Lai et al., 30 Mar 2025).
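The per-sample selection step at inference can be sketched as follows, assuming the preference-conditioned model has already been evaluated at a small grid of trade-off preferences, giving one set of logits per preference. Using the maximum softmax probability as the confidence score, and the function names, are assumptions of this sketch rather than details confirmed for ParetoCL.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def most_confident_prediction(logits_per_pref: np.ndarray):
    """Pick, for each sample, the preference whose prediction is most confident.

    logits_per_pref: (n_prefs, n_samples, n_classes), one slice per trade-off preference.
    Returns (predicted class per sample, chosen preference index per sample).
    """
    probs = softmax(logits_per_pref, axis=-1)
    confidence = probs.max(axis=-1)        # (n_prefs, n_samples)
    best_pref = confidence.argmax(axis=0)  # (n_samples,)
    samples = np.arange(logits_per_pref.shape[1])
    preds = probs[best_pref, samples].argmax(axis=-1)
    return preds, best_pref


# Two preferences (e.g. favouring old vs. new data) evaluated on two samples with two classes.
logits = np.array([
    [[2.0, 0.0], [0.0, 0.1]],  # preference 0
    [[0.5, 0.0], [0.0, 3.0]],  # preference 1
])
print(most_confident_prediction(logits))
```

Here the first sample is answered by preference 0 and the second by preference 1, so different inputs can effectively sit at different points on the learned trade-off front.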
Module Specialization and Modular Sovereignty
The HYDRA paradigm sidesteps the dilemma by assembling frozen libraries of regime-specific specialist networks, blended online via uncertainty-aware gating, eliminating catastrophic forgetting by design (Spotorno et al., 29 Jan 2026). Similarly, Dual-Arch uses independent specialist networks for stability and plasticity, trained sequentially with knowledge distillation (Lu et al., 4 Jun 2025).
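One simple form of uncertainty-aware gating over frozen specialists is inverse-variance weighting, sketched below. This is a stand-in chosen for illustration, not HYDRA's published gating rule; it assumes each frozen specialist returns a predictive mean and variance, and only the blending weights change online, so no specialist's parameters are ever overwritten.

```python
import numpy as np


def inverse_variance_blend(means, variances, eps: float = 1e-8):
    """Blend frozen specialists' predictions with weights proportional to 1 / variance.

    means, variances: arrays of shape (n_specialists, ...) from the frozen specialists.
    Returns (blended prediction, per-specialist weights that sum to 1 along axis 0).
    """
    means = np.asarray(means, dtype=float)
    weights = 1.0 / (np.asarray(variances, dtype=float) + eps)
    weights = weights / weights.sum(axis=0, keepdims=True)
    return (weights * means).sum(axis=0), weights


# Two frozen specialists predicting a scalar; the second is far less certain on this input.
blended, w = inverse_variance_blend(means=[[1.0], [3.0]], variances=[[0.01], [100.0]])
print(blended, w.ravel())
```

The confident specialist dominates the blend on this input; on an input from another regime the variances, and therefore the weights, would shift toward a different frozen specialist.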
Replay and Sample Selection
Selective rehearsal (e.g., SyReM) improves plasticity while retaining replay-based stability: it replays only those memory samples whose gradients are most aligned with the current data and enforces explicit gradient-projection constraints (Lin et al., 27 Aug 2025).
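A sketch of the two ingredients, assuming per-sample gradients for the stored memory samples are available as rows of `mem_grads` (a hypothetical name). Selection keeps the `k` stored samples whose gradients have the highest cosine similarity with the current gradient; the projection step uses an A-GEM-style rule as a stand-in, since the exact SyReM constraint may differ.

```python
import numpy as np


def select_aligned(mem_grads: np.ndarray, cur_grad: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k memory samples whose gradients best align with the current gradient."""
    norms = np.linalg.norm(mem_grads, axis=1) * np.linalg.norm(cur_grad) + 1e-12
    cosine = (mem_grads @ cur_grad) / norms
    return np.argsort(-cosine)[:k]


def project_if_conflicting(g: np.ndarray, g_ref: np.ndarray) -> np.ndarray:
    """A-GEM-style step: if descending along g would increase the memory loss
    (negative dot product with the memory gradient g_ref), remove that component of g."""
    dot = float(g @ g_ref)
    if dot < 0.0:
        g = g - (dot / float(g_ref @ g_ref)) * g_ref
    return g


mem_grads = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.7, 0.7]])
cur_grad = np.array([1.0, 0.1])
chosen = select_aligned(mem_grads, cur_grad, k=2)
g_ref = mem_grads[chosen].mean(axis=0)  # reference gradient from the selected replay samples
update = project_if_conflicting(cur_grad, g_ref)
print(chosen, update)
```

Selecting aligned samples keeps replay from fighting the current update, while the projection guarantees the final step does not increase the loss on the selected memory, to first order.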
Neuron-level Control and Fine-Grained Modulation
Neuron-level strategies, such as gradient masking over skill neurons in RL, further refine the balance by targeting stability only for neurons empirically found to be critical for previously acquired skills, maintaining global plasticity elsewhere (Lan et al., 9 Apr 2025).
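Neuron-level gradient masking can be sketched for a single dense layer. The importance score used here (mean absolute activation while the old skill is executed) and the `top_frac` parameter are assumptions for illustration; Lan et al. may identify skill neurons differently. Rows of the weight matrix feeding skill neurons receive no update, while all other neurons remain plastic.

```python
import numpy as np


def skill_neuron_mask(activations: np.ndarray, top_frac: float) -> np.ndarray:
    """Boolean mask over neurons; True marks neurons treated as critical for an old skill.

    activations: (n_samples, n_neurons) recorded while the old skill is executed.
    The top `top_frac` of neurons by mean |activation| are marked (an assumed criterion).
    """
    score = np.abs(activations).mean(axis=0)
    k = max(1, int(round(top_frac * score.size)))
    mask = np.zeros(score.size, dtype=bool)
    mask[np.argsort(-score)[:k]] = True
    return mask


def masked_update(weight: np.ndarray, grad: np.ndarray, skill: np.ndarray, lr: float) -> np.ndarray:
    """SGD step on a (n_neurons, n_inputs) weight matrix with skill-neuron rows frozen."""
    g = grad.copy()
    g[skill] = 0.0          # no change to weights feeding skill neurons (stability)
    return weight - lr * g  # every other neuron keeps learning (plasticity)


acts = np.array([[5.0, 0.1, 2.0], [5.0, 0.1, 2.0]])
skill = skill_neuron_mask(acts, top_frac=1 / 3)
new_w = masked_update(np.ones((3, 2)), np.ones((3, 2)), skill, lr=0.1)
print(skill, new_w)
```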
5. Empirical and Domain-Specific Manifestations
- Self-supervised vision: Freezing BatchNorm layers implements stability; tuning only convolutional layers offers plasticity. Branch-tuning isolates new information to trainable “branch” kernels before merging, achieving near-optimal stability/plasticity (Liu et al., 2024).
- Class-incremental Learning: Most methods overemphasize stability, often leaving feature extractors functionally unchanged across tasks, so that little genuinely new feature learning takes place. Representation analysis via linear-probe retraining and CKA exposes this phenomenon (Kim et al., 2023).
- Reinforcement Learning: Alternating twin-network resets (AltNet) restore plasticity without catastrophic performance drops, whereas resetting a single network degrades stability (Maheshwari et al., 30 Nov 2025). Neural architectures inspired by the fly olfactory circuit employ sparse expansion, high-dimensional mixing, and winner-take-all coding to enhance both stability and plasticity (Zou et al., 3 Feb 2025).
- Adaptive Control and CPS: In certifiable cyber-physical systems, modular sovereignty offers robust guarantees against catastrophic instability, as frozen modules ensure regime-specific retention and online blending provides necessary adaptation (Spotorno et al., 29 Jan 2026).
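The fly-inspired coding scheme mentioned for reinforcement learning combines a sparse random expansion with k-winners-take-all. A minimal sketch, assuming a binary random projection with a fixed number of nonzeros per expanded unit; the sizes and function names are hypothetical, and the architecture of Zou et al. may differ in detail.

```python
import numpy as np


def sparse_binary_projection(d: int, m: int, nnz_per_unit: int, rng) -> np.ndarray:
    """Random (d, m) 0/1 matrix; each of the m expanded units samples nnz_per_unit inputs."""
    proj = np.zeros((d, m))
    for j in range(m):
        proj[rng.choice(d, size=nnz_per_unit, replace=False), j] = 1.0
    return proj


def fly_code(x: np.ndarray, proj: np.ndarray, k: int) -> np.ndarray:
    """Expand x (n, d) to (n, m) and keep only the k most active units per row (k-WTA)."""
    h = x @ proj
    codes = np.zeros_like(h)
    winners = np.argpartition(-h, k - 1, axis=1)[:, :k]  # indices of the k largest activations
    np.put_along_axis(codes, winners, 1.0, axis=1)
    return codes


rng = np.random.default_rng(0)
proj = sparse_binary_projection(d=10, m=200, nnz_per_unit=3, rng=rng)  # 20x expansion
x = rng.random((4, 10))
codes = fly_code(x, proj, k=10)  # 5% of expanded units active per input
print(codes.sum(axis=1))
```

Because each input activates only `k` of `m` units, different inputs tend to use largely non-overlapping units, which limits interference with old codes (stability) while leaving many units available for new inputs (plasticity).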
6. Limitations, Open Problems, and Future Directions
Despite algorithmic advances, the stability–plasticity dilemma remains only partially resolved. Empirical and theoretical results indicate that:
- Model capacity must be treated as an evolving, not fixed, resource; remodeling or expansion may be needed in highly non-stationary regimes (Chakraborty et al., 11 Aug 2025).
- Explicit, adaptive trade-off mechanisms (branch size in branching architectures, null-space rank, preference conditioning) require further study and dynamic tuning (Kong et al., 2022, Lai et al., 30 Mar 2025).
- Many current benchmarks and evaluation metrics can be gamed by pathological solutions that freeze large portions of the model, achieving apparent stability at the expense of meaningful learning (Kim et al., 2023).
- Domain transfer (class-incremental segmentation, named-entity recognition, motion forecasting) raises new challenges for expressing and quantifying stability and plasticity due to output granularity, dynamic label semantics, or real-time requirements (Li et al., 2024, Zhang et al., 5 Aug 2025, Lin et al., 27 Aug 2025).
- Modular and library-based methods trade off parameter footprint and latency against provable absence of forgetting, requiring coordinated research on uncertainty decomposition, gating, and model pruning (Spotorno et al., 29 Jan 2026).
Emerging research aims to unify capacity-aware control loops, architectural modularization, and adaptive regularization into a coherent methodology for continually reconciling stability and plasticity under application- and safety-driven constraints.
7. Summary Table: Key Aspects Across Domains
| Aspect | Vision (SSL, CIL) | RL/Control | Dense Prediction/NLP | Theory & Generalization |
|---|---|---|---|---|
| Main Metrics | ACC, BWT, FWT, CKA | Episodic return, FM, FWT, skill retention | mIoU, F1(old/new), FM | CLEMC, differential equations |
| Core Mechanisms | Branch tuning, null space, adapters | Twin network resets, neuron masking, CDE | Loss constraints, module fusion | Dynamic capacity, Pareto fronts |
| Notable Limits | Over-stabilization, parameter bloat | Reset instability, capacity drift | Label drift, buffer limits | Capacity divergence, design trade-offs |
| Exemplars | (Liu et al., 2024, Kong et al., 2022, Wang et al., 8 Mar 2025) | (Maheshwari et al., 30 Nov 2025, Lan et al., 9 Apr 2025, Jaziri et al., 2024) | (Li et al., 2024, Zhang et al., 5 Aug 2025) | (Chakraborty et al., 11 Aug 2025, Lai et al., 30 Mar 2025, Spotorno et al., 29 Jan 2026) |
The stability–plasticity dilemma remains the defining constraint for scalable, general continual learning, motivating ongoing innovations in algorithmic modulation, modular architectural design, and system-level certification.