Prompt-Based Continual Learning
- Prompt-based continual learning is a paradigm that uses small learnable prompts with a fixed backbone to efficiently adapt to sequential tasks.
- It decouples model plasticity from stability by maintaining a dynamic prompt pool that mitigates catastrophic forgetting without full network retraining.
- The approach employs instance-wise prompt selection and injection, ensuring competitive performance and scalability across diverse continual learning benchmarks.
Prompt-based continual learning refers to a class of methods in which small sets of learnable prompt parameters, rather than the bulk of model weights, are adapted to capture both task-specific and global (task-invariant) knowledge as a model is exposed to sequential learning tasks. By decoupling plasticity and stability via a prompt pool, these techniques directly address catastrophic forgetting and facilitate scalable, rehearsal-free adaptation on top of large pretrained models. Prompt-based continual learning architectures are typically built on frozen feature extractors—most commonly Vision Transformers (ViT)—and use dynamic instance-wise or distribution-aware mechanisms to select or generate prompt instructions that modulate prediction for each input. The paradigm’s core innovations include memory efficiency, task-agnostic inference, competitive (often SOTA) performance across various continual learning scenarios (class-, domain-, and task-incremental), and extensibility to low-shot and privacy-sensitive domains.
1. Fundamental Principles and Paradigm Shift
Prompt-based continual learning (PCL) rethinks continual adaptation by shifting the locus of learning from network weights to compact pools of small learnable parameters known as prompts. Unlike conventional approaches that either replay data (rehearsal) or expand architectures (dynamic heads, progressive networks), prompt-based methods keep a fixed pretrained backbone and update only the prompt pool across sequential tasks. The typical workflow involves the following elements (a minimal code sketch follows the list):
- Prompt Pool (Memory): A set of learnable prompts $\{P_1, \dots, P_M\}$, each $P_j \in \mathbb{R}^{L_p \times D}$ a sequence of $L_p$ tokens with embedding dimension $D$ (Wang et al., 2021).
- Instance-wise Prompt Selection: For each input, a feature extractor produces a query (typically a [CLS] embedding), which is used to select a subset of prompts via a key-query mechanism—commonly based on cosine similarity or projection (Wang et al., 2021, Hu et al., 2023).
- Prompt Injection: The selected prompt(s) are prepended or fused with the input embedding, instructing the frozen backbone to condition processing on the activated knowledge (Wang et al., 2021).
- Task-Agnostic Inference: Since prompt selection is instance-driven, no explicit task labels are needed at inference, making PCL robust to task boundary uncertainty and domain shift.
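A minimal sketch of the pool and key-query selection steps above, assuming a PyTorch-style setup; the class name, pool sizes, and the cosine surrogate term are illustrative choices rather than the exact formulation of any single method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptPool(nn.Module):
    """Illustrative prompt pool with instance-wise key-query selection (sketch)."""

    def __init__(self, pool_size=10, prompt_len=5, embed_dim=768, top_k=3):
        super().__init__()
        # M learnable prompts, each a sequence of L_p tokens of dimension D.
        self.prompts = nn.Parameter(0.02 * torch.randn(pool_size, prompt_len, embed_dim))
        # One learnable key per prompt, matched against instance queries.
        self.keys = nn.Parameter(0.02 * torch.randn(pool_size, embed_dim))
        self.top_k = top_k

    def forward(self, query):
        # query: [B, D], e.g. the [CLS] embedding of the frozen backbone on the raw input.
        sim = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)  # [B, M]
        scores, idx = sim.topk(self.top_k, dim=-1)      # pick the top-k matching prompts
        selected = self.prompts[idx]                    # [B, top_k, L_p, D]
        selected = selected.flatten(1, 2)               # [B, top_k * L_p, D] tokens to inject
        # Surrogate term pulling the selected keys toward the query (1 - cosine similarity).
        key_loss = (1.0 - scores).mean()
        return selected, key_loss
```

The surrogate `key_loss` is the term that pulls selected keys toward instance queries; it reappears in the loss formulation of Section 2.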
PCL reframes memory as a modular and succinct repository of instructions, supporting both task-invariant (global) and task-specific (expert) knowledge (Hu et al., 2023). Prompt learning can also be cast as learning new "experts" in a mixture, typically realized inside the self-attention blocks, as sketched below (2405.14124).
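To make the attention-level view concrete, the hedged sketch below concatenates learnable prompt tokens to the keys and values of a self-attention block while queries come only from the input, so each prompt acts as an extra attention target (an "expert") the frozen layer can consult. The function and argument names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def prefix_attention(x, w_qkv, prompt_kv, num_heads=12):
    """Self-attention with prefix prompts appended to keys and values (sketch).

    x:         [B, N, D] input tokens
    w_qkv:     frozen nn.Linear(D, 3*D) producing concatenated Q, K, V
    prompt_kv: [2, L_p, D] learnable prompt, split into a key part and a value part
    """
    B, N, D = x.shape
    q, k, v = w_qkv(x).chunk(3, dim=-1)                   # each [B, N, D]
    pk, pv = prompt_kv[0], prompt_kv[1]                   # [L_p, D]
    k = torch.cat([pk.expand(B, -1, -1), k], dim=1)       # prompts become extra attention targets
    v = torch.cat([pv.expand(B, -1, -1), v], dim=1)

    def split(t):  # [B, T, D] -> [B, heads, T, D // heads]
        return t.view(B, t.shape[1], num_heads, D // num_heads).transpose(1, 2)

    out = F.scaled_dot_product_attention(split(q), split(k), split(v))  # queries attend to prompts too
    return out.transpose(1, 2).reshape(B, N, D)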
2. Overcoming Catastrophic Forgetting: Mechanisms and Loss Formulations
Catastrophic forgetting in continual learning arises when updates for new tasks destroy performance on previously learned ones. PCL addresses this by:
- Decoupling Knowledge: The fixed backbone retains general knowledge, while prompts encode and retrieve both past and current knowledge (Wang et al., 2021).
- Instance-wise Querying: Dynamic selection of prompts via query-key mechanisms enables retrieval of relevant knowledge without access to explicit task labels, limiting interference with unrelated prompts (Wang et al., 2021, Tran et al., 2023).
- Memory Efficiency: Only the prompt pool grows with the number of tasks, and it remains small (typically <0.1% of the backbone parameters) compared to the architecture expansion used by other dynamic CL methods (Wang et al., 2021).
- Regularization and Losses: Regularization terms align prompt keys to instance queries (e.g., cosine similarity), encourage keys for new tasks to be orthogonal to prior queries (orthogonal projection), or utilize language guidance losses rooted in pre-trained LLM spaces (Khan et al., 2023, Tran et al., 2023).
- Auxiliary Modules: Incorporating One-Versus-All (OVA) heads, prototype-based losses, or prompt consistency losses further reduces classification confusion and semantic drift (Tran et al., 2023, Gao et al., 13 Mar 2024).
- Mixture of Experts and Nonlinear Gating: By interpreting the attention block as a mixture of linear experts and optimizing nonlinear gating mechanisms (NoRGa), PCL architectures improve convergence speed and memory management (2405.14124).
A representative loss formulation is
$$\min_{\{P_j\},\{k_j\},\phi}\;\mathcal{L}\big(g_\phi(f(x; P_{s_x})),\, y\big) \;+\; \lambda \sum_{s_i \in s_x} \gamma\big(q(x),\, k_{s_i}\big),$$
where $f$ is the frozen transformer conditioned on the selected prompts $P_{s_x}$, $g_\phi$ is the classifier, $q(x)$ is the instance query, $k_{s_i}$ are the keys of the selected prompts, $\gamma$ is a distance (e.g., cosine) function, and $\lambda$ is a balancing hyperparameter (Wang et al., 2021).
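A hedged sketch of one training step implementing this objective, building on the `PromptPool` sketch from Section 1; `frozen_vit`, `classifier`, and the `extra_tokens` keyword are placeholders for however a given implementation injects prompts into its frozen backbone.

```python
import torch
import torch.nn.functional as F

def training_step(frozen_vit, prompt_pool, classifier, x, y, lam=0.5):
    """One PCL training step: task loss plus the key-query surrogate (sketch)."""
    with torch.no_grad():
        query = frozen_vit(x)                       # q(x): instance feature, no gradient
    prompts, key_loss = prompt_pool(query)          # select prompts and get the surrogate term
    # 'extra_tokens' is a placeholder for however the implementation prepends/fuses prompts.
    feats = frozen_vit(x, extra_tokens=prompts)
    logits = classifier(feats)
    return F.cross_entropy(logits, y) + lam * key_loss
```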
3. Prompt Pool Design, Selection, and Efficiency
Prompt pool management, selection, and efficiency are active research topics in PCL:
- Prompt Query Strategies: Early efforts use [CLS] tokens as queries; recent methods leverage intermediate layer embeddings, image tokens, or semantic information for more stable query selection (Kim et al., 25 Feb 2024, Han et al., 18 Mar 2024).
- Discrete vs. Continuous Prompts: Methods like VQ-Prompt replace continuous, input-conditioned prompts, which lack abstraction, with discrete vector-quantized prompts optimized end-to-end through the task loss and a quantization regularizer; a quantization sketch follows the table below (Jiao et al., 27 Oct 2024).
- Multiple Keys/Queries: MQMK introduces multiple queries (task-specific) and multiple keys (class-specific), enabling precise breadth and depth search, and significantly improves matching accuracy and task alignment (Tu et al., 22 Jan 2025).
- Efficient Expansion & Reuse: LW2G and related works introduce dynamic prompt expansion—using a metric such as Hinder Forward Capability (HFC) to determine whether to grow the pool or adapt existing prompts; this prevents unnecessary pool bloat and improves retrieval accuracy (Feng et al., 27 Sep 2024).
- Shared and Task-specific Prompts: Methods such as SMoPE structure the prompt pool as a sparse mixture of experts: a shared prompt is partitioned into experts, with input-dependent routing and adaptive noise to balance expert utilization while maintaining efficiency (Le et al., 29 Sep 2025).
- Layer-wise and Unified Prompting: Some approaches aggregate prompts across layers into a unified pool to provide cross-layer knowledge and reduce redundancy, which is especially effective in domain-incremental settings (e.g., distributed medical AI) (Oh et al., 14 Aug 2025).
| Prompt Selection Strategy | Query Type | Prompt Granularity | Task-agnostic Support | Reference |
|---|---|---|---|---|
| Key-query matching | Instance/global | Pool | Yes | (Wang et al., 2021) |
| Language-guided | Language embedding | Pool | Yes | (Khan et al., 2023) |
| Vector quantization | Instance | Discrete pool | Yes | (Jiao et al., 27 Oct 2024) |
| Image-token semantic | Token-wise | Weighted fusion | Yes | (Han et al., 18 Mar 2024) |
| Multiple queries/keys | Task/class-specific | Two-level | Yes | (Tu et al., 22 Jan 2025) |
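Of the strategies tabulated above, the discrete route can be sketched as nearest-code lookup with a straight-through gradient; the variable names and the VQ-VAE-style commitment term below are illustrative assumptions rather than VQ-Prompt's exact losses.

```python
import torch
import torch.nn.functional as F

def quantize_prompt(query, codebook, beta=0.25):
    """Nearest-code prompt selection with a straight-through gradient (sketch).

    query:    [B, D] continuous, input-conditioned prompt query
    codebook: [M, D] learnable discrete prompt codes
    """
    dist = torch.cdist(query, codebook)               # [B, M] Euclidean distances
    idx = dist.argmin(dim=-1)                         # hard, discrete assignment
    quantized = codebook[idx]                         # [B, D] chosen codes
    # VQ-VAE-style terms: move codes toward queries and commit queries to their codes.
    vq_loss = F.mse_loss(quantized, query.detach()) + beta * F.mse_loss(query, quantized.detach())
    # Straight-through estimator: forward pass uses the discrete code,
    # backward pass routes gradients to the continuous query.
    quantized = query + (quantized - query).detach()
    return quantized, idx, vq_loss
```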
4. Integration with Pretrained Models and New Modalities
Prompt-based continual learning leverages foundation models (vision transformers, CLIP) for robust feature extraction. Key integration aspects include:
- Frozen Backbones: Most PCL methods freeze the backbone model (e.g., a ViT) and adapt only prompts and lightweight classifier heads (Wang et al., 2021, Hu et al., 2023, Kim et al., 25 Feb 2024); a minimal setup sketch follows this list.
- Foundation Models: Prompt tuning leverages high-capacity, pretrained representations to maximize generalization and enable strong few-shot performance, as in POP and knowledge distillation settings (Hu et al., 2023, Zhang et al., 18 Jul 2024).
- Prompt-based Knowledge Distillation: Continual Distillation Learning (CDL) augments PCL by distilling teacher prompts into a smaller student via explicit prompt mapping and soft-label KD heads, further improving performance on resource-constrained devices (Zhang et al., 18 Jul 2024).
- Adaptation Beyond Vision: While most results are in vision, the conceptual framework extends to other modalities (language, multimodal, speech), as the mechanism is agnostic to input type (Wang et al., 2021, Hu et al., 2023).
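A minimal sketch of the parameter setup implied by the frozen-backbone bullet above: the pretrained feature extractor is frozen and only the prompt pool and a lightweight head receive gradients. The builder function and its arguments are hypothetical.

```python
import torch
import torch.nn as nn

def build_pcl_trainables(backbone: nn.Module, prompt_pool: nn.Module,
                         num_classes: int, embed_dim: int = 768):
    """Freeze the pretrained backbone; only prompts and a light head get gradients (sketch)."""
    for p in backbone.parameters():
        p.requires_grad_(False)          # stability: backbone features stay fixed across tasks
    backbone.eval()                      # keep normalization/dropout behavior deterministic
    head = nn.Linear(embed_dim, num_classes)
    trainable = list(prompt_pool.parameters()) + list(head.parameters())
    optimizer = torch.optim.AdamW(trainable, lr=1e-3)
    return head, optimizer
```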
5. Performance Metrics, Benchmarks, and Empirical Superiority
Prompt-based continual learning methods are evaluated on standard class-, domain-, and task-incremental benchmarks (e.g., Split CIFAR-100, Split ImageNet-R, CUB-200, CORe50, Omnibenchmark, and medical datasets), using metrics such as the following (a computation sketch follows the list):
- Final Average Accuracy (FAA): Accuracy after the last learning step.
- Cumulative Average Accuracy (CAA): Average over all incremental steps.
- Forgetting Measure (F): Amount of performance drop on prior tasks.
- Prompt Matching Rate / Retrieval Accuracy: How often the retrieved prompts align with the correct task or class (e.g., MQMK improves matching accuracy by >30% over baselines) (Tu et al., 22 Jan 2025).
- Resource Utilization: Number of learnable parameters, memory cost, inference/training GFLOPs.
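A small sketch of how FAA, CAA, and the forgetting measure can be computed from a lower-triangular accuracy matrix; averaging conventions differ slightly across papers, so this is one common variant rather than a canonical definition.

```python
import numpy as np

def continual_metrics(acc: np.ndarray):
    """FAA, CAA, and forgetting from an accuracy matrix (sketch, T >= 2 steps).

    acc[i, j] = accuracy on task j evaluated after incremental step i (lower triangle filled).
    """
    T = acc.shape[0]
    step_avgs = [acc[i, : i + 1].mean() for i in range(T)]  # average over tasks seen so far
    faa = float(step_avgs[-1])                              # Final Average Accuracy
    caa = float(np.mean(step_avgs))                         # Cumulative Average Accuracy
    # Forgetting: best earlier accuracy on each old task minus its accuracy after the last step.
    forgetting = float(np.mean([acc[:T - 1, j].max() - acc[T - 1, j] for j in range(T - 1)]))
    return faa, caa, forgetting
```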
Empirical results consistently demonstrate that rehearsal-free PCL methods—especially those with advanced prompt selection (MQMK, VQ-Prompt, SMoPE)—are competitive or superior to rehearsal-based, architectural expansion, and regularization baselines (Wang et al., 2021, Hu et al., 2023, Tran et al., 2023, Gao et al., 13 Mar 2024, Feng et al., 27 Sep 2024, Jiao et al., 27 Oct 2024, Tu et al., 22 Jan 2025, Le et al., 29 Sep 2025).
| Method | FAA Improvement (ImageNet-R) | Matching/Rec. Rate | Parameters | Reference |
|---|---|---|---|---|
| KOPPA | +20% over CODA | N/A | N/A | (Tran et al., 2023) |
| MQMK | +7–11% over SQSK | +30% | N/A | (Tu et al., 22 Jan 2025) |
| VQ-Prompt | +3–5% over CODA/L2P | N/A | Lower | (Jiao et al., 27 Oct 2024) |
| RainbowPrompt | +9% over SOTA | High diversity | Moderate | (Hong et al., 30 Jul 2025) |
| DPFormer | Top-1 SOTA on multiple benchmarks | Unified classifier | Fixed | (Huang et al., 9 Jun 2025) |
6. Advanced Themes: Diversity, Scalability, and Security
Recent PCL research addresses key advanced challenges:
- Diversity of Prompts: RainbowPrompt introduces prompt-evolving mechanisms to enhance diversity, employing attention-based transformations and probabilistic gates to optimize where in the model evolved prompts are injected (Hong et al., 30 Jul 2025).
- Scalability and Pool Expansion: Methods like LW2G use dynamic growing approaches—quantified by metrics like HFC—to automatically determine when to expand or consolidate prompts, thus controlling computational and storage overhead (Feng et al., 27 Sep 2024).
- Distributed and Low-shot Scenarios: Unified prompt pools with minimal expansion, as in distributed medical AI, enable sustainable adaptation while reducing inference cost and maintaining privacy constraints (Oh et al., 14 Aug 2025).
- Security Threats: Although prompt-based continual learning provides privacy benefits, its memory mechanisms are vulnerable to backdoor attacks (AOP), with adversaries leveraging prompt selection to implant persistent triggers that survive across tasks—even under black-box, data-private conditions. Defenses remain an area for future research (Nguyen et al., 28 Jun 2024).
7. Limitations, Open Challenges, and Future Directions
Despite substantial progress, prompt-based continual learning faces multiple open challenges:
- Prompt Selection Accuracy: Improving alignment between prompts and current data distributions remains a key focus. MQMK and multi-key techniques improve selection rates, but accuracy can still be suboptimal under high task heterogeneity (Tu et al., 22 Jan 2025).
- Inference Efficiency: Approaches that require multiple forward passes (e.g., MQMK, dual-stage selection) face scalability bottlenecks with large prompt pools and under real-time constraints.
- Prompt Pool Management: Dynamic pool expansion and redundancy control (as in LW2G, unified pools) are critical for sustained efficiency (Feng et al., 27 Sep 2024, Oh et al., 14 Aug 2025).
- Application to Other Modalities: Transferring these successes to text, audio, or multimodal systems is promising but largely untested at the scale shown in vision.
- Security and Privacy: Developing robust defenses in prompt-based systems—especially for online, distributed, or privacy-sensitive deployments—is an emergent requirement (Nguyen et al., 28 Jun 2024).
- Automated Prompt Design: Further research is needed into the automated construction and adaptation of the prompt pool (e.g., via reinforcement/meta-learning) and dynamic gating (Hong et al., 30 Jul 2025, Feng et al., 27 Sep 2024).
Potential future directions highlighted include exploration in more realistic, blurry, or non-i.i.d. settings (general continual learning), automatic taxonomy construction for label organization (Tran et al., 6 Oct 2024), more sophisticated expert selection or routing in MoE/PCL hybrids (Le et al., 29 Sep 2025), improved language and semantic guidance (Khan et al., 2023), and broader application to distributed/medical AI settings (Oh et al., 14 Aug 2025).
Prompt-based continual learning offers a memory-efficient, flexible, and high-performing foundation for continual adaptation over complex, evolving task sequences. By modularizing knowledge into learnable prompts and integrating mechanisms for dynamic, instance-wise selection and allocation, PCL methods address catastrophic forgetting, scalability, and practical deployment pressures. The field continues to evolve rapidly, with current research pushing toward enhanced diversity, security, task-agnostic inference, and unified solutions across modalities and domains.