Learning to Prompt (L2P) in Continual Learning
- Learning to Prompt (L2P) is a paradigm that uses a pool of learnable prompt tokens to adapt frozen models to sequential tasks.
- Its methodology dynamically selects prompts via an instance-wise query mechanism, eliminating the need for explicit task identifiers.
- L2P achieves strong continual learning performance, reducing computational overhead and mitigating catastrophic forgetting across diverse scenarios.
Learning to Prompt (L2P) is a paradigm that frames the adaptation of large, pre-trained models to new tasks as the optimization of a small set of learnable input tokens—known as prompts—rather than as the alteration of the model’s weights or the maintenance of rehearsal buffers. In continual learning, L2P is distinguished by its ability to address catastrophic forgetting while sidestepping many of the computational, privacy, and inference overheads inherent in traditional methods that rely on rehearsal or explicit task identity. Its core technical contributions are a prompt memory system, an instance-wise query mechanism for prompt selection, and an empirical demonstration of prompt-based adaptation across a variety of continual learning scenarios.
1. Motivations and Conceptual Innovation
The foundational motivation behind L2P is to enable large, frozen pre-trained models (e.g., vision Transformers) to learn sequential tasks without catastrophic forgetting by leveraging a pool of small, learnable prompt tokens. These prompts, acting as “instructions,” adapt the context in which the model interprets inputs. Unlike prior approaches that rely on modifying model parameters (e.g., fine-tuning) or replaying stored examples, L2P proposes an alternative memory system: a prompt pool functioning as a succinct, differentiated episodic memory. The instance-wise selection of prompts ensures that knowledge retained in the pool is dynamically and adaptively injected into the fixed backbone model, without the need for explicit task identifiers or known task boundaries, which is particularly advantageous in task-agnostic and real-world non-stationary environments (Wang et al., 2021).
2. Methodological Framework
The L2P methodology centers on four main elements (a concrete code sketch follows the list):
- Prompt Pool Construction: The method maintains a pool of $M$ learnable prompts, $\mathbf{P} = \{P_1, P_2, \dots, P_M\}$, where each $P_j \in \mathbb{R}^{L_p \times D}$. Here, $L_p$ is the token length per prompt and $D$ is the embedding dimension, matching the pre-trained model’s input space.
- Input Composition and Forward Pass: For each input sample $x$, the model obtains its embedded token sequence $x_e = f_e(x) \in \mathbb{R}^{L \times D}$. L2P then prepends a dynamically selected subset of $N$ prompts to this embedding:
  $$x_p = [P_{s_1}; \dots; P_{s_N}; x_e],$$
  where $\{s_i\}_{i=1}^{N}$ indexes the selected prompts. This composite sequence is forwarded through the frozen pre-trained Transformer.
- Prompt Query Mechanism: Each prompt $P_j$ is associated with a learnable key $k_j \in \mathbb{R}^{D}$. The model computes a query feature $q(x)$ (often the [class] token of the frozen backbone, or another deterministic function of the input), and the selection of the top-$N$ prompts is formalized as
  $$K_x = \operatorname*{argmin}_{\{s_i\}_{i=1}^{N} \subseteq [1, M]} \; \sum_{i=1}^{N} \gamma\big(q(x), k_{s_i}\big),$$
  where $\gamma$ is a similarity (distance) function, typically cosine distance.
- Optimization Objective: Learning proceeds by minimizing a joint loss over the prompt pool $\mathbf{P}$, the keys $\mathbf{K}$, and the classifier parameters $\phi$:
  $$\min_{\mathbf{P}, \mathbf{K}, \phi} \; \mathcal{L}\big(g_\phi(f(x_p)), y\big) + \lambda \sum_{k_{s_i} \in K_x} \gamma\big(q(x), k_{s_i}\big),$$
  where $g_\phi$ is a lightweight classifier on the frozen backbone’s output $f(x_p)$, and the second term regularizes the alignment between the selected prompts’ keys and the input’s query feature; $\lambda$ balances classification and prompt matching.
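To make these elements concrete, below is a minimal PyTorch sketch of a prompt pool with key-based top-$N$ selection. It is an illustration under assumed hyperparameters, not the authors’ released implementation; the names (`PromptPool`, `pool_size`, `top_n`) are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptPool(nn.Module):
    """Minimal L2P-style prompt pool (illustrative sketch, not the official code)."""

    def __init__(self, pool_size=20, prompt_len=5, embed_dim=768, top_n=5):
        super().__init__()
        self.top_n = top_n
        # M learnable prompts P_j, each of shape (L_p, D).
        self.prompts = nn.Parameter(0.02 * torch.randn(pool_size, prompt_len, embed_dim))
        # One learnable key k_j per prompt, matched against the query feature q(x).
        self.keys = nn.Parameter(0.02 * torch.randn(pool_size, embed_dim))

    def forward(self, x_embed, query):
        # x_embed: (B, L, D) patch embeddings; query: (B, D), e.g. the frozen
        # backbone's [class] token. Cosine similarity to every key: (B, M).
        sim = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)
        # Top-N most similar keys == argmin of cosine distance (1 - sim).
        top_sim, idx = sim.topk(self.top_n, dim=1)     # both (B, N)
        selected = self.prompts[idx]                   # (B, N, L_p, D)
        b, n, lp, d = selected.shape
        # Prepend the selected prompts to the embedded input sequence.
        x_prompted = torch.cat([selected.reshape(b, n * lp, d), x_embed], dim=1)
        # Surrogate key-matching term: pulls selected keys toward the query.
        match_loss = (1.0 - top_sim).sum(dim=1).mean()
        return x_prompted, match_loss
```

A training step then combines the two terms of the objective above: the cross-entropy of the lightweight head on the frozen backbone’s output for `x_prompted`, plus `match_loss` scaled by $\lambda$; only the prompts, keys, and classifier receive gradients, while the backbone stays frozen.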
3. Memory System, Knowledge Sharing, and Plasticity
L2P’s prompt pool replaces rehearsal buffers by storing learnable prompt tokens instead of high-dimensional samples. This memory is substantially more compact: for standard vision benchmarks, the entire pool typically occupies less memory than a single stored image. The instance-wise prompt selection encourages “content-addressable” retrieval: similar inputs reuse overlapping prompts, promoting knowledge sharing, while dissimilar tasks can be disentangled by divergent prompt assignments—explicitly handling both task-invariant and task-specific knowledge. As a result, the system retains plasticity (adaptability to new tasks) while minimizing interference.
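As a rough sense of scale, the arithmetic below compares a prompt pool to a modest rehearsal buffer. The hyperparameters (pool size 20, prompt length 5, embedding dimension 768, float32 storage, 5,000 buffered CIFAR images) are illustrative assumptions, not reported figures:

```python
# Illustrative memory arithmetic under assumed hyperparameters (float32).
M, L_p, D = 20, 5, 768                    # pool size, prompt length, embed dim
prompt_bytes = M * L_p * D * 4            # 76,800 params ~ 0.3 MB
buffer_bytes = 5_000 * 32 * 32 * 3        # 5,000 uint8 CIFAR images ~ 15 MB
print(f"pool/buffer ratio: {prompt_bytes / buffer_bytes:.3f}")  # ~ 0.020
```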
4. Empirical Evaluation
L2P was extensively validated on multiple continual learning scenarios:
- Class-Incremental Learning: Tested on Split CIFAR-100 (10 tasks, no task labels at test time) and 5-Datasets (sequentially presenting CIFAR-10, MNIST, Fashion-MNIST, SVHN, notMNIST).
- Domain-Incremental Learning: Conducted on CORe50, where identical classes are presented in varying domains.
- Task-Agnostic Learning: Assessed using Gaussian Scheduled CIFAR-100, which introduces gradual data-distribution shifts without task boundaries (a stream sketch follows the list).
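The defining property of the task-agnostic benchmark is that class distributions drift smoothly rather than switching at boundaries. Below is a hedged sketch of such a boundary-free stream; the Gaussian-bump schedule follows the benchmark’s description, but the exact parameters and generator are assumptions:

```python
import numpy as np

def sample_task(step, total_steps, n_tasks=10, sigma=0.05, rng=None):
    """Sample which task's data arrives at `step` (illustrative sketch).

    Each task contributes data with probability given by a Gaussian bump
    centered at a task-specific time, so adjacent tasks overlap and the
    stream never exposes a hard task boundary."""
    rng = rng or np.random.default_rng()
    t = step / total_steps                           # normalized time in [0, 1]
    centers = (np.arange(n_tasks) + 0.5) / n_tasks   # one bump per task
    weights = np.exp(-0.5 * ((t - centers) / sigma) ** 2)
    return int(rng.choice(n_tasks, p=weights / weights.sum()))
```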
Key findings:
| Scenario | L2P compared to SOTA | Notable metrics |
|---|---|---|
| No rehearsal buffer | Outperforms EWC, LwF | Higher average accuracy |
| With rehearsal buffer | Matches rehearsal-based methods | Lower forgetting rates |
| Task-agnostic | Directly applicable | Robust without task labels |
L2P’s static pool with dynamic querying competes with rehearsal and regularization methods, and even surpasses architecture-based approaches (e.g., SupSup, DualNet), exhibiting smaller performance drops relative to the fully supervised upper bound.
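For reference, the two headline metrics in the table above follow the standard continual-learning definitions, where $a_{t,i}$ denotes test accuracy on task $i$ after training on task $t$ and $T$ is the number of tasks:
$$\text{Average accuracy} = \frac{1}{T}\sum_{i=1}^{T} a_{T,i}, \qquad \text{Forgetting} = \frac{1}{T-1}\sum_{i=1}^{T-1} \max_{t < T}\big(a_{t,i} - a_{T,i}\big).$$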
5. Task-Agnostic and Instance-wise Adaptation
A distinctive capability of L2P is effective operation in task-agnostic continual learning, where task boundaries are absent. The instance-wise query approach, which consults no task identity at either training or inference, enables seamless adaptation as the environment shifts, accumulating and sharing episodic knowledge on the fly. This positions L2P as a strong baseline for real-world continual learning deployments where task segmentation is unavailable, preventing both catastrophic forgetting and spurious overfitting to hidden task boundaries.
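Concretely, inference in this regime is identical regardless of which hidden task a sample comes from. A hedged usage sketch continuing the hypothetical `PromptPool` module from Section 2 (`embed_fn`, `encode_fn`, and `classifier` are stand-ins for the frozen backbone’s patch embedding, its Transformer encoder with pooling, and the lightweight head; none of these names come from a real API):

```python
import torch

@torch.no_grad()
def predict(x, embed_fn, encode_fn, pool, classifier):
    """Task-agnostic prediction with an L2P-style prompt pool (sketch).

    embed_fn:  frozen patch embedding, x -> (B, L, D)
    encode_fn: frozen Transformer + pooling, (B, *, D) -> (B, D)
    All component names are hypothetical stand-ins."""
    query = encode_fn(embed_fn(x))            # q(x) from the frozen model itself
    x_prompted, _ = pool(embed_fn(x), query)  # instance-wise prompt selection
    return classifier(encode_fn(x_prompted)).argmax(dim=1)
```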
6. Implications, Limitations, and Future Research
L2P’s innovations suggest prompt-based memory is a viable alternative to data rehearsal and full model adaptation for continual learning tasks. Its design questions the necessity of maintaining large sample buffers, task-specific modules, or heavyweight network reconfiguration for each new task. Potential areas for future investigation include:
- Extending the framework beyond vision to NLP, audio, or multi-modal architectures.
- Evaluating hybrid designs, e.g., combining prompts with small rehearsal buffers (as in L2P-R).
- Addressing inherited biases and robustness issues that stem from frozen backbone models.
- Advancing benchmarks to capture more realistic, continuous distributional shifts.
The approach may also serve as a foundation for parameter-efficient, scalable continual learners in privacy-sensitive, resource-constrained, or federated learning scenarios.
7. Summary and Perspectives
Learning to Prompt (L2P) establishes a new theoretical and practical direction in continual learning by introducing a query-driven, prompt pool memory system that obviates the need for explicit task identifiers and rehearsal data. Through dynamic, instance-sensitive prompt retrieval and strict parameter efficiency, L2P achieves state-of-the-art results across class-incremental, domain-incremental, and task-agnostic learning settings, demonstrating its potential for deployment in complex, non-stationary environments. Its influence is poised to extend as research adapts the prompt memory concept to more modalities, architectures, and real-world challenges.