Dynamic Prompting Without Rehearsal Buffers
- Dynamic prompting without rehearsal buffers is a continual learning method that leverages adaptable prompt parameters to dynamically guide frozen model backbones.
- It employs prompt pools with key-value memory and attention-based selection to integrate instance- or task-specific instructions effectively.
- Recent research demonstrates these methods can rival traditional rehearsal approaches, delivering improved stability-plasticity trade-offs with lower memory and computation costs.
Dynamic prompting without rehearsal buffers is an approach in continual learning that enables models to acquire new knowledge over evolving data streams while preserving existing skills, all without retaining previous training instances or relying on external rehearsal memory. Instead of replaying stored samples (as in traditional rehearsal-based methods), dynamic prompting leverages learnable prompt parameters—often implemented as small, trainable vectors or tokens—that interact with a frozen or largely fixed pre-trained backbone. These prompts provide adaptable, instance- or task-dependent instructions, guiding the model through new learning episodes and mitigating catastrophic forgetting. Recent research in this area demonstrates that prompt-based strategies can rival or even surpass traditional rehearsal-based methods, achieving strong stability-plasticity trade-offs with substantially reduced memory, computation, and privacy burdens.
1. Dynamic Prompting Fundamentals
Dynamic prompting mechanisms center on maintaining a lightweight, trainable prompt memory external to the core model parameters. In typical transformer-based implementations, a pool of prompts is constructed, where each prompt is a learnable tensor (e.g., of shape L_p × D, with L_p being the prompt length and D the embedding dimension). For each input x, a subset of prompts is dynamically selected using a similarity-based query-key match (often based on cosine distance between a query q(x) computed from x and prompt keys k_i), and these prompts are concatenated or integrated with the embedded input. This concatenated representation is then processed by the frozen model backbone, allowing the prompts to direct the model’s attention towards task- or instance-relevant knowledge with minimal interference to prior learning (2112.08654, 2204.04799, 2211.13218).
By externalizing adaptation capacity into prompts—rather than core model weights—the approach sidesteps extensive parameter updates and buffer storage, promoting efficient continual learning suitable for privacy-sensitive or memory-constrained applications.
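The query-key selection and prompt prepending described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any paper's reference implementation; the function names, pool sizes, and the top-k value are placeholders.

```python
import numpy as np

def select_prompts(query, prompt_keys, prompt_pool, top_k=2):
    """Select the top-k prompts whose keys best match the query.

    query:       (D,) feature vector computed from the input x
                 (e.g., the frozen backbone's CLS embedding).
    prompt_keys: (M, D) learnable keys, one per prompt in the pool.
    prompt_pool: (M, L_p, D) learnable prompts of length L_p each.
    """
    # Cosine similarity between the query and every prompt key.
    q = query / np.linalg.norm(query)
    k = prompt_keys / np.linalg.norm(prompt_keys, axis=1, keepdims=True)
    sims = k @ q                         # (M,) similarity scores
    top = np.argsort(-sims)[:top_k]      # indices of best-matching keys
    # Concatenate the selected prompts along the sequence dimension.
    selected = prompt_pool[top].reshape(-1, prompt_pool.shape[-1])
    return top, selected                 # selected: (top_k * L_p, D)

def prepend_prompts(x_embed, selected):
    """Prepend the selected prompt tokens to the embedded input sequence,
    which is then fed to the frozen backbone."""
    return np.concatenate([selected, x_embed], axis=0)
```

In training, the selected keys are pulled toward the queries of the inputs that retrieve them, so the pool gradually specializes without touching backbone weights.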
2. Architectural Strategies and Memory Organization
Several architectural innovations underpin dynamic prompting without rehearsal buffers:
- Prompt Pools and Key-Value Memory:
Prompts are paired with learnable keys, forming a key-value memory. At each learning or inference step, the system computes a query vector from the input, conducts a nearest-neighbor search across prompt keys, and retrieves the most relevant prompts to guide the current prediction (2112.08654, 2211.13218).
- Hierarchical and Complementary Prompting:
Advanced frameworks decouple task-invariant from task-specific knowledge. DualPrompt, for instance, separates general prompts (G-Prompt) shared across all tasks from expert prompts (E-Prompt) unique to each task, combining these at different model depths for enhanced plasticity and retention (2204.04799). Hierarchical prompts further decompose memory into class, task, and general prompts, with separate mechanisms (e.g., Bayesian distribution alignment, cross-task knowledge amalgamation, and self-supervised clustering) for each level (2401.11544).
- Decomposed and Input-Conditioned Prompts:
CODA-Prompt introduces prompt components assembled dynamically using an end-to-end, attention-based key-query scheme. Each input forms a prompt as a weighted sum over prompt components, with weights (attention scores) determined by interactions between input-conditioned queries and prompt keys, enhancing capacity and adaptability beyond fixed prompt banks (2211.13218).
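The weighted-sum assembly can be sketched as follows. This is an illustrative simplification of the CODA-Prompt scheme under assumed shapes: the per-component attention vectors, key normalization, and weighting are shown in spirit only.

```python
import numpy as np

def assemble_prompt(query, keys, attention_vectors, components):
    """Assemble an input-conditioned prompt as a weighted sum of
    learnable components (CODA-Prompt-style sketch).

    query:             (D,) input-conditioned query.
    keys:              (M, D) learnable component keys.
    attention_vectors: (M, D) learnable per-component attention vectors.
    components:        (M, L_p, D) learnable prompt components.
    """
    # Feature-wise attention over the query, then score against each key.
    attended = attention_vectors * query                  # (M, D)
    a = attended / np.linalg.norm(attended, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    weights = np.sum(a * k, axis=1)                       # (M,) scores
    # Weighted sum over components yields one prompt of shape (L_p, D).
    return np.tensordot(weights, components, axes=1)
```

Because the weights vary with the input, capacity grows with the number of components rather than being fixed by a discrete prompt bank.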
- Additive Prompt Tuning:
Recent methods such as APT forgo prompt concatenation altogether, directly adding shared prompts to the CLS token’s attention computation, minimizing inference cost and parameter count while maintaining competitive performance (2503.07979).
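The additive idea can be illustrated with a single attention step. Note this is a loose sketch, not APT's actual formulation: the assumption that the shared prompt is added to the keys and values of the CLS token's attention is mine, made only to show how sequence length stays unchanged.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_prompt_attention(cls_q, keys, values, prompt):
    """Additive prompting sketch: a shared learnable prompt vector is
    fused into the CLS token's attention instead of being concatenated
    as extra tokens, so no sequence extension is needed.

    cls_q:  (D,) query of the CLS token.
    keys:   (N, D) keys, values: (N, D) values of the sequence tokens.
    prompt: (D,) shared prompt (injection point is an assumption here).
    """
    k = keys + prompt                                  # shift keys
    v = values + prompt                                # shift values
    attn = softmax(k @ cls_q / np.sqrt(len(cls_q)))    # (N,) weights
    return attn @ v                                    # (D,) CLS output
```

The appeal is that inference cost matches the unprompted backbone, since no prompt tokens enter the self-attention sequence.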
The table below summarizes select mechanisms:
| Method | Prompt Mechanism | Memory Strategy | Prompt Selection |
|---|---|---|---|
| L2P | Prompt pool; key-value | External prompt pool | Query-key nearest neighbor |
| DualPrompt | G-Prompt and E-Prompt | Split parameters (general/task-specific) | Feature-key similarity |
| CODA-Prompt | Decomposed, end-to-end | Prompt components | Attention-based input matching |
| Hierarchical | Class, task, general | Multilevel Gaussian | Combination across levels |
| APT | Additive, shared | Single shared prompt | No retrieval; fused additively |
3. Theoretical Justification and Optimization
Dynamic prompts act as succinct memory modules, conferring several theoretical and practical benefits:
- Segregation of Task Knowledge:
By encoding each task in distinct prompts or prompt subspaces, interference between tasks is minimized. Instance- or feature-based prompt selection further ensures that prompts reflect the local structure of the data, mitigating overlap and semantic drift.
- Flexible Capacity and Regularization:
Prompt pool size, prompt length, and the architecture’s ability to expand the prompt space dynamically (e.g., via component addition in CODA-Prompt) allow adaptive control over memory and model plasticity.
- Loss Functions:
Training typically combines a cross-entropy prediction loss with regularization terms. For example, alignment losses enforce closeness of selected prompt keys to input queries, and orthogonality losses or contrastive objectives maintain diversity across prompts and avoid memory collapse (2211.13218, 2401.11544). Hierarchical prompt architectures utilize adversarial and contrastive terms to align class, task, and general prompts with real and synthetic data or proxy distributions.
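A composite objective of this kind can be sketched as below. The three terms follow the description above (cross-entropy, query-key alignment, key orthogonality), but the specific weights and cosine-based forms are illustrative placeholders, not any paper's exact loss.

```python
import numpy as np

def prompt_training_loss(logits, label, query, selected_keys, all_keys,
                         lambda_match=0.5, lambda_ortho=0.1):
    """Illustrative composite loss for prompt-based continual learning.

    logits:        (C,) class scores for one input.
    label:         ground-truth class index.
    query:         (D,) input query q(x).
    selected_keys: (K, D) keys of the retrieved prompts.
    all_keys:      (M, D) every key in the pool.
    """
    # 1. Cross-entropy prediction loss.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    ce = -np.log(p[label])
    # 2. Alignment: pull selected keys toward the query (cosine distance).
    q = query / np.linalg.norm(query)
    k = selected_keys / np.linalg.norm(selected_keys, axis=1, keepdims=True)
    match = np.mean(1.0 - k @ q)
    # 3. Orthogonality: penalize off-diagonal key correlations to keep
    #    prompts diverse and avoid memory collapse.
    kn = all_keys / np.linalg.norm(all_keys, axis=1, keepdims=True)
    gram = kn @ kn.T
    ortho = np.sum((gram - np.eye(len(all_keys))) ** 2)
    return ce + lambda_match * match + lambda_ortho * ortho
```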
- Prompt Selection Algorithms:
While the core procedure is nearest-neighbor matching (with similarity measures such as cosine distance between the query q(x) and prompt keys k_i), some methods introduce more refined selectors—using lightweight neural networks and Gumbel–Softmax sampling for differentiable, instance-dependent choices, or employing attention mechanisms in assembling decomposed prompts (2303.02909, 2211.13218).
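The Gumbel–Softmax relaxation mentioned above can be sketched in a few lines; the temperature value and selector-score interface are illustrative assumptions.

```python
import numpy as np

def gumbel_softmax_select(scores, tau=1.0, rng=None):
    """Differentiable, stochastic prompt selection via Gumbel-Softmax.

    scores: (M,) unnormalized selector logits, one per prompt.
    tau:    temperature; weights concentrate on one prompt as tau -> 0.
    Returns soft selection weights summing to 1 (a sketch; in practice
    gradients flow through these weights into the selector network).
    """
    rng = rng or np.random.default_rng()
    # Sample Gumbel(0, 1) noise and add it to the logits.
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))
    z = (scores + gumbel) / tau
    z = z - z.max()                      # numerical stability
    w = np.exp(z)
    return w / w.sum()
```

At low temperatures the soft weights approximate a hard one-hot choice while remaining differentiable, which is what makes instance-dependent selection trainable end to end.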
4. Empirical Performance and Benchmark Results
Dynamic prompting without rehearsal buffers has been empirically validated on a wide range of continual learning benchmarks, including Split CIFAR-100, 5-datasets, Split ImageNet-R, CORe50, and others. Across these settings:
- For class-incremental and domain-incremental learning:
Methods such as L2P, DualPrompt, CODA-Prompt, CPP, and APT routinely approach or exceed buffer-based baselines in both accuracy and forgetting metrics, even with frozen or minimally tuned backbones. For example, L2P consistently outperforms regularization-based approaches such as EWC and LwF, and CODA-Prompt and DualPrompt report average accuracy gains over buffer methods, especially in regimes with little or no rehearsal (2112.08654, 2204.04799, 2211.13218, 2303.09447, 2503.07979).
- Model scalability and parameter efficiency:
Because prompts represent a tiny fraction of total parameters (as little as 0.1%), these methods are efficient both in memory and in communication (noted in federated settings) (2307.04869).
- No dependence on test-time task identity:
Instance-driven prompt selection mechanisms obviate the need for explicit task boundary or ID knowledge during inference, allowing seamless adaptation even in task-agnostic or blurred-scenario streams (2112.08654, 2204.04799).
- Generalization across modalities and tasks:
Dynamic prompting frameworks generalize to document retrieval, dialog systems, and other domains where rehearsal is impractical or forbidden (2301.13268, 2406.12593). Performance metrics such as retrieval accuracy and mean reciprocal rank improve noticeably with prompt-based adaptation.
5. Stability, Plasticity, and Model Robustness
A persistent theme in dynamic prompting research is the balance between stability (retaining acquired knowledge) and plasticity (adapting to new data):
- Decoupling of adaptation sources:
By updating only prompt parameters and leaving the backbone fixed, the models isolate new knowledge to succinct update paths, drastically reducing interference (2112.08654, 2303.09447).
- Mitigation of catastrophic forgetting:
Prompt separation (e.g., hierarchical/class-task-general alignment) and contrastive learning objectives help prevent the overwriting of earlier representations. CPP, for instance, combines prompt tuning with contrastive prototype alignment, reducing both semantic drift and prototype interference, and achieving up to 6% gains over prior methods (2303.09447).
- Adaptation to data distribution and task structure:
Methods that condition prompt selection on input features or dynamically assemble prompt components (e.g., via attention or instance-adaptive fusion) demonstrate improved plasticity, rapid adaptation to new tasks, and resilience to task-imbalance phenomena (as in dynamically anchored prompting) (2404.14721).
6. Practical Considerations, Limitations, and Future Research
- Computational efficiency:
Dynamic prompting methods vary in resource demands; prompt-pool querying and sequence extension incur overhead, whereas additive prompt strategies (as in APT) reduce both inference cost and parameter count (2503.07979). Designs that avoid cascading forward passes or restrict updates to the CLS token further improve scalability.
- Prompt pool management:
Prompt collapse, suboptimal prompt-key matching, and the need for dynamic prompt pool resizing remain open challenges. Solutions include orthogonality constraints, fixed or topic-aware keys (especially in retrieval contexts (2406.12593)), and hierarchical or component-based expansion (2211.13218).
- Task and modality generalization:
Extending dynamic prompting to non-transformer architectures, multi-modal data, or domains with minimal annotator feedback represents a compelling research direction (2112.08654).
- Hybridization with rehearsal/regularization:
While rehearsal-free prompting offers privacy and memory advantages, combining it with small rehearsal buffers, regularizers, or generative replay remains a topic of active investigation, seeking even greater performance or robustness in demanding continual learning settings (2401.11544, 2211.13218).
7. Summary and Outlook
Dynamic prompting without rehearsal buffers constitutes a paradigm shift in continual learning: it removes the reliance on stored data, instead leveraging learnable prompts as external, adaptable memory for instruction and retention. Through architectures employing prompt pools, hierarchical prompt organization, decomposed attention, and instance-guided selection, such systems achieve competitive or superior results across diverse continual and incremental learning benchmarks. Effective management of prompts enables robust knowledge update and preservation with minimal parameter overhead. Future work will likely expand dynamic prompting techniques to new model classes, application domains, and adaptivity requirements, advancing the state of privacy-preserving, memory-efficient, and highly flexible lifelong learning systems.