Continual Prompting Module (CPM)

Updated 7 May 2026

Continual Prompting Module (CPM) is a method using small, learnable prompt tokens inserted in a frozen pre-trained transformer to enable task-specific adaptation.
CPM architectures employ hierarchical, dual, and domain-specific prompt strategies to structure and route information with minimal computational overhead.
Empirical studies show CPM reduces catastrophic forgetting and attains high accuracy across image, video, and cross-modal tasks without rehearsal techniques.

Continual Prompting Module (CPM) refers to a class of methods in continual learning that leverage small, learnable prompt parameters interfacing with a pre-trained, frozen backbone (typically Transformer-based). CPM architectures structure and manage prompts in ways that facilitate efficient adaptation to sequential tasks, mitigate catastrophic forgetting, and enforce modularity with minimal computational and memory overhead. This design paradigm is foundational for state-of-the-art rehearsal-free continual learning, domain-incremental, and cross-modal adaptation systems.

1. Conceptual Foundations and Definition

A Continual Prompting Module is a lightweight parameterization—typically consisting of vectors or matrices (prompt tokens)—inserted at designated points in a frozen pre-trained backbone. These prompts function as task, domain, or modality-specific instructions, allowing the backbone to extract relevant information for each learning scenario without updating the core model weights. The CPM approach addresses the stability–plasticity dilemma by confining adaptation to prompts while preserving global representations learned during pre-training.

The module's primary components are:

Per-task or per-domain prompts: Dedicated learnable parameters for each observed task/domain (e.g., $P_t$ for task $t$ ).
Global/common prompts: Shared parameters capturing task-invariant or cross-task/domain knowledge (e.g., $P^g$ , $G$ , or $P_C$ ).
Prompt selection/routing scheme: Mechanisms for identifying and attaching the correct prompt subset to the model during inference, such as via keys, feature similarity, or clustering.
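The three components above can be pictured as a small container that holds the prompts and prepends them to the backbone's input tokens. The following is a minimal sketch, not any cited paper's implementation: the `PromptBank` class, its field names, and the two-dimensional toy embeddings are all hypothetical, and real systems may insert prompts at several layers rather than only at the input.

```python
from dataclasses import dataclass, field

Vector = list[float]   # one token embedding
Prompt = list[Vector]  # a prompt is a short sequence of token embeddings


@dataclass
class PromptBank:
    """Hypothetical container for the three CPM components listed above."""
    global_prompt: Prompt                                          # shared across tasks
    task_prompts: dict[int, Prompt] = field(default_factory=dict)  # P_t per task
    task_keys: dict[int, Vector] = field(default_factory=dict)     # keys used for routing

    def add_task(self, task_id: int, prompt: Prompt, key: Vector) -> None:
        """Register the prompt and routing key learned for one task."""
        self.task_prompts[task_id] = prompt
        self.task_keys[task_id] = key

    def build_input(self, task_id: int, tokens: list[Vector]) -> list[Vector]:
        """Prepend global and task prompts to the frozen backbone's input tokens."""
        return self.global_prompt + self.task_prompts[task_id] + tokens


bank = PromptBank(global_prompt=[[0.0, 0.0]])
bank.add_task(0, prompt=[[1.0, 1.0], [2.0, 2.0]], key=[1.0, 0.0])
sequence = bank.build_input(0, tokens=[[9.0, 9.0]])
print(len(sequence))  # 1 global + 2 task-specific + 1 input token = 4
```

The backbone weights never appear in this structure, which reflects the point made above: only the prompt entries are trainable.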

This approach underpins various architectures, including hierarchical (grouped) prompt generators (Jiang et al., 15 Nov 2025), dual/complementary prompt systems (Wang et al., 2022), domain-incremental twin-prompt schemes (Feng et al., 2024), and extensions to multimodal and video domains (Villa et al., 2022, Guo et al., 1 Mar 2025).

2. Architectures and Variants

Hierarchical and Layer-Grouped CPMs

Modern CPMs introduce structure above the naive "per-layer, per-task prompt" design. In "Teaching Prompts to Coordinate" (Jiang et al., 15 Nov 2025), prompts are parameterized hierarchically:

A root prompt $p^0_t$ generates group-wise prompts $p^g_t$ via a small adapter network $f_\text{gen}$ , conditioned on group indices with positional encodings.
Within each group, sub-prompts for each layer are computed as $p_t^{g,\ell} = p_t^g + \beta_t^\ell$ , where $\beta_t^\ell$ are small positional incentive embeddings.

This organization preserves global feature propagation pathways and restricts over-flexible per-layer adaptation, directly mitigating the risk of catastrophic forgetting.
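The root-to-group-to-layer mapping can be sketched as follows. This is an illustration under stated assumptions: the generator $f_\text{gen}$ is modeled as a single linear map applied to the root prompt plus a group encoding, each prompt is reduced to one vector, and the names `W_gen`, `group_codes`, and `layer_offsets` are hypothetical. Only the final step, $p_t^{g,\ell} = p_t^g + \beta_t^\ell$, is taken directly from the description above.

```python
import random


def make_vector(dim, rng, scale=0.02):
    """Small random vector, standing in for a learnable parameter."""
    return [rng.gauss(0.0, scale) for _ in range(dim)]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def generate_layer_prompts(root, W_gen, group_codes, layer_offsets, groups):
    """Map one root prompt to one prompt vector per layer.

    groups is a list of layer-index lists, e.g. [[0, 1, 2], [3, 4, 5]].
    """
    layer_prompts = {}
    for g, layers in enumerate(groups):
        # Assumed form of f_gen: linear map of (root + group encoding).
        group_prompt = matvec(W_gen, add(root, group_codes[g]))
        for layer in layers:
            # p^{g,l}_t = p^g_t + beta^l_t
            layer_prompts[layer] = add(group_prompt, layer_offsets[layer])
    return layer_prompts


rng = random.Random(0)
dim = 4
groups = [[0, 1, 2], [3, 4, 5]]
root = make_vector(dim, rng)
W_gen = [make_vector(dim, rng) for _ in range(dim)]
group_codes = [make_vector(dim, rng) for _ in groups]
layer_offsets = [make_vector(dim, rng) for _ in range(6)]
prompts = generate_layer_prompts(root, W_gen, group_codes, layer_offsets, groups)
```

Layers in the same group differ only by their offsets, so the per-layer freedom is limited to the small $\beta$ terms rather than a full independent prompt.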

Dual/Complementary Prompts

In DualPrompt (Wang et al., 2022), CPM consists of:

A task-invariant G-Prompt ( $G$ ) attached globally across tasks,
Task-specific E-Prompts ( $E_t$ ) with associated keys $k_t$ guiding routing.

This architecture enables explicit separation between shared and private information, with independent update and freezing mechanisms for global and episodic prompts.

Spatial–Temporal and Multimodal Extensions

PIVOT (Villa et al., 2022) extends CPMs to video continual learning:

Spatial prompts are injected at image embedding stages.
Temporal prompts are injected in a lightweight transformer aggregating temporal features.
A multi-modal key mechanism leverages CLIP text-encoder embeddings for task selection.

For missing-modality continual adaptation, Guo et al. (1 Mar 2025) employ:

Modality-specific, Task-aware, and Task-specific prompts, hierarchically applied across transformer layers and regulated with cross-modal contrastive objectives.

Domain-Incremental and Composition-Based CPMs

CP-Prompt (Feng et al., 2024) introduces a twin-prompt CPM:

Common prompts ( $P_C$ ), prepended and continuously updated through domains, act as inter-domain knowledge carriers.
Personalized prompts (per-layer, per-domain) specialize model behavior for each domain.

A clustering-based domain selector ensures the selection and application of the correct prompt parameters during inference.
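A clustering-based selector of this kind can be sketched as a nearest-centroid lookup. The code below is a simplified illustration, not CP-Prompt's actual procedure: the tiny Lloyd-style `kmeans`, the deterministic initialization, the domain names, and the two-dimensional features are all assumptions made for the example.

```python
def dist2(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iters=10):
    """Minimal Lloyd iteration; initializes centroids with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the previous centroid when a cluster receives no points.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def select_domain(feature, domain_centroids):
    """Return the domain whose closest centroid is nearest to the feature."""
    return min(
        domain_centroids,
        key=lambda d: min(dist2(feature, c) for c in domain_centroids[d]),
    )


# Centroids are fitted once per domain on features from the frozen backbone.
domain_centroids = {
    "domain_a": kmeans([[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]], k=1),
    "domain_b": kmeans([[5.0, 5.1], [5.2, 4.9], [4.9, 5.0]], k=1),
}
chosen = select_domain([4.8, 5.3], domain_centroids)
print(chosen)
```

The selected domain name would then index into the stored personalized prompts for that domain.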

3. Optimization and Training Protocols

Continual Prompting Modules are typically trained in a staged or joint manner per task/domain:

Prompt Initialization & Freezing: Upon arrival of a new task/domain, a prompt (or root prompt in hierarchical schemes) is initialized. Past prompts and prompt-generation parameters are frozen to prevent interference (Jiang et al., 15 Nov 2025, Wang et al., 2022, Hu et al., 2023).
Prompt Generation & Insertion: Prompts are injected via prefix-tuning or key/value stream concatenation at designated layers. For group-wise schemes, a root prompt is mapped to group and sub-group prompts by adapters (Jiang et al., 15 Nov 2025).
Supervised and/or Contrastive Learning: The task loss (e.g., cross-entropy for classification) is combined, where applicable, with a prompt matching loss, a task-interaction contrastive loss (e.g., NT-Xent), and regularizers targeting classifier/prompt consistency (Gao et al., 2024, Guo et al., 1 Mar 2025, Villa et al., 2022).
Prompt Freezing & Memory Management: Once trained for a given task/domain, the corresponding prompt (and possibly generator snapshot) is frozen and stored for later retrieval. Episodic memory or coreset strategies may supplement functions such as replay or anomaly discrimination (Hu et al., 2023, Villa et al., 2022).
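The initialize, train, freeze, and store cycle described above can be sketched as an append-only store that refuses updates to frozen prompts. This is a toy illustration: the `PromptStore` class, the plain SGD update, and the hand-written gradients are hypothetical stand-ins for a real training loop with automatic differentiation.

```python
class PromptStore:
    """Hypothetical append-only store: one prompt vector per task."""

    def __init__(self):
        self._prompts = {}
        self._frozen = set()

    def start_task(self, task_id, init_prompt):
        """Initialize a fresh prompt when a new task arrives."""
        if task_id in self._prompts:
            raise ValueError(f"task {task_id} already has a prompt")
        self._prompts[task_id] = list(init_prompt)

    def sgd_step(self, task_id, grad, lr=0.1):
        """Update only the current task's prompt; the backbone is never touched."""
        if task_id in self._frozen:
            raise RuntimeError(f"prompt for task {task_id} is frozen")
        self._prompts[task_id] = [
            p - lr * g for p, g in zip(self._prompts[task_id], grad)
        ]

    def freeze(self, task_id):
        """Mark a prompt as read-only once its task has been learned."""
        self._frozen.add(task_id)

    def get(self, task_id):
        return tuple(self._prompts[task_id])


store = PromptStore()
store.start_task(0, [1.0, 1.0])
store.sgd_step(0, grad=[1.0, -1.0], lr=0.5)
store.freeze(0)
snapshot = store.get(0)

store.start_task(1, [0.0, 0.0])
store.sgd_step(1, grad=[2.0, 2.0], lr=0.5)  # training task 1 leaves task 0 untouched
```

Because earlier prompts cannot change, whatever the backbone computed with them for earlier tasks is reproduced exactly at inference, provided the right prompt is selected.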

Table: Selected Prompt Freezing and Allocation Strategies

Method	Prompt Allocation	Freezing Policy
DualPrompt (Wang et al., 2022)	G-prompt (shared), E-prompts (per-task)	Freeze E-prompts post-task
Hierarchical CPM (Jiang et al., 15 Nov 2025)	Root per task, group/layer sub-prompts	Freeze root and adapters
CP-Prompt (Feng et al., 2024)	Common (global), Personalized (per-domain/layer)	Freeze personalized prompts after each domain; common prompts keep updating across domains
POP (Hu et al., 2023)	Per-task, global POP	Freeze per-task after learning

4. Inference and Prompt Selection Mechanisms

CPMs rely on explicit or implicit routing to choose which prompts to attach during inference. Main strategies include:

Key-based or Class-token similarity: Use the frozen model's output (e.g., class token embedding) and stored keys (from prompt pools or class-name embeddings) to select the prompt yielding maximum similarity (Wang et al., 2022, Villa et al., 2022).
Soft matching/fusion: Compute soft weights over all stored prompts (or their root representations), then combine them as a task-fused prompt (Jiang et al., 15 Nov 2025).
Clustering-based routing: Apply a lightweight K-means on deep features to assign the input to the nearest domain, then retrieve the corresponding prompt set (Feng et al., 2024).
Multi-key mechanisms: In multi-class/continual settings, store per-class keys and maximize cosine similarity or apply softmax cross-entropy over all (task, class) key pairs (Gao et al., 2024).
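Two of the strategies above, hard key-based selection and soft fusion, can be sketched with cosine similarity and a softmax. The sketch assumes one key and one prompt vector per task; the function names and the `temperature` parameter are hypothetical and not taken from any cited method.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hard_select(query, keys):
    """Key-based routing: pick the task whose key is most similar to the query."""
    return max(keys, key=lambda t: cosine(query, keys[t]))


def soft_fuse(query, keys, prompts, temperature=0.1):
    """Soft routing: softmax over similarities, then a weighted sum of prompts."""
    tasks = list(keys)
    logits = [cosine(query, keys[t]) / temperature for t in tasks]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(prompts[tasks[0]])
    fused = [
        sum(w * prompts[t][i] for w, t in zip(weights, tasks)) for i in range(dim)
    ]
    return fused, dict(zip(tasks, weights))


keys = {0: [1.0, 0.0], 1: [0.0, 1.0]}
prompts = {0: [1.0, 0.0], 1: [0.0, 1.0]}
query = [0.9, 0.1]  # e.g. a class-token embedding from the frozen model

chosen = hard_select(query, keys)
fused, weights = soft_fuse(query, keys, prompts)
```

Hard selection commits to one prompt and is therefore sensitive to misrouting, whereas soft fusion degrades more gradually at the cost of touching every stored prompt.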

Prompt consistency training (random prompt switches during training) may be used to increase prompt selector robustness to misassignments (Gao et al., 2024).

5. Empirical Performance and Forgetting Mitigation

CPM-based architectures report strong results across several scenarios:

Hierarchical grouping (CPM (Jiang et al., 15 Nov 2025)) achieves a final average accuracy (FAA) of 97.6% on CIFAR-100 and 82.7% on 20-task ImageNet-R, with low average forgetting (AF=0.58% with PIE).
Video continual learning (PIVOT (Villa et al., 2022)) delivers a 27% improvement (absolute acc. from 43.3% to 73.8%) over iCaRL on ActivityNet (20 tasks), with BWF (backward forgetting) reduced to 3–4%.
Rehearsal-free image CL (DualPrompt (Wang et al., 2022), CPrompt (Gao et al., 2024)) attains high average accuracy (≥86.5% on CIFAR-100, 68.13% ImageNet-R), outperforming buffer-based and prior prompt-tuning baselines.
Domain-incremental learning (CP-Prompt (Feng et al., 2024)) secures near-zero forgetting (AF=0.25%) and surpasses prior prompt-only schemes by 1–3 percentage points in accuracy.
Anomaly detection (UCAD (Liu et al., 2024)) leverages CPM as an append-only memory system with per-task key-prompt-knowledge triplets, yielding improved segmentations without catastrophic forgetting.

Forgetting is controlled by (i) freezing all non-current prompts, (ii) sharing information via root/common/global prompts, (iii) limiting the adapted parameter count (typically <1% of backbone), and (iv) explicit consistency regularization in some formulations (Gao et al., 2024). Replay buffer or coreset strategies can be integrated where needed (e.g., PIVOT, UCAD).
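The accuracy and forgetting figures quoted in this section are usually derived from a matrix of per-task accuracies recorded after each training stage. The sketch below uses one common convention, which is an assumption here since individual papers may define their metrics slightly differently: `acc[i][j]` is the accuracy on task `j` after training on task `i`, final average accuracy is the mean of the last row, and forgetting for a task is its best earlier accuracy minus its final accuracy. The example numbers are made up.

```python
def final_average_accuracy(acc):
    """Mean accuracy over all tasks after the last task has been learned."""
    T = len(acc)
    return sum(acc[T - 1][j] for j in range(T)) / T


def average_forgetting(acc):
    """Mean drop from each earlier task's best accuracy to its final accuracy."""
    T = len(acc)
    if T < 2:
        return 0.0
    drops = []
    for j in range(T - 1):
        # Best accuracy on task j at any stage before the final one.
        best = max(acc[i][j] for i in range(j, T - 1))
        drops.append(best - acc[T - 1][j])
    return sum(drops) / len(drops)


# Illustrative values only; entries above the diagonal (unseen tasks) are unused.
acc = [
    [90.0, 0.0, 0.0],
    [88.0, 80.0, 0.0],
    [87.0, 79.0, 70.0],
]
faa = final_average_accuracy(acc)
af = average_forgetting(acc)
print(round(faa, 2), af)
```

In this toy matrix the first task drops from 90 to 87 and the second from 80 to 79, so average forgetting is 2 points.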

6. Limitations and Future Directions

CPM-based continual learning exhibits several challenges and open questions:

Routing/selection overhead: Soft task matching or clustering can elevate inference latency (Jiang et al., 15 Nov 2025, Feng et al., 2024).
Prompt allocation granularity: Overly flexible per-layer prompting may increase forgetting, while excessive sharing may suppress necessary task-specific adaptation (Jiang et al., 15 Nov 2025).
Fixed groupings and prompt scaling: Group and prompt configuration are often fixed a priori; learning optimal structures or sharing schemas is an open area (Jiang et al., 15 Nov 2025).
Generality beyond vision: While CPMs dominate in ViT family models, extensions to LLMs, multi-modal architectures, and non-transformer structures are under exploration (Villa et al., 2022, Guo et al., 1 Mar 2025).
Domain coverage in foundation models: Robustness to domain shifts and pre-training overlap remains imperfect; evaluating true generalization is increasingly difficult as foundation model coverage expands (Hu et al., 2023).

Plausible future research directions include learning prompt/group hierarchies, advancing cross-modal CPMs, integrating CPMs with memory-augmented architectures, and optimizing soft routing and fusion.

7. Summary Table: CPM Design Axes Across Representative Methods

Method / Ref	Prompt Topology	Prompt Freezing	Routing Mechanism	Key Empirical Results
Hierarchical CPM (Jiang et al., 15 Nov 2025)	Root→Group→Layer	Freeze per-task root/adapter	Soft match (feature fusion)	FAA=97.6% (CIFAR-100, 20×5), AF=0.58%
DualPrompt (Wang et al., 2022)	G+E (Global/Episodic)	Freeze G/old E	Cosine-to-key	Acc=86.51%, Forgetting=5.16% (CIFAR-100)
CP-Prompt (Feng et al., 2024)	Common + Per-domain	Freeze per-domain	K-means feature clustering	AA=93.65%, AF=0.25% (CDDB-Hard)
PIVOT (Villa et al., 2022)	Spatial+Temporal	Freeze old prompts	CLIP text encoder keys	+27% (ActivityNet 20-task)
POP (Hu et al., 2023)	Per-task + Global	Freeze per-task	Prompt bank, no routing	AA=85.8% (CIFAR-100, buffer=5k)

The Continual Prompting Module paradigm, across its variants, is a principal methodology for efficient, modular, and robust continual learning with modern pre-trained (especially Transformer) architectures, spanning image, video, cross-modal, and domain-incremental scenarios (Jiang et al., 15 Nov 2025, Feng et al., 2024, Villa et al., 2022, Hu et al., 2023, Wang et al., 2022, Guo et al., 1 Mar 2025, Gao et al., 2024, Liu et al., 2024).