
CoPS: Conditional Prompt Synthesis

Updated 18 December 2025
  • Conditional Prompt Synthesis (CoPS) is a family of methods that dynamically generates instance-specific prompts to guide frozen deep models using minimal trainable parameters.
  • It utilizes techniques such as expert pooling, soft routing, and attention mechanisms to adaptively merge input features and auxiliary modalities for improved task performance.
  • Empirical studies show that CoPS achieves robust transfer learning results, balancing parameter efficiency with significant accuracy gains in multi-modal and zero-/few-shot scenarios.

Conditional Prompt Synthesis (CoPS) refers to a family of parameter-efficient approaches for synthesizing adaptive, input- or context-conditioned prompts that steer large, typically frozen, deep models (such as LLMs, vision models, or multimodal transformers) toward instance- or task-specific behavior. CoPS exploits conditionality—via learnable functions, routing mechanisms, or attention over expert pools—to move beyond static prompt templates and achieve strong performance and generalization with minimal trainable parameters, particularly in transfer (zero- and few-shot), multi-modal, and compositional settings.

1. Definitions and Conceptual Foundations

Conditional Prompt Synthesis is formally characterized by the goal of learning a prompt-synthesizing function $p(x)$ (with trainable parameters) that maps instance-specific information or task metadata $x$ into a continuous prompt, which is then injected into a frozen, pre-trained backbone model $\theta$ to adapt its predictions. The core feature distinguishing CoPS from vanilla prompting is that the prompts are not fixed or purely learned per class/task, but instead dynamically derived from input or side information, potentially including auxiliary modalities, semantic priors, learned expert pools, or structured rule systems.

Variants of CoPS exist in multiple domains, including video action recognition, multimodal fusion, language modeling, vision-language models, and zero-shot anomaly detection (detailed in Section 2).

Common architectural design principles of CoPS include:

  • Pooling a set of learnable "prompt experts," which are compositely mixed via soft gating or routing;
  • Using lightweight selector/routing networks (MLP, attention, or production-system modules) that output instance-dependent mixture weights;
  • Injecting the conditional prompt into the model input, attention layers, or feature stages to bias computation toward relevant semantics.

2. Architectural Methodologies

2.1 Expert Pools and Soft Routing

A prevalent CoPS formulation employs a pool of $l$ learnable prompt experts $\mathcal{P} = \{P_1, \dots, P_l\}$, each $P_i \in \mathbb{R}^{K \times D}$, where $K$ is the prompt token count and $D$ the backbone hidden dimension. For a given input $X$, a selector network synthesizes a weighted mixture:

$$\alpha = \mathrm{softmax}\big(W_2\,\sigma(W_1\,g(X)) + b_2\big), \qquad P_{\text{cond}} = \sum_{i=1}^{l} \alpha_i\,P_i$$

Here, $g(X)$ is an input embedding extractor (e.g., pooled features), and $W_1, W_2$ are learned projection matrices (Wang et al., 2023). This prompt is then concatenated with or injected into the main model at the input or early transformer/CNN stages.
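
The following is a minimal PyTorch sketch of this expert-pool formulation. The class and argument names (ConditionalPromptPool, feat_dim, selector_dim) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ConditionalPromptPool(nn.Module):
    """Mixes l learnable prompt experts with input-conditioned weights."""

    def __init__(self, num_experts: int, prompt_len: int, hidden_dim: int,
                 feat_dim: int, selector_dim: int = 128):
        super().__init__()
        # Expert pool P = {P_1, ..., P_l}, each P_i in R^{K x D}.
        self.experts = nn.Parameter(
            torch.randn(num_experts, prompt_len, hidden_dim) * 0.02)
        # Lightweight selector: alpha = softmax(W_2 sigma(W_1 g(X)) + b_2).
        self.selector = nn.Sequential(
            nn.Linear(feat_dim, selector_dim),
            nn.ReLU(),
            nn.Linear(selector_dim, num_experts),
        )

    def forward(self, g_x: torch.Tensor) -> torch.Tensor:
        # g_x: (batch, feat_dim) pooled input features g(X).
        alpha = self.selector(g_x).softmax(dim=-1)           # (batch, l)
        # P_cond = sum_i alpha_i P_i, computed per instance.
        return torch.einsum("bl,lkd->bkd", alpha, self.experts)


pool = ConditionalPromptPool(num_experts=8, prompt_len=4,
                             hidden_dim=768, feat_dim=512)
prompt = pool(torch.randn(2, 512))   # (2, 4, 768), prepended to the input
```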

2.2 Mixture of Prompt Experts (MoPE)

Conditional Prompt Tuning for multimodal fusion (Jiang et al., 2023) proposes an extension where, for each instance and each transformer layer $i$, a dynamic prompt is constructed as a soft mixture of prompt experts, with soft routing scores $r^{(i)}(\psi_y)$ (dependent on a prior modality):

$$\mathbf{P}_d^{(i)}(\psi_y) = \sum_{j=1}^{k} r^{(i)}_j(\psi_y)\,\mathbf{E}_{i,j}$$

To balance expert utilization, an "importance loss" regularizes the distribution of routing weights across a mini-batch, preventing degenerate solutions in which a few experts dominate.
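
As a concrete illustration, here is one common way to implement such a balance term, using the squared coefficient of variation of per-expert routing mass familiar from mixture-of-experts work; the exact form in the cited paper may differ.

```python
import torch


def importance_loss(routing_weights: torch.Tensor) -> torch.Tensor:
    """routing_weights: (batch, num_experts) soft routing scores."""
    # Per-expert importance: total routing mass assigned over the batch.
    importance = routing_weights.sum(dim=0)            # (num_experts,)
    # Squared coefficient of variation; zero when usage is balanced.
    return importance.var() / (importance.mean() ** 2 + 1e-8)


r = torch.softmax(torch.randn(32, 8), dim=-1)  # a batch of routing scores
loss = importance_loss(r)  # added to the task loss with weight lambda
```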

2.3 Production-System Modules

In language modeling, CoPS can be realized via differentiable production systems, as in PRopS (Pilault et al., 2023), where a set of rule modules (attention or MLP blocks) is selected and composed according to the input/task condition. A Gumbel-softmax-based selector induces sparsity in module choice:

$$p(x) = \sum_{i \in S_k(x)} \alpha_i\,f_i(E(x))$$

where $E(x)$ embeds the input/task, the $f_i$ are the rule modules, and $S_k(x)$ is the selected set of $k$ modules.
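
A hedged sketch of this pattern follows. The top-k masking over Gumbel-softmax scores is an illustrative approximation of PRopS-style sparse selection, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProductionPromptSynthesizer(nn.Module):
    def __init__(self, num_modules: int, embed_dim: int, k: int = 2):
        super().__init__()
        self.k = k
        # Rule modules f_i: small MLP blocks applied to E(x).
        self.rules = nn.ModuleList([
            nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.GELU(),
                          nn.Linear(embed_dim, embed_dim))
            for _ in range(num_modules)])
        self.scorer = nn.Linear(embed_dim, num_modules)

    def forward(self, e_x: torch.Tensor) -> torch.Tensor:
        # e_x: (batch, embed_dim) input/task embedding E(x).
        # Gumbel-softmax scores; hard=False keeps gradients soft.
        alpha = F.gumbel_softmax(self.scorer(e_x), tau=1.0, hard=False)
        # Keep only the top-k modules S_k(x); zero out the rest.
        topk = alpha.topk(self.k, dim=-1).indices
        mask = torch.zeros_like(alpha).scatter_(1, topk, 1.0)
        alpha = alpha * mask
        # p(x) = sum_{i in S_k(x)} alpha_i f_i(E(x)).
        outs = torch.stack([f(e_x) for f in self.rules], dim=1)
        return torch.einsum("bm,bmd->bd", alpha, outs)
```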

2.4 Cross-Modal Conditional Prompting

Recent VLM research (Yang et al., 11 Jul 2025) synthesizes both text and visual conditional prompts via mutual guidance. Semantic prompts are extracted with a multi-modal LLM (MLLM) using attention over the MLLM’s decoder cache, followed by adaptation into VLM space. Visual prompts are then constructed by mutually guiding visual and semantic features through self- and cross-attention (AMG module).
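
To make the mutual-guidance idea concrete, the sketch below pairs self-attention over visual features with bidirectional cross-attention against semantic prompts. It is a generic approximation of the AMG design; module structure and dimensions are assumptions, and a single cross-attention block is shared across both directions for brevity.

```python
import torch
import torch.nn as nn


class MutualGuidance(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, visual: torch.Tensor, semantic: torch.Tensor):
        # visual: (batch, n_patches, dim); semantic: (batch, n_tokens, dim).
        v, _ = self.self_attn(visual, visual, visual)     # refine visual
        # Visual queries attend to semantic prompts, and vice versa.
        v_guided, _ = self.cross_attn(v, semantic, semantic)
        s_guided, _ = self.cross_attn(semantic, v, v)
        return v_guided, s_guided   # bases for visual/text prompts
```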

2.5 Prototype and Semantic Token Enhancement

For zero-shot anomaly detection, CoPS (Chen et al., 5 Aug 2025) leverages explicit state prototypes (extracted via cross-attention over patch features) and implicit semantic class tokens (sampled via VAE from global image features) to assemble context-rich, state-aware prompts. A spatially-aware alignment module further refines prompt effectiveness for both image-level and pixel-level detection.
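
The implicit class-token sampling step can be sketched as a standard VAE reparameterization over global image features, as below; layer shapes and names are assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn


class ClassTokenSampler(nn.Module):
    def __init__(self, feat_dim: int, latent_dim: int):
        super().__init__()
        self.to_mu = nn.Linear(feat_dim, latent_dim)
        self.to_logvar = nn.Linear(feat_dim, latent_dim)

    def forward(self, global_feat: torch.Tensor) -> torch.Tensor:
        # global_feat: (batch, feat_dim) pooled image features.
        mu = self.to_mu(global_feat)
        logvar = self.to_logvar(global_feat)
        # Reparameterization trick: token = mu + sigma * eps.
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps   # implicit class token
```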

3. Training Objectives and Regularization

These methods generally follow “frozen backbone, trainable prompt” paradigms, optimizing the conditional prompt parameters along with the selector/routing modules via a standard downstream task loss, often cross-entropy or binary cross-entropy, with added regularization for balanced expert usage, prompt diversity, or compositional sparsity.

For instance, (Jiang et al., 2023) minimizes:

$$\mathcal{L} = \mathcal{L}_{\mathrm{task}} + \lambda \sum_{i} \mathcal{L}_{\mathrm{imp}}^{(i)}$$

while (Chen et al., 5 Aug 2025) optimizes joint modular losses for state prototype alignment, variational class sampling, and spatial text-image alignment:

$$\min_{\theta,\psi,\omega,\varphi}\; \mathcal{L}_{\mathrm{ESTS}}(\theta) + \mathcal{L}_{\mathrm{ICTS}}(\psi) + \mathcal{L}_{\mathrm{SAGA}}(\psi,\omega,\varphi)$$

Contrastive learning objectives are common for multimodal settings, with additional regularizers enforcing consistent prompt usage or feature alignment with augmentations.
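
A toy end-to-end sketch of the "frozen backbone, trainable prompt" recipe is given below, reusing the ConditionalPromptPool sketched in Section 2.1. The two-layer transformer backbone, linear head, and random data are stand-ins; only the freezing and optimization pattern reflects the methods described above.

```python
import torch
import torch.nn as nn

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2)
for p in backbone.parameters():
    p.requires_grad_(False)                 # backbone stays frozen

prompt_pool = ConditionalPromptPool(num_experts=4, prompt_len=2,
                                    hidden_dim=64, feat_dim=64)
head = nn.Linear(64, 10)                    # trainable task head
optimizer = torch.optim.AdamW(
    list(prompt_pool.parameters()) + list(head.parameters()), lr=1e-3)

x = torch.randn(8, 16, 64)                  # toy token sequences
y = torch.randint(0, 10, (8,))
for _ in range(3):                          # a few illustrative steps
    g_x = x.mean(dim=1)                     # pooled features g(X)
    prompt = prompt_pool(g_x)               # instance-conditional prompt
    h = backbone(torch.cat([prompt, x], dim=1))  # prepend prompt tokens
    loss = nn.functional.cross_entropy(head(h[:, 0]), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```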

4. Applications and Empirical Impact

Conditional Prompt Synthesis methods have been validated across a diverse range of applications:

| Domain | CoPS Implementation (Source) | Representative Gains |
|---|---|---|
| Video Action Recognition | Soft Conditional Prompt Learning (SCP) (Wang et al., 2023) | +3.17–10.2% accuracy on Okutama, NECDrone, SSV2 |
| Multimodal Fusion | MoPE-based conditional tuning (Jiang et al., 2023) | SOTA with 0.7% of parameters; matches or exceeds fine-tuning |
| Zero-shot Anomaly Detection | CoPS (Chen et al., 5 Aug 2025) | +2.5 pp AUROC vs. prior SOTA (92.5% vs. 90.0%) |
| Vision-Language Models | MuGCP (Yang et al., 11 Jul 2025) | +2.01% (few-shot HM metric) vs. previous best |
| LLM Adaptation | PRopS (Pilault et al., 2023) | +15.5% compositional EM accuracy over baseline |

Across these domains, CoPS techniques consistently outperform static or non-conditional prompt baselines, particularly when data is limited, tasks are compositional, or parameter efficiency is critical.

5. Comparative Analysis and Ablations

Empirical analyses reveal several robust properties of CoPS designs:

  • Expressivity: Instance-conditional prompt synthesis, especially via expert pooling or mixture modules, scales more effectively than simply enlarging prompt length (Jiang et al., 2023).
  • Generalization: Compositional/gated prompt systems (e.g., PRopS) enable zero- and few-shot transfer by reusing learned “subprompts” for novel input combinations, yielding sample-efficient generalization (Pilault et al., 2023).
  • Parameter Efficiency: Across benchmarks, CoPS approaches achieve high accuracy with 1–10% (often <1%) of the trainable parameters required for full fine-tuning or adapter-based transfer.
  • Ablations: Removing dynamic routing, regularization, or mutual-attention modules causes significant drops in performance and generalization (e.g., importance loss prevents expert collapse, full AMG and multi-prompt fusion boost generalization in MuGCP (Yang et al., 11 Jul 2025)).
  • Prompt Diversity: Balanced utilization of prompt experts (encouraged via importance loss or similar terms) prevents the router from collapsing onto a few experts and improves robustness to data scaling (Jiang et al., 2023).

6. Limitations and Open Directions

Known constraints of CoPS methodologies include:

  • Selector Design Sensitivity: Effectiveness depends on the capacity, architecture, and regularization of the selector/router. Overly simplistic routers cannot capture complex input variability; poorly regularized selectors collapse onto a small subset of experts.
  • Prompt Interpretation: Learned prompt experts or composed modules do not always correspond to semantically interpretable factors or tasks; understanding prompt semantics remains an open problem (Pilault et al., 2023).
  • Resource Overhead: Some advanced schemes (e.g., MuGCP (Yang et al., 11 Jul 2025)) require substantial compute and memory due to reliance on MLLM decoders, offline caching, or multiple attention modules.
  • Domain/Task Portability: Optimal pool size, token counts, and fusion topology are task-dependent and may require extensive ablation.
  • Noise and Overfitting: External priors (such as MLLM-generated semantic embeddings) can encode irrelevant or spurious context, necessitating future research into content filtering and adaptive knowledge distillation (Yang et al., 11 Jul 2025).

Directions for future exploration include: lightweight distillation of prompt knowledge, dynamic gating, efficient memory management for prompt caches, and extension of CoPS principles to detection, segmentation, or video-language alignment (Yang et al., 11 Jul 2025).

7. Theoretical Properties and Interpretability

Initial theoretical evidence (e.g., (Pilault et al., 2023)) suggests that CoPS-style modular prompt systems retain favorable sample complexity compared to monolithic prompt learning, provided module selection is sparse and compositionally structured. Proposition 1 in (Pilault et al., 2023) formalizes that, under compositional reuse and sufficient expressivity, risk can be made arbitrarily close to the Bayes risk with polynomially fewer samples relative to the number of modules and composed subtasks. This suggests that prompt libraries can span large task spaces while preserving compactness and adaptation speed. Moreover, gating scores or module activations in these systems provide an interpretable basis for analyzing input-to-prompt mappings, though the alignment to human-interpretable subtasks is not always guaranteed.
