
ContrAstive Prompt Orchestration (CAPO)

Updated 8 February 2026
  • CAPO is a formal learning paradigm that employs contrastive objectives on diverse prompt variants for optimized downstream performance.
  • It utilizes discrete ranking and adaptive aggregation methods, integrating contrastive losses to enhance language model optimization and policy transfer.
  • Empirical results reveal significant gains in prompt quality, safety alignment, and few-shot learning across NLP and embodied AI tasks.

ContrAstive Prompt Orchestration (CAPO) is a formal learning paradigm and algorithmic family for leveraging prompt-level contrast—sometimes with dynamic orchestration mechanisms—to optimize downstream model behavior. Central to CAPO is the use of contrastive objectives over prompt variants, system prompts, or prompt-derived prototypes, and the explicit orchestration of these prompt forms (including dynamic aggregation). The approach is applied in diverse areas including LLM prompt optimization, few-shot learning, safety-aligned generation, unsupervised embedding construction, and visuomotor policy transfer. Key instantiations and theoretical formulations derive from large-scale empirical studies across NLP and embodied AI domains (Lee et al., 2 Sep 2025, Zhang et al., 1 Feb 2026, Zhao et al., 2024, Zeng et al., 2022, Jian et al., 2022).

1. Formal Definition and Framework Variants

CAPO is characterized by the following elements:

  • Contrast Source: Contrasting prompt structures, exemplars, or learned soft prompts with respect to quality, task relevance, or domain specificity.
  • Orchestration Mechanism: Either discrete selection (e.g., ranking and partitioning) or adaptive aggregation (via attention or optimization) over a set of learned prompts.
  • Contrastive Objective: Explicit use of contrastive loss functions (InfoNCE and its variants), supervised or unsupervised, to align model outputs with desired invariances or discriminative boundaries.
  • Application Domain: From discrete prompt optimization for LLMs to learnable prompt pools in vision-language policy learning.

A general CAPO mapping is:

\mathrm{CAPO}: (q, \mathcal{P}) \mapsto p^*,

where $q$ is a query, $\mathcal{P}$ a pool of candidate prompts (structured, learned, or retrieved), and $p^*$ the orchestrated prompt yielding optimal downstream performance under a designated metric.
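At its most abstract, the mapping above is a selection over a prompt pool under a scoring metric. The following minimal sketch illustrates that skeleton; `capo_select`, `overlap_score`, and the toy pool are illustrative names, not from any of the cited papers, and a real scorer would estimate downstream task performance rather than word overlap.

```python
# Minimal sketch of the CAPO mapping (q, P) -> p*.
# All names here are illustrative, not from the cited papers.
def capo_select(query, pool, score_fn):
    """Return the prompt in `pool` maximizing the metric
    estimated by `score_fn(query, prompt)`."""
    return max(pool, key=lambda p: score_fn(query, p))

# Toy stand-in scorer: word overlap with the query.
def overlap_score(query, prompt):
    q, p = set(query.lower().split()), set(prompt.lower().split())
    return len(q & p)

pool = [
    "Summarize the text.",
    "Translate the text.",
    "Summarize the text briefly.",
]
best = capo_select("please summarize this text", pool, overlap_score)
print(best)  # "Summarize the text briefly."
```

The orchestration mechanisms described below refine this skeleton: discrete variants replace the argmax with ranking and partitioning, while adaptive variants replace it with attention-weighted aggregation.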

Discrete retrieval-augmented CAPO (Lee et al., 2 Sep 2025):

  • For LLM prompt optimization, $\mathcal{R}(q)$ retrieves $k$ scored prompts $p_i$ (e.g., from HelpSteer2), partitioned or ranked by metrics $M = \{\text{help}, \text{corr}, \text{coh}, \text{comp}, \text{verb}\}$.
  • Contrastive reasoning operators $\Phi_{\mathrm{CR}}$ and $\Psi_{\mathrm{CR}}$ generate reasoning-augmented instructions for $f_\theta$, leading to a synthesized optimized prompt.

Continuous and adaptive CAPO (Zhang et al., 1 Feb 2026):

  • For cross-embodiment visuomotor policy, a learnable pool $\mathbf{P}$ of prompts is established, each disentangling a distinct domain factor (e.g., lighting, FOV).
  • Adaptive orchestration $\mathcal{G}_{\mathrm{attn}}$ dynamically aggregates these via attention-weighted fusion, conditioning prompt mixing on current observations.

2. Retrieval-Augmented and Tiered Contrastive Prompt Optimization

In the automatic prompt optimization setting (Lee et al., 2 Sep 2025), CAPO proceeds via:

  1. Retrieval: Given a query $q$, $\mathcal{R}(q)$ retrieves $k$ prompts annotated for $M$. BM25 scoring is used for retrieval:

\mathrm{score}_{\mathrm{BM25}}(q,p) = \sum_{t \in q} \mathrm{IDF}(t)\, \frac{\mathrm{tf}(t,p)\,(k_1 + 1)}{\mathrm{tf}(t,p) + k_1\,(1 - b + b\,|p|/\mathrm{avgLen})}

  2. Prompt Partitioning and Contrast Formation:
    • Prompts are sorted by average metric score $\overline{s}(p_i)$.
    • Disjoint sets are formed: $P^H$ (top), $P^M$ (middle), $P^L$ (bottom).
  3. Contrastive Reasoning Instruction:
    • Inputs to $f_\theta$ use templates that reflect on strengths (from $P^H$), weaknesses (from $P^L$), and stable attributes (from $P^M$).
  4. Objective:
    • Margin-based notional objectives guide contrastive distance between embeddings for high- and low-quality prompts, albeit with black-box $f_\theta$.

An alternative "metric-wise" contrast instead isolates the best exemplar per metric, $P^m$, instructing $f_\theta$ to synthesize a composite prompt integrating strengths across all dimensions.
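The retrieval and tiering steps above can be sketched as follows. This is a hedged illustration: the BM25 function follows the formula given, but `tier_partition`, the tier fraction, and the parameter defaults ($k_1 = 1.5$, $b = 0.75$) are common conventions assumed here, not values reported in the paper.

```python
import math

def bm25_score(query_tokens, doc_tokens, idf, avg_len, k1=1.5, b=0.75):
    """BM25 score of one tokenized document against a tokenized query
    (per-term IDF supplied via the `idf` dict)."""
    score = 0.0
    for t in set(query_tokens):
        tf = doc_tokens.count(t)
        if tf == 0:
            continue
        norm = tf + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf.get(t, 0.0) * tf * (k1 + 1) / norm
    return score

def tier_partition(prompts, metric_scores, frac=1 / 3):
    """Sort prompts by mean metric score and split into the
    high (P^H), middle (P^M), and low (P^L) tiers."""
    ranked = sorted(prompts,
                    key=lambda p: sum(metric_scores[p]) / len(metric_scores[p]),
                    reverse=True)
    n = max(1, int(len(ranked) * frac))
    return ranked[:n], ranked[n:len(ranked) - n], ranked[len(ranked) - n:]
```

In the full pipeline, the tiered exemplars are then inserted into the contrastive-reasoning templates handed to $f_\theta$.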

3. Hybrid Contrastive Prompt Pool Learning and Dynamic Orchestration

For cross-embodiment visuomotor adaptation (Zhang et al., 1 Feb 2026), CAPO incorporates:

  • Pool Construction: $K$ learnable continuous prompts, each trained with a hybrid of:
    • Visual InfoNCE: enforcing invariance to lighting/appearance variations
    • Temporal action-based BYOL: aligning across embodiment/trajectory sequences
    • Text-to-vision alignment: semantic grounding via CLIP-based contrastive loss
  • Adaptive Orchestration Mechanism: Given an observation $o_t$, embeddings $z_t^k = \Phi(o_t, p^k)$ for each prompt $p^k$ are attention-weighted:

\alpha_k = \frac{\exp(s_k^a \cdot s_k^c)}{\sum_j \exp(s_j^a \cdot s_j^c)}

with $s_k^a$ (learnable MLP score) and $s_k^c$ (cosine similarity with the unprompted $z_t^v$).

  • Fused Representation: The final feature is $z_t^f = z_t^v + z_t^t + \sum_{k=1}^K \alpha_k\, z_t^k$, input to a policy optimized by PPO.
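The orchestration and fusion steps can be sketched numerically. In this NumPy illustration the MLP scores $s_k^a$ are passed in as a vector and the embedding dimensions are arbitrary; the paper's actual encoder $\Phi$ and scorer architectures are not reproduced here.

```python
import numpy as np

def orchestrate(z_v, z_t, z_prompts, s_mlp):
    """Fuse prompt-conditioned embeddings z_prompts (K x d) with the
    unprompted visual feature z_v and text feature z_t, using
    attention weights alpha_k = softmax(s_k^a * s_k^c)."""
    # s_k^c: cosine similarity of each prompted embedding with z_v.
    s_c = z_prompts @ z_v / (
        np.linalg.norm(z_prompts, axis=1) * np.linalg.norm(z_v) + 1e-8)
    logits = s_mlp * s_c                 # elementwise s_k^a * s_k^c
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()                 # softmax attention weights
    return z_v + z_t + alpha @ z_prompts # fused feature z_t^f

rng = np.random.default_rng(0)
K, d = 4, 8
z_f = orchestrate(rng.normal(size=d), rng.normal(size=d),
                  rng.normal(size=(K, d)), rng.normal(size=K))
print(z_f.shape)  # (8,)
```

Because the weights are recomputed from the current observation at every step, the mixture of domain-factor prompts adapts online as conditions (lighting, viewpoint, embodiment) change.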

4. Contrastive Orchestration in Safety Alignment and Decoding

In safe LLM alignment, Adversarial Contrastive Decoding (ACD) (Zhao et al., 2024) instantiates a CAPO framework with:

  • Dual Opposite Prompt Optimization:
    • Learning two soft prompts—Safeguarding Prompt (SP) and Adversarial Prompt (AP)—via prompt-tuning on an anchor set distinguishing “refused” vs. “accepted” outputs in harmful/benign instruction cases.
    • Separate losses are applied to reinforce or discourage harmful completions.
  • Contrastive Decoding:
    • At inference, logits under SP and AP are combined as $l_{\mathrm{ACD}} = l_S - \alpha\, l_A$, directly suppressing continuations favored under AP.
    • This orchestration consistently boosts the harmless rate (HLR) across models by over 20 percentage points, while maintaining performance on regular tasks.
  • Comparison with Other Methods:
    • ACD requires no second model and auto-learns both prompt legs, outperforming non-learned or single-template contrastive approaches.
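The combination rule $l_{\mathrm{ACD}} = l_S - \alpha\, l_A$ can be illustrated on toy logits. Only the subtraction rule is taken from the description above; the logit values and $\alpha = 0.5$ are made up for the example.

```python
import numpy as np

def acd_logits(logits_safe, logits_adv, alpha=0.5):
    """Combine next-token logits from the safeguarding-prompt (SP) and
    adversarial-prompt (AP) conditions; tokens the AP leg boosts are
    suppressed in the final distribution."""
    return logits_safe - alpha * logits_adv

l_s = np.array([2.0, 1.0, 0.5])  # toy logits under the safeguarding prompt
l_a = np.array([0.0, 3.0, 0.0])  # toy logits under the adversarial prompt
combined = acd_logits(l_s, l_a)
print(combined)         # [ 2.  -0.5  0.5]
print(combined.argmax())  # 0: high under SP and not boosted by AP
```

Token 1 is plausible under SP alone but strongly preferred by AP, so the contrast demotes it, which is the intended safety effect.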

5. CAPO in Few-Shot and Unsupervised Representation Learning

Few-shot prompt-based learning (Jian et al., 2022):

  • CAPO generates multiple prompt+demonstration "views" per example, differing in template or context.
  • A supervised contrastive loss $\mathcal{L}_{\text{con}}$ clusters same-class prompt views and repels cross-class ones, supplementing the masked-LM loss.
  • Results show +2–6 percentage points gains in accuracy/F1 over strong prompt-only and retrieval-augmented baselines across 15 tasks.

Unsupervised sentence embedding (Zeng et al., 2022):

  • ConPVP constructs prompt-derived virtual semantic prototypes, with each instance paired to both positive and negative prompt-based sequences.
  • A prototypical InfoNCE loss pulls the anchor sentence embedding $v_i$ towards its positive prototype $p_i^+$ and away from its negative prototype $p_i^-$ plus all other in-batch prototypes.
  • Empirical results show consistent improvements in STS tasks (e.g., +2.6 Spearman’s ρ\rho over SimCSE) and text clustering accuracy.
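The prototypical InfoNCE objective described above can be sketched as follows. The embeddings are random stand-ins and the temperature $\tau = 0.05$ is a common choice assumed here, not necessarily the paper's setting.

```python
import numpy as np

def proto_infonce(v, p_pos, p_neg, other_protos, tau=0.05):
    """-log softmax similarity of anchor v with its positive prototype,
    contrasted against its negative prototype and the other in-batch
    prototypes."""
    def sim(a, b):  # cosine similarity
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(sim(v, p_pos) / tau)
    negs = sum(np.exp(sim(v, n) / tau) for n in [p_neg] + list(other_protos))
    return -np.log(pos / (pos + negs))

v = np.array([1.0, 0.0])
loss = proto_infonce(v,
                     p_pos=np.array([0.9, 0.1]),
                     p_neg=np.array([-1.0, 0.0]),
                     other_protos=[np.array([0.0, 1.0])])
print(float(loss))
```

Minimizing this loss over a batch pulls each sentence toward its own prompt-derived prototype while pushing it off every other prototype, which is what yields the discriminative embedding space.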

6. Experimental Results and Empirical Findings

Prompt optimization on the HelpSteer2 metrics (Lee et al., 2 Sep 2025):

Model          Help   Corr   Coh    Comp   Verb   Avg
GPT-4o Direct  0.366  0.435  0.767  0.405  0.664  0.527
CAPO-Tiered    0.525  0.607  0.882  0.447  0.717  0.636
CAPO-Metric    0.516  0.596  0.876  0.432  0.678  0.620
  • Ablations indicate that omitting contrastive reasoning degrades performance by 8–12%.
  • $k = 10$ retrieval is optimal; larger $k$ leads to noise dilution.
Cross-embodiment visuomotor transfer (Zhang et al., 1 Feb 2026):

Approach  SR↑        SPL↑       NE↓        EL↓
CURL      52.0±1.9   0.32±0.07  0.48±0.08  32±6
CAPO      97.9±1.2   0.66±0.04  0.02±0.01  18±3
  • Ablations confirm the complementary necessity of visual, temporal-action, and text contrastive objectives.
  • CAPO exhibits superior zero-shot generalization across domains and embodiment changes.
  • HLR (Harmless Rate) improves from 71.4% (base) to 92.4% with ACD (Zhao et al., 2024), with negligible cost to general win/truthful rates and halved jailbreak attack success rates.

7. Limitations, Open Questions, and Future Directions

Limitations across CAPO studies include:

  • Dependence on annotated prompt corpora (e.g., HelpSteer2) or domain-aligned anchor sets (Lee et al., 2 Sep 2025, Zhao et al., 2024).
  • Generalization to multi-turn dialogue and unannotated domains is largely unexplored.
  • In adaptive orchestration settings, prompt pool size and length trade-offs exist (poor performance with excessive redundancy or overfitting) (Zhang et al., 1 Feb 2026).
  • Retrieval quality (BM25 vs. neural) and model-agnostic orchestration merit further research.

Suggested future work encompasses:

  • Dynamic, user-driven metric weighting in orchestration.
  • Integration of human-in-the-loop for iterative prompt refinement.
  • Orchestration over chain-of-thought or multi-step reasoning traces.
  • Extension of CAPO-inspired approaches to continuous prompt spaces across broader multimodal and multilingual domains.

References

  • "Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization" (Lee et al., 2 Sep 2025)
  • "Learning Adaptive Cross-Embodiment Visuomotor Policy with Contrastive Prompt Orchestration" (Zhang et al., 1 Feb 2026)
  • "Adversarial Contrastive Decoding: Boosting Safety Alignment of LLMs via Opposite Prompt Optimization" (Zhao et al., 2024)
  • "Contrastive Learning with Prompt-derived Virtual Semantic Prototypes for Unsupervised Sentence Embedding" (Zeng et al., 2022)
  • "Contrastive Learning for Prompt-Based Few-Shot Language Learners" (Jian et al., 2022)
