PromptFuseNL: Unified Few-Shot Adaptation

Updated 25 November 2025

PromptFuseNL is a unified framework that integrates predictive prompt tuning, dual-branch learning, and unsupervised reweighting for both vision-language and language-only scenarios.
It utilizes a frozen CLIP backbone with lightweight adaptations to refine class prototypes efficiently and enhance generalization even under label noise.
The framework delivers state-of-the-art few-shot accuracy and superior training efficiency, supporting multi-modal adaptation and zero-shot prompt discovery.

PromptFuseNL is a unified framework for robust few-shot adaptation and cross-model prompt optimization in both vision-language and language-only scenarios. It combines predictive prompt tuning, dual-branch positive and negative learning, unsupervised instance reweighting, and zero-shot adapter mechanisms, enabling efficient and accurate generalization even under noisy support data and mismatched tokenizers. The approach yields state-of-the-art few-shot accuracy and training efficiency, supporting multi-modal adaptation, prompt injection fuzzing, and zero-shot prompt discovery across diverse architectures (Mandalika, 16 May 2025, Yu et al., 2024, Williams et al., 2024).

1. Architectural Foundation and Task-Conditioned Residuals

PromptFuseNL builds on a frozen CLIP backbone with a visual encoder $f_v$ and a text encoder $f_t$, augmented with lightweight textual and visual branches. The central construct is the refinement of class prototypes through learned, task-conditioned residuals:

$t_c' = t_c + \Delta_\theta^t(t_c), \qquad v_c' = v_c + \Delta_\theta^v(v_c),$

where $t_c = f_t(\text{“class name }c\text{”}) \in \mathbb R^d$ and $v_c = f_v(\text{template }x_c) \in \mathbb R^d$. Visual features are further transformed as

$\tilde v_c = W_v\bigl(\mathrm{LayerNorm}(v_c')\bigr)$

using a small per-class linear projection $W_v$. Textual and visual prototypes are finally fused:

$z^+_c = \lambda\,( \tilde t_c + \tilde v_c ) + (1-\lambda)\,(t_c + v_c), \qquad \lambda\in[0,1],$

where $\tilde t_c$ denotes the refined textual prototype and $\lambda$ balances the adapted prototypes against the frozen ones. This structure allows discriminative adaptation per episode while keeping the backbone frozen and the added overhead small ($<0.1\%$) (Mandalika, 16 May 2025).
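The residual refinement and fusion above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-layer form of the residual `residual_mlp`, the parameter names in `params`, and the choice to use the refined text prototype directly as $\tilde t_c$ are all hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize the last axis to zero mean and unit variance (no learned scale or shift)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_mlp(x, W1, W2):
    """Assumed form of Delta_theta: a two-layer bottleneck MLP with ReLU."""
    return np.maximum(x @ W1, 0.0) @ W2

def fuse_prototypes(t, v, params, lam=0.5):
    """t, v: (N, d) frozen text and visual prototypes. Returns z_plus with shape (N, d)."""
    t_ref = t + residual_mlp(t, params["Wt1"], params["Wt2"])  # t_c' = t_c + Delta^t(t_c)
    v_ref = v + residual_mlp(v, params["Wv1"], params["Wv2"])  # v_c' = v_c + Delta^v(v_c)
    # Per-class projection W_v, stored as an (N, d, d) array: one matrix per class.
    v_tilde = np.einsum("nd,nde->ne", layer_norm(v_ref), params["Wv"])
    t_tilde = t_ref  # assumption: the text side uses the refined prototype as is
    return lam * (t_tilde + v_tilde) + (1.0 - lam) * (t + v)
```

Setting `lam=0.0` recovers the sum of the frozen prototypes, which makes $\lambda$ a direct dial between zero-shot CLIP behaviour and the adapted branches.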

2. Dual-Branch Losses and Semantic Negative Mining

PromptFuseNL introduces a dual-objective training regime:

Positive alignment: Queries $q$ are attracted to their correct class prototype $z^+_y$ using a cosine classifier with temperature scaling:

$\mathcal L_{\rm pos} = -\log \frac{\exp(\cos(q, z^+_y)/\tau)}{\sum_{c=1}^N \exp(\cos(q, z^+_c)/\tau)}$

Negative repulsion: A hard negative mining procedure selects the $K$ most confusable prototypes, ranked by cosine similarity to the mean support embedding $\bar s$:

$\mathcal N = \operatorname{TopK}_{c \notin \text{support}}\; \cos(\bar s, z_c)$

Each mined class $n\in\mathcal N$ is processed analogously to yield a negative prototype $z^-_n$, and a hinge loss is imposed:

$\mathcal L_{\rm neg} = \frac{1}{|\mathcal N|} \sum_{n\in\mathcal N} \max(0,\, \tau - \cos(q,\,z^-_n)).$

The final classification loss combines the positive term, the negative term, and L2 regularization of the attention parameters:

$\mathcal L = \mathcal L_{\rm pos} + \mathcal L_{\rm neg} + \gamma\|\theta_{\rm attn}\|_2^2$

This arrangement substantially enhances fine-grained class separation and out-of-domain generalization (Mandalika, 16 May 2025).
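The three loss terms can be written out as a short NumPy sketch. It follows the formulas exactly as stated above; the function names, the default values of `tau` and `gamma`, and the averaging over a batch of queries are assumptions made for illustration.

```python
import numpy as np

def cosine(a, b):
    """Pairwise cosine similarity between rows of a (M, d) and rows of b (N, d)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def positive_loss(q, z_pos, y, tau=0.07):
    """Temperature-scaled cosine cross-entropy, averaged over queries."""
    logits = cosine(q, z_pos) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(y)), y].mean()

def mine_negatives(s_bar, z_cand, k):
    """Indices of the k candidate prototypes most similar to the mean support embedding."""
    sims = cosine(s_bar[None, :], z_cand)[0]
    return np.argsort(-sims)[:k]

def negative_loss(q, z_neg, tau=0.07):
    """Hinge term as written in the text: max(0, tau - cos(q, z_n^-)), averaged."""
    return np.maximum(0.0, tau - cosine(q, z_neg)).mean()

def total_loss(q, y, z_pos, z_neg, theta_attn, tau=0.07, gamma=1e-4):
    """L = L_pos + L_neg + gamma * ||theta_attn||_2^2."""
    reg = gamma * np.sum(theta_attn ** 2)
    return positive_loss(q, z_pos, y, tau) + negative_loss(q, z_neg, tau) + reg
```

As written, the hinge term vanishes once $\cos(q, z^-_n) \ge \tau$. A margin of the form $\max(0, \cos(q, z^-_n) - m)$ would instead penalize similarity to negatives, so the sign convention is worth checking against the original paper before relying on this sketch.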

3. Multi-Stage Cross-Modal Coordination

PromptFuseNL coordinates information in four cascading stages:

Predictive Prompt Tuning: Compute attention logits over the $S$ support embeddings $s_i$ with a small MLP $f_\phi$:

$\tilde\alpha_c = f_\phi(t_c), \qquad \alpha_{c,i} = \frac{\exp(\tilde\alpha_{c,i})}{\sum_{j=1}^S \exp(\tilde\alpha_{c,j})}$

yielding a prompt token $p_c=\sum_i \alpha_{c,i}s_i$, which modifies $t_c'$.

Cross-Modal Attention: Refine the text prototype by cross-attention over the support visuals $V$, producing $\hat t_c$ via $\mathrm{CrossAttn}(t_c', V, V)$.
Visual Prototype Adaptation: Weight support examples by their instance-reweighting scores $w_i$, average them, add the residual $r_c$, and project via $W_v$.
Late Fusion: Fuse the refined textual and visual prototypes into $z^+_c$.

This stratified coordination maximizes discriminative fusion and adaptation, leveraging both contextual and episodic information (Mandalika, 16 May 2025).
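The first three stages can be sketched for a single class as follows. This is an illustrative outline only: the tanh MLP used for $f_\phi$, the single-head scaled dot-product attention, and the shared (not per-class) projection matrix are assumptions, and how $p_c$ enters $t_c'$ is left to the caller because the text does not specify it.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def predictive_prompt(t_c, supports, W1, W2):
    """Stage 1: f_phi maps a text prototype (d,) to one logit per support example.

    Returns the prompt token p_c = sum_i alpha_i * s_i and the weights alpha.
    """
    logits = np.tanh(t_c @ W1) @ W2   # assumed MLP: d -> h -> S
    alpha = softmax(logits)
    return alpha @ supports, alpha

def cross_attend(t_ref, V):
    """Stage 2: CrossAttn(t_ref, V, V) with query t_ref (d,) and keys = values = V (S, d)."""
    scores = V @ t_ref / np.sqrt(V.shape[1])
    return softmax(scores) @ V

def visual_prototype(V, w, r_c, W_v):
    """Stage 3: reliability-weighted mean of support visuals, plus residual, then projection."""
    w = w / w.sum()
    return (w @ V + r_c) @ W_v
```

Stage 4 then combines the outputs of `cross_attend` and `visual_prototype` with the late-fusion formula for $z^+_c$.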

4. Unsupervised Instance Reweighting and Label Noise Robustness

To address label noise and outlier contamination in support sets, PromptFuseNL assigns each example a soft reliability score:

$w_i = \tfrac12\left[\cos(x_i, \bar s) + \cos(x_i, z^+_{y_i})\right]$

where $\bar s$ is the mean visual embedding and $z^+_{y_i}$ is the adapted prototype of the example's labeled class. Instances receiving higher scores contribute more to prototype construction, while unreliable or mislabeled examples are suppressed. This strategy removes the need for auxiliary labels or explicit structural modifications, and empirically delivers $+0.3$ to $+0.9$ points in accuracy under up to 50% support label corruption (Mandalika, 16 May 2025).
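A small NumPy sketch shows how the reliability score behaves on a toy support set. The helper names and the clipping of negative scores to zero before averaging are assumptions added for illustration.

```python
import numpy as np

def unit(x):
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def reliability_weights(X, y, z_pos):
    """w_i = 0.5 * [cos(x_i, s_bar) + cos(x_i, z^+_{y_i})] for support embeddings X (S, d)."""
    s_bar = X.mean(axis=0)
    cos_mean = unit(X) @ unit(s_bar)
    cos_proto = np.sum(unit(X) * unit(z_pos[y]), axis=1)
    return 0.5 * (cos_mean + cos_proto)

def weighted_prototypes(X, y, w, n_classes):
    """Per-class weighted mean of support embeddings (negative scores clipped: an assumption)."""
    w = np.clip(w, 0.0, None)
    protos = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        m = y == c
        protos[c] = (w[m, None] * X[m]).sum(axis=0) / max(w[m].sum(), 1e-12)
    return protos

# Toy support set: three consistent examples and one mislabeled outlier, all labeled class 0.
X = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]])
y = np.array([0, 0, 0, 0])
z_pos = np.array([[1.0, 0.0], [0.0, 1.0]])
w = reliability_weights(X, y, z_pos)
protos = weighted_prototypes(X, y, w, n_classes=2)
```

In this toy case the outlier `[0, 1]` receives the lowest score, so the class-0 prototype sits closer to the consistent examples than the unweighted mean does.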

5. Cross-Tokenizer Prompt Discovery via FUSE

PromptFuseNL incorporates FUSE (Flexible Unification of Semantic Embeddings) to support zero-shot prompt optimization across models with mismatched tokenizers and embedding spaces (Williams et al., 2024). This is accomplished by representing each model's vocabulary as a third-order tensor $\tilde V \in \mathbb{R}^{|W|\times \ell \times d}$, where $|W|$ is the word vocabulary size, $\ell$ is the number of sub-tokens per word, and $d$ is the embedding dimension.

Adapter computation proceeds as follows:

For each word length $\ell$, compute the tensor pseudo-inverse $V_i^+$ and the adapter map $M[\ell]=V_i^+ * V_j$.
The forward pass for prompt optimization swaps embeddings across models with:

$\tilde E_j \approx \tilde E_i * (V_i^+ * V_j)$

Backpropagation supports prompt search/editing by transferring gradients:

$\nabla_{E_i} L_j \approx \text{merge}\left((V_i^+ * V_j) * \text{split}(\nabla_{E_j} L_j) \right)$

A PromptFuseNL pipeline can thus use any two models $A$ (generator) and $B$ (evaluator) by precomputing the word-adapter tensors, initializing prompt beams, computing the joint loss, and propagating $B$'s gradients for discrete prompt search. All of this uses fixed reference adapters and requires no model retraining or fine-tuning (Williams et al., 2024).
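The adapter computation can be illustrated in the simplest setting. The sketch below assumes every word maps to a single sub-token ($\ell = 1$), so the tensor products reduce to ordinary matrix products and the split/merge steps drop out; the function names are hypothetical, and the general tensor case is not covered.

```python
import numpy as np

def fit_adapter(V_i, V_j):
    """Least-squares adapter M = pinv(V_i) @ V_j.

    V_i: (|W|, d_i) and V_j: (|W|, d_j) hold embeddings of the same words, row-aligned.
    """
    return np.linalg.pinv(V_i) @ V_j  # shape (d_i, d_j)

def to_model_j(E_i, M):
    """Forward pass: approximate model-j embeddings from model-i prompt embeddings E_i (T, d_i)."""
    return E_i @ M

def grad_to_model_i(grad_E_j, M):
    """Backward pass through E_j = E_i @ M: the chain rule gives grad_E_i = grad_E_j @ M.T."""
    return grad_E_j @ M.T
```

Because the adapter is a fixed linear map, gradients from the evaluator model flow back to the generator's embedding space with a single matrix product, which is why no retraining of either model is needed.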

6. Benchmark Results and Efficiency Profile

Across 15 major few-shot vision-language benchmarks and several domain generalization tasks, PromptFuseNL demonstrates superior accuracy and resource efficiency (Mandalika, 16 May 2025):

Method	1-shot	2-shot	4-shot	8-shot	16-shot
SimNL (prior SOTA)	67.5%	70.1%	72.4%	75.1%	77.8%
PromptFuseNL	74.3%	78.6%	81.5%	85.1%	88.8%

PromptFuseNL achieves up to 300× faster training (episodes/sec) and 1000× lower compute per episode versus full prompt tuning, facilitated by its low-overhead modules and frozen backbone. Domain generalization (ImageNet $\rightarrow$ V2/Sketch/A/R) shows 50.8% mean accuracy (vs. 45.3% for SimNL), and robustness to substantial label noise is observed without architecture modification or explicit regularization. These results substantiate the framework's scalability and generalization capacity.
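The per-shot margins implied by the table can be computed directly. The snippet below uses only the figures quoted above; the variable names are illustrative.

```python
shots = [1, 2, 4, 8, 16]
simnl = [67.5, 70.1, 72.4, 75.1, 77.8]
promptfuse = [74.3, 78.6, 81.5, 85.1, 88.8]

# Absolute accuracy gain in percentage points at each shot count.
gains = [round(p - s, 1) for p, s in zip(promptfuse, simnl)]
mean_gain = round(sum(gains) / len(gains), 2)

# Domain generalization gain (ImageNet -> V2/Sketch/A/R mean accuracy).
dg_gain = round(50.8 - 45.3, 1)

for k, g in zip(shots, gains):
    print(f"{k:>2}-shot: +{g:.1f} points")
print(f"mean gain: +{mean_gain:.2f} points, domain generalization: +{dg_gain:.1f} points")
```

The reported gap over SimNL widens monotonically with the number of shots, from 6.8 points at 1-shot to 11.0 points at 16-shot.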

7. Applications, Limitations, and Future Directions

PromptFuseNL supports broad application profiles: robust cross-modal few-shot learning, adversarial injection fuzzing, efficient prompt discovery across model/tokenizer boundaries, and scalable adaptation for deployed and research settings (Mandalika, 16 May 2025, Yu et al., 2024, Williams et al., 2024).

Limitations include reliance on frozen backbones, sensitivity to vocabulary selection in FUSE, and the approximation inherent in tensor-based adapters. Expected future work includes:

Expansion to dialogue-level and multi-turn prompt morphisms
Incorporation of “web injection” scenarios via external content retrieval
Iterative red-teaming and fine-tuning loops for adversarial robustness
Transfer of prompt optimization across non-English and highly morphologically diverse languages

This suggests PromptFuseNL can serve as a foundational prompt optimization and adaptation platform for heterogeneous, multi-model, and adversarially resistant learning environments.


References

Mandalika (2025). Generalizable Vision-Language Few-Shot Adaptation with Predictive Prompts and Negative Learning.

Yu et al. (2024). PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs.

Williams et al. (2024). FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers.
