Cross-task Prompting Overview
- Cross-task prompting is a framework that transfers knowledge and prompt representations across diverse tasks to improve generalization and parameter efficiency.
- It employs mechanisms like prompt initialization and dynamic fusion to adapt effectively in domains such as NLP, vision, reinforcement learning, and graphs.
- Empirical results show robust few-shot and continual learning performance, though careful source-task selection is essential to prevent negative transfer.
Cross-task prompting is a paradigm wherein models exploit knowledge, representations, or features gleaned from one or more tasks to enhance generalization, adaptation, or transfer performance on other, potentially novel tasks. This framework spans a wide spectrum of architectures, including LLMs, vision-language models, graph neural networks, reinforcement learning agents, and multi-modal translation systems. At its core, cross-task prompting operationalizes the modular, compositional use of prompts (whether discrete tokens, soft prompts, or parametric vectors) across heterogeneous task domains, often with the goals of parameter efficiency, transfer robustness, and rapid adaptation in low-resource conditions.
1. Key Formalisms in Cross-Task Prompting
Cross-task prompting methodologies typically instantiate two canonical mechanisms: (i) cross-task prompt initialization/transfer and (ii) prompt composition/fusion.
(i) Prompt Initialization via Source Tasks
Soft prompt vectors tuned for source tasks are repurposed to provide informative starting points for target tasks. For example, Task Prompt Vectors (TPV) define the prompt vector for task $t$ as the difference between the fine-tuned prompt and its random initialization, $\tau_t = \theta_t - \theta_0$, which, summed over the source tasks $\mathcal{S}$, forms the combined initialization for a new target, $\theta_{\mathrm{target}} = \theta_0 + \lambda \sum_{t \in \mathcal{S}} \tau_t$, where $\lambda$ is a scaling factor (Belanec et al., 2 Aug 2024).
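The prompt-vector arithmetic described above is simple to state concretely. The sketch below (with toy arrays standing in for tuned prompts; the function and variable names are illustrative, not from the TPV codebase) shows the difference-and-sum construction:

```python
import numpy as np

def task_prompt_vector(theta_tuned, theta_init):
    # tau_t = theta_t - theta_0: the task-specific direction in prompt space
    return theta_tuned - theta_init

def combined_init(theta_init, task_vectors, lam=1.0):
    # theta_target = theta_0 + lam * sum_t tau_t
    return theta_init + lam * np.sum(task_vectors, axis=0)

rng = np.random.default_rng(0)
theta0 = rng.normal(size=(10, 4))   # shared random prompt initialization
theta_a = theta0 + 0.5              # stand-in for a prompt tuned on task A
theta_b = theta0 - 0.2              # stand-in for a prompt tuned on task B

tau_a = task_prompt_vector(theta_a, theta0)
tau_b = task_prompt_vector(theta_b, theta0)
theta_target = combined_init(theta0, [tau_a, tau_b], lam=0.5)
```

Because the vectors live in a shared prompt space anchored at the same initialization, addition composes task directions without touching backbone weights.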
Multitask Prompt Tuning (MPT) first distills source-task knowledge into a single shared soft prompt $P^{*}$, followed by per-task low-rank multiplicative adaptation, $P_k = P^{*} \odot (u_k v_k^{\top})$, relying solely on prompt vectors while freezing the backbone model weights (Wang et al., 2023).
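A rank-one multiplicative (Hadamard) update of this kind can be sketched in a few lines; the shapes and values here are illustrative toys, not MPT's actual hyperparameters:

```python
import numpy as np

def adapt_prompt(P_shared, u, v):
    # P_k = P* (Hadamard) (u v^T): a rank-one multiplicative update that
    # specializes the shared soft prompt to task k with few extra parameters
    return P_shared * np.outer(u, v)

P_star = np.ones((5, 8))   # shared soft prompt: 5 tokens, embedding dim 8
u = np.full(5, 2.0)        # per-task factor over prompt positions
v = np.full(8, 0.5)        # per-task factor over embedding dimensions
P_task = adapt_prompt(P_star, u, v)
```

The per-task budget is only `len(u) + len(v)` scalars instead of a full prompt matrix, which is the source of MPT's parameter efficiency.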
Bayesian Multi-Task Transfer Prompting (BMTPT) advances initialization by sampling diverse prompt particles from the joint posterior over source tasks using Stein Variational Gradient Descent (SVGD), thereby preserving both positive and negative task correlations across source datasets and enabling robust adaptation to target tasks (Lee et al., 13 Feb 2024).
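To make the particle-sampling step concrete, here is a minimal generic SVGD sketch on a toy 1-D "posterior" (a standard normal); BMTPT would instead score prompt particles by source-task likelihood, and all names here are illustrative:

```python
import numpy as np

def rbf_kernel(X, h=1.0):
    # Pairwise RBF kernel K[a, b] = exp(-||x_a - x_b||^2 / (2 h^2))
    # and its gradient with respect to the first argument x_a.
    diff = X[:, None, :] - X[None, :, :]            # (n, n, d)
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))
    gradK = -diff / h ** 2 * K[:, :, None]          # dK[a, b] / dx_a
    return K, gradK

def svgd_step(X, grad_logp, eps=0.5):
    # One SVGD update: the kernel-weighted score term attracts particles
    # toward high posterior density; the kernel-gradient term repels them
    # from each other, preserving diversity across the particle set.
    K, gradK = rbf_kernel(X)
    phi = (K @ grad_logp(X) + gradK.sum(axis=0)) / X.shape[0]
    return X + eps * phi

# Toy posterior over a scalar prompt parameter: standard normal,
# so grad log p(theta) = -theta.
rng = np.random.default_rng(0)
particles = rng.normal(loc=5.0, size=(50, 1))   # start far from the mode
for _ in range(300):
    particles = svgd_step(particles, lambda x: -x)
```

After the updates, the particle cloud settles around the posterior mode while the repulsive term keeps it spread out, which is exactly the diversity BMTPT relies on for transfer.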
(ii) Modular Prompt Composition & Fusion
Arithmetic composition and dynamic fusion mechanisms allow prompt components from different tasks to be additively or multiplicatively integrated. TPV facilitates modular addition of task prompt vectors, $\theta = \theta_0 + \sum_{t} \tau_t$, with the resulting prompt vectors directly steering the model toward joint task subspaces (Belanec et al., 2 Aug 2024).
Dynamic Prompt Fusion utilizes a pool of soft prompts $\{p_1, \dots, p_K\}$ with a task-aware scheduling strategy: gating over learned task embeddings yields mixing weights $\alpha = \mathrm{softmax}(g(e_{\mathrm{task}}))$, and the final prompt is the weighted combination $p = \sum_{k} \alpha_k p_k$, achieving flexible sharing and alignment across tasks (Hu et al., 9 Sep 2025).
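The gated pooling mechanism can be sketched as follows; the linear gate, embedding dimension, and pool size are all illustrative assumptions rather than the paper's configuration:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fuse_prompts(prompt_pool, task_emb, gate_W):
    # alpha = softmax(W e_task); fused prompt p = sum_k alpha_k p_k
    alpha = softmax(gate_W @ task_emb)
    fused = np.tensordot(alpha, prompt_pool, axes=1)   # (L, d)
    return fused, alpha

K, L, d = 4, 6, 8
rng = np.random.default_rng(1)
pool = rng.normal(size=(K, L, d))   # pool of K soft prompts, each L tokens x d dims
e_task = rng.normal(size=16)        # learned task embedding (dim is an assumption)
W = rng.normal(size=(K, 16))        # hypothetical linear gating projection
p_fused, alpha = fuse_prompts(pool, e_task, W)
```

Because the gate is differentiable, the pool and the task embeddings can be trained jointly end to end, which is what enables the sharing behavior described above.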
2. Architectural Manifestations Across Domains
Cross-task prompting has been realized in a variety of model families, each with domain-specific prompt engineering strategies:
Natural Language and Vision-Language Models
- In Multitask Vision-Language Prompt Tuning (MVLPT), a universal prompt learned on source tasks dramatically improves few-shot adaptation on diverse target tasks, outperforming single-task baselines using text-side (CoOp), visual-side (VPT), or unified vision-language prompt tuning (UPT) strategies (Shen et al., 2022).
- Polyglot Prompting demonstrates monolithic, cross-lingual, cross-task prompting for mT5, converting every (task, language) pair into a unified English-text template with fixed input fields; this approach significantly improves generalization, especially for zero-shot learning on low-resource languages (Fu et al., 2022).
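A unified-template scheme of the Polyglot Prompting flavor amounts to rendering every example through one fixed English-keyed format regardless of source language; the template below is a hypothetical illustration, not the paper's exact format:

```python
def to_unified_template(task, fields):
    # Render any (task, language) example into one fixed English-keyed
    # text format, so the model sees a uniform template across languages.
    header = f"task: {task}"
    body = " ".join(f"{name}: {value}" for name, value in fields.items())
    return f"{header} {body} answer:"

prompt = to_unified_template(
    "question answering",
    {"context": "Paris est la capitale de la France.",
     "question": "Quelle est la capitale de la France ?"},
)
```

Keeping the field keys in English while the values stay in the source language is the kind of template uniformity that the cross-lingual results in Section 5 depend on.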
Graph Neural Networks
- MultiGPrompt injects per-layer “pretext tokens” associated with multiple self-supervised pretext tasks (e.g., DGI, GraphCL, link prediction) into the GNN encoder, synergistically combining local and global cross-task knowledge via a dual-prompt architecture consisting of “composed” and “open” prompts (Yu et al., 2023).
Reinforcement Learning
- CoTASP leverages over-complete dictionaries to generate sparse, cross-task masks (prompts) that extract sub-networks from a meta-policy, enabling continual adaptation and task sharing while mitigating catastrophic forgetting in sequential task settings (Yang et al., 2023).
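The essential operation is mapping a task embedding through a dictionary to a sparse binary mask that selects a sub-network. The sketch below substitutes a simple top-k scoring rule for CoTASP's actual sparse-coding solver, so it should be read as a hypothetical stand-in for the mechanism, not the method itself:

```python
import numpy as np

def sparse_mask(task_emb, dictionary, k=3):
    # Score each dictionary atom (one per hidden unit) against the task
    # embedding and keep the top-k, yielding a binary mask that carves a
    # task-specific sub-network out of the shared meta-policy.
    scores = dictionary @ task_emb        # (n_units,)
    mask = np.zeros(dictionary.shape[0])
    mask[np.argsort(scores)[-k:]] = 1.0
    return mask

rng = np.random.default_rng(0)
D = rng.normal(size=(10, 5))   # over-complete dictionary: one atom per unit
e = rng.normal(size=5)         # embedding of the incoming task
m = sparse_mask(e, D, k=3)
```

Similar task embeddings select overlapping unit subsets, which is how parameter sharing and forgetting mitigation arise in the sequential-task setting.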
Multi-Modal and Cross-Modality Systems
- MedPrompt employs self-adaptive prompt blocks composed of prompt extraction and fusion components; modality-specific prototype embeddings are dynamically weighted and fused via transformers to guide multi-task medical image translation, achieving state-of-the-art visual quality and generalization (Chen et al., 2023).
3. Optimization Objectives and Training Protocols
Cross-task prompting approaches universally employ parameter-efficient schemes—freezing the backbone and updating only prompt vectors, adapters, or related lightweight modules.
- Prompt tuning objective for most models: $\min_{\theta_P} \, -\sum_{(x,y)} \log p_{\Theta}\big(y \mid [\theta_P; x]\big)$, with parameter updates restricted to the prompt vectors $\theta_P$ or their low-rank decompositions while the backbone parameters $\Theta$ remain frozen.
- Bayesian regimes (BMTPT) operate over full posterior distributions, penalizing deviation from source-prompt particles during target prompt adaptation: $\mathcal{L} = \mathcal{L}_{\mathrm{task}} + \beta \, \lVert \theta_P - \bar{\theta}_{\mathrm{src}} \rVert^2$, where $\bar{\theta}_{\mathrm{src}}$ is the mean particle from source tasks (Lee et al., 13 Feb 2024).
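A posterior-anchored penalty of this form is a one-liner to implement; names and the choice of `beta` below are illustrative:

```python
import numpy as np

def regularized_prompt_loss(task_loss, theta_p, theta_src_mean, beta=0.1):
    # L = L_task + beta * ||theta_P - theta_bar_src||^2
    # theta_src_mean: mean of the source-task prompt particles, acting as
    # a prior anchor that keeps target adaptation near the source posterior
    return task_loss + beta * np.sum((theta_p - theta_src_mean) ** 2)

loss = regularized_prompt_loss(1.0, np.ones(4), np.zeros(4), beta=0.5)
```

Tuning `beta` trades off target-task fit against retention of source-task knowledge, the same tension that drives the forgetting-mitigation results discussed below.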
4. Empirical Results and Comparative Analyses
Cross-task prompting consistently achieves strong or state-of-the-art results in low-resource, few-shot, and continual learning scenarios, with several studies reporting parity or improvement over full fine-tuning and prior soft-prompt or adapter-based methods:
| Methodology | Backbone/Domain | Avg. Accuracy/F1 | Parameter Budget |
|---|---|---|---|
| TPV (Belanec et al., 2 Aug 2024) | T5-base (NLU) | +1-2 F1 over SPoT | 0.035% (prompt only) |
| MPT (Wang et al., 2023) | T5-base (NLU) | 85.6% (GLUE), 74.1% (SuperGLUE) | 0.035% per task |
| MVLPT (Shen et al., 2022) | CLIP (VL) | +1.7% to +4.7% over single-task | <0.1% |
| MultiGPrompt (Yu et al., 2023) | GNNs (Web/Graphs) | +2-10% few-shot gains | Prompt vectors only |
| CoTASP (Yang et al., 2023) | RL Meta-policy | 0.92 perf (CW10), 0.88 (CW20) | Sparse masks (no full-tuning) |
| MedPrompt (Chen et al., 2023) | Transformer U-Net | +2–4 dB PSNR, ↑SSIM | Modality prompts |
| SPT (Bari et al., 2022) | T5-based LLMs | +1.8–4 pp zero-shot/fine-tune | Hybrid memory/soft prompts |
| Dynamic Fusion (Hu et al., 9 Sep 2025) | LLMs (CrossFit/GLUE) | 82.6 SuperGLUE, 71.3 MMLU | Joint pool/gating |
| BMTPT (Lee et al., 13 Feb 2024) | T5-base (NLP) | 88.7 GLUE (matches FT) | 0.035% (prompt only) |
Ablation results strongly attribute cross-task gains to modular or compositional prompt fusion, prompt pooling, Bayesian regularization, or source-based initialization. Removal of these components (e.g., dynamic scheduling, gating, memory retrieval) consistently degrades multi-task and transfer performance (Belanec et al., 2 Aug 2024, Hu et al., 9 Sep 2025).
5. Constraints, Limitations, and Failure Modes
Despite strong performance, cross-task prompting exhibits sensitivity to the choice of source tasks, the distributional similarity between source and target, and the parameterization of prompt pools and scheduling mechanisms. Negative transfer occurs when conflicting source tasks skew the shared prompt posterior or when prompt collisions arise in shared-pool settings (L2P, HiDe-Prompt). Bayesian and modular designs mitigate but do not eliminate these risks; robustly handling out-of-distribution target tasks or task imbalance remains an open problem (Lee et al., 13 Feb 2024, Le et al., 11 Dec 2024).
Scaling prompt pools, dictionary sizes, and pretext token banks is bounded by computational and optimization constraints; suboptimal tuning or excessive heterogeneity reduces overall gains (Hu et al., 9 Sep 2025, Yu et al., 2023). Cross-lingual transfer further depends on prompt template uniformity and careful language-agnostic engineering (Fu et al., 2022).
6. Practical Engineering Strategies and Future Directions
Best practices for cross-task prompting include:
- Initialize prompt vectors from source-task or shared hybrid banks rather than random.
- Leverage modular, compositional arithmetic for prompt fusion whenever tasks are correlated.
- For continual learning or low-resource settings, employ explicit task-specific prompt pools or particle-based Bayesian priors.
- Regularize prompt adaptation to minimize catastrophic forgetting and interference.
- Prefer parameter-efficient tuning (soft prompts, adapters, low-rank updates) over full-model fine-tuning.
- For multi-modal and cross-domain systems, maintain compact, dynamic prototype embeddings and fuse them via standardized transformer blocks (Chen et al., 2023).
Future research will likely explore automatic source-task selection, task-pool optimization, prompt compression, and cross-task fusion for generative, graph-based, or more diverse multi-modal tasks. Mechanistic interpretability of prompt-space transfer and further robust regularization of modularity/separation remain unsolved challenges (Le et al., 11 Dec 2024, Chatterjee et al., 17 May 2024).
7. Representative Algorithms and Equations
| Approach | Core Equation or Constraint | Reference |
|---|---|---|
| Task Prompt Vectors | $\tau_t = \theta_t - \theta_0$; $\theta_{\mathrm{target}} = \theta_0 + \lambda \sum_t \tau_t$ | (Belanec et al., 2 Aug 2024) |
| Prompt Composition (dynamic fusion) | $p = \sum_k \alpha_k p_k$, with $\alpha = \mathrm{softmax}(g(e_{\mathrm{task}}))$ | (Hu et al., 9 Sep 2025) |
| Multitask Shared Adaptation | $P_k = P^{*} \odot (u_k v_k^{\top})$ | (Wang et al., 2023) |
| Bayesian Posterior | SVGD particles from $p(\theta \mid \mathcal{D}_{\mathrm{src}})$; target adaptation regularized toward $\bar{\theta}_{\mathrm{src}}$ | (Lee et al., 13 Feb 2024) |
| Semi-Parametric Retrieval | soft prompts augmented with retrieved memory prompts | (Bari et al., 2022) |
These formal devices provide technical means to achieve parameter-efficient, modular, and compositional cross-task transfer in diverse architectures.
Cross-task prompting is an expansive, evolving field bridging efficient transfer paradigms and modular deep learning, with methodologically varied implementations across language, vision, graph, and RL domains. As deep neural architectures scale, these approaches will underpin both rapid task adaptation and robust zero- or few-shot generalization in complex, multi-task, and continually changing environments.