Direct Prompt Learning: Adaptive Techniques
- Direct prompt learning is a parameter-efficient method that fine-tunes a few optimized prompt tokens to adapt large, frozen models for new tasks.
- It encompasses techniques such as continuous optimization, encoder-based methods, low-rank decomposition, and mixture-of-experts to enhance transfer learning.
- Applied in NLP, computer vision, and multimodal tasks, direct prompt learning enables rapid adaptation with minimal computational cost and greater efficiency.
Direct prompt learning is a class of parameter-efficient adaptation techniques for large pre-trained language or multimodal models, in which a small number of continuous or discrete prompt tokens are directly optimized—while the bulk of the model's parameters remain frozen. This approach enables the model to adapt to new tasks or domains by providing carefully constructed prompt inputs, bridging the gap between pre-training and downstream objectives without full-model fine-tuning. Direct prompt learning methods have become central in modern transfer learning research, impacting natural language processing, computer vision, and multi-modal understanding across supervised, few-shot, and zero-shot learning scenarios.
1. Core Principles and Taxonomy
Direct prompt learning methods prepend a learnable “soft prompt” matrix $P \in \mathbb{R}^{l \times d}$ (with $l$ as the prompt length and $d$ as the embedding dimension) to the original input tokens before passing through the model's embedding function $e(\cdot)$, resulting in the combined input $[P;\, e(X)]$ (2507.06085); a minimal code sketch of this construction appears at the end of this section. The main taxonomy of direct prompt learning encompasses:
- General Optimization Methods: Direct continuous prompt optimization without additional structure [Prompt Tuning, XPrompt, P-Tuning v2].
- Encoder-Based Methods: Insertion of trainable encoders (LSTM, MLP) or reparameterization layers to model dependencies among prompt tokens [P-Tuning, Residual Prompt Tuning, Prefix-Tuning].
- Decomposition-Based Methods: Parameter count reduction through matrix decomposition of prompt tokens, often enforcing or exploiting low-rank structures [Decomposed Prompt Tuning, DePT].
- Mixture-of-Experts Frameworks: Dynamic selection or mixing among multiple prompt experts for each input, often based on learned routing [SMoP, PT-MoE].
The guiding principle is minimal adaptation—introducing only a small, targeted prompt while “freezing” the underlying model—for efficiency, stability, and rapid deployment.
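As a concrete illustration of this principle, the sketch below (PyTorch; the backbone interface and the initialization scale are illustrative assumptions, not taken from any specific paper) prepends a soft prompt $P$ to frozen token embeddings so that only $l \times d$ parameters receive gradients:

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Minimal sketch: learnable soft prompt P (l x d) prepended to frozen embeddings."""
    def __init__(self, backbone: nn.Module, embedding: nn.Embedding, prompt_len: int = 20):
        super().__init__()
        self.backbone = backbone    # frozen transformer; assumed to accept embedded inputs
        self.embedding = embedding  # frozen token embedding e(.)
        d = embedding.embedding_dim
        self.prompt = nn.Parameter(torch.randn(prompt_len, d) * 0.02)  # only trainable part
        for module in (self.backbone, self.embedding):
            for p in module.parameters():
                p.requires_grad_(False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        x = self.embedding(input_ids)                           # (B, T, d) = e(X)
        p = self.prompt.unsqueeze(0).expand(x.size(0), -1, -1)  # (B, l, d)
        return self.backbone(torch.cat([p, x], dim=1))          # [P; e(X)]
```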
2. General Optimization and Encoder-Based Approaches
General optimization originated with Prompt Tuning, which directly trains soft prompts, typically initialized randomly or from token embeddings (2507.06085). While extremely parameter-efficient, convergence and performance are sensitive to initialization scheme and optimizer hyperparameters.
Encoder-based approaches expand on this by modeling interactions among prompt tokens, either via LSTMs/MLPs (prompt encoder, [P-Tuning]) or skip/residual connections (Residual Prompt Tuning). Prefix-Tuning extends the concept by prepending learned key-value pairs at every transformer layer instead of only at the input [Prefix-Tuning]. These methods improve prompt flexibility and training stability, albeit at the cost of more trainable parameters.
An illustrative residual formulation is
$$P' = P + \mathrm{MLP}(P),$$
where $\mathrm{MLP}(\cdot)$ denotes a multilayer perceptron and $P$ is the original prompt embedding matrix (2507.06085).
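A minimal sketch of this residual reparameterization (PyTorch; the hidden width is an arbitrary illustrative choice):

```python
import torch
import torch.nn as nn

class ResidualPrompt(nn.Module):
    """Sketch of P' = P + MLP(P): the MLP models dependencies among prompt tokens."""
    def __init__(self, prompt_len: int, d: int, hidden: int = 128):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, d) * 0.02)
        self.mlp = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, d))

    def forward(self) -> torch.Tensor:
        return self.prompt + self.mlp(self.prompt)  # residual (skip) connection
```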
3. Decomposition and Mixture-of-Experts Methods
Decomposition strategies reparameterize the prompt matrix to minimize redundancy and parameter count. For example,
$$P = A B, \quad A \in \mathbb{R}^{l \times r},\; B \in \mathbb{R}^{r \times d},$$
where $A$ and $B$ form the low-rank factorization of the original prompt ($l$ prompt tokens, bottleneck rank $r \ll \min(l, d)$) [Decomposed Prompt Tuning, (2507.06085)]. These techniques are especially beneficial in few-shot learning, yielding competitive performance with reduced memory and computation.
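The factorization makes the parameter saving explicit: $l \cdot r + r \cdot d$ trainable values instead of $l \cdot d$. A sketch (initialization is an illustrative assumption):

```python
import torch
import torch.nn as nn

class LowRankPrompt(nn.Module):
    """Sketch of P = A B with a rank-r bottleneck."""
    def __init__(self, prompt_len: int, d: int, r: int = 4):
        super().__init__()
        self.A = nn.Parameter(torch.randn(prompt_len, r) * 0.02)  # (l, r)
        self.B = nn.Parameter(torch.randn(r, d) * 0.02)           # (r, d)

    def forward(self) -> torch.Tensor:
        return self.A @ self.B  # (l, d); l*r + r*d params instead of l*d
```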
Mixture-of-Experts (MoE) frameworks such as SMoP and PT-MoE posit several short prompt “experts” and use a gating mechanism to select or combine them for each input. Routing can be dynamic and input-dependent, balancing global (task-level) prompts against instance sensitivity, further reducing overfitting and parameter overhead (2507.06085). A representative formulation is
$$P(x) = \sum_{k=1}^{K} g_k(x)\, A_k B_k,$$
with $A_k, B_k$ as decomposed expert matrices, $g_k(x)$ as input-dependent routing weights, and the weighted sum denoting the mixture logic.
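A hedged sketch of input-dependent routing over expert prompts follows; the pooled-representation router is an illustrative choice (SMoP and PT-MoE differ in their exact gating), and full expert matrices are used here rather than decomposed factors for brevity:

```python
import torch
import torch.nn as nn

class PromptMoE(nn.Module):
    """Sketch: a softmax router mixes K short expert prompts per input."""
    def __init__(self, num_experts: int, prompt_len: int, d: int):
        super().__init__()
        self.experts = nn.Parameter(torch.randn(num_experts, prompt_len, d) * 0.02)
        self.router = nn.Linear(d, num_experts)

    def forward(self, x_repr: torch.Tensor) -> torch.Tensor:
        # x_repr: (B, d) pooled representation of the input
        g = torch.softmax(self.router(x_repr), dim=-1)        # (B, K) routing weights
        return torch.einsum('bk,kld->bld', g, self.experts)   # (B, l, d) mixed prompt
```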
4. Instance-Dependent, Adaptive, and Prototype-Based Prompting
Recent work has emphasized instance-aware and adaptive prompt generation. Instead of a shared prompt per task, models like IDPG and instance-aware prompt learning generate a unique prompt for each sample. IDPG employs a lightweight, input-conditional generator $G(\cdot)$, using projection bottlenecks or PHM (Parameterized Hypercomplex Multiplication) layers to keep the parameter count low (2204.04497, 2201.07126). The prompt for input $x$ is then
$$P_x = G\!\left(\sum_i \alpha_i\, e(x_i)\right),$$
where $\alpha_i$ is the relevance score for token $x_i$.
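A sketch of this idea (the softmax relevance scorer and tanh bottleneck are illustrative choices; PHM layers are omitted for brevity):

```python
import torch
import torch.nn as nn

class InstanceAwarePrompt(nn.Module):
    """Sketch: relevance-weighted pooling feeding a bottleneck generator G."""
    def __init__(self, d: int, prompt_len: int, bottleneck: int = 16):
        super().__init__()
        self.scorer = nn.Linear(d, 1)                    # relevance score per token
        self.down = nn.Linear(d, bottleneck)             # projection bottleneck
        self.up = nn.Linear(bottleneck, prompt_len * d)
        self.prompt_len = prompt_len

    def forward(self, token_embs: torch.Tensor) -> torch.Tensor:
        # token_embs: (B, T, d) frozen embeddings e(x)
        alpha = torch.softmax(self.scorer(token_embs), dim=1)   # (B, T, 1) relevance
        pooled = (alpha * token_embs).sum(dim=1)                # (B, d) weighted pooling
        prompt = self.up(torch.tanh(self.down(pooled)))         # (B, l*d) generator G
        return prompt.view(-1, self.prompt_len, token_embs.size(-1))  # (B, l, d)
```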
Prototype-based prompt learning (e.g., PTP (2210.10841)) clusters samples in latent space to define prototypes, each associated with a prompt. The similarity of a query to each prototype weights the prediction:
$$p(y \mid x) = \sum_{k} w_k(x)\, p(y \mid x, P_k), \qquad w_k(x) \propto \mathrm{sim}(f(x), c_k),$$
where $c_k$ is the $k$-th prototype and $P_k$ its associated prompt. This balances per-task and per-instance flexibility while minimizing overfitting.
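A minimal sketch of the mixing step (dot-product similarity and softmax normalization are illustrative assumptions):

```python
import torch

def prototype_weighted_logits(query: torch.Tensor, prototypes: torch.Tensor,
                              per_prompt_logits: torch.Tensor) -> torch.Tensor:
    """Sketch of prototype-weighted prediction mixing.
    query: (B, d) sample features; prototypes: (K, d) cluster centers;
    per_prompt_logits: (B, K, C) logits from running the model with each
    prototype's prompt. Softmax similarity supplies the mixture weights w_k(x)."""
    w = torch.softmax(query @ prototypes.T, dim=-1)           # (B, K)
    return torch.einsum('bk,bkc->bc', w, per_prompt_logits)   # (B, C)
```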
Adaptive prompt/label mapping is also prevalent. AdaPrompt fuses continual pretraining on prompt-aware retrieved data with automatic verbalizer expansion (leveraging NLI to filter label words), bridging the gap between pre-trained model knowledge and downstream prompt formats (2202.04824).
5. Black-Box, Reinforcement, and Metaheuristic Optimization
Direct prompt learning in black-box or inaccessible-model settings demands discrete, gradient-free optimization.
- BDPL applies a variance-reduced policy gradient method to optimize discrete prompts via API calls, without model gradients (2201.08531).
- RL-optimized prompt generation is also used to steer dialogue models for controllability, using the model’s responses as reward signals; PPO is a common optimizer in these frameworks (2206.03931). This is particularly practical for output-specific control such as emotion or topic.
- Metaheuristic prompt learning further generalizes gradient-free search over prompts, using algorithms like hill climbing, simulated annealing, genetic algorithms, tabu search, and harmony search, supporting both white- and black-box scenarios (2311.08364).
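As an illustration of the gradient-free regime, the sketch below implements a simple hill-climbing search over discrete prompt tokens; the `score_fn` and `vocab` interfaces are assumptions standing in for an evaluation loop over API calls:

```python
import random

def hill_climb_prompt(score_fn, vocab, length=8, iters=200, seed=0):
    """Sketch of black-box prompt search by hill climbing.
    score_fn: maps a list of tokens to a scalar reward (e.g., dev-set accuracy
    measured via API calls); vocab: candidate token list (assumed)."""
    rng = random.Random(seed)
    prompt = [rng.choice(vocab) for _ in range(length)]
    best = score_fn(prompt)
    for _ in range(iters):
        cand = list(prompt)
        cand[rng.randrange(length)] = rng.choice(vocab)  # mutate one position
        s = score_fn(cand)
        if s > best:                                     # greedy accept
            prompt, best = cand, s
    return prompt, best
```

Simulated annealing, genetic algorithms, and the other metaheuristics cited above replace the greedy-accept rule with their own acceptance or recombination logic.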
In black-box settings, prompt compression via RL (PCRL) can reduce prompt length by up to 24.6% while maintaining output fidelity, further enhancing efficiency and transferability (2308.08758).
6. Structured, Robust, and Continual Direct Prompt Learning
Advanced direct prompt learning methods address:
- Structure and robustness: MetaPrompter introduces a prompt pool with meta-learning to extract task knowledge and build instance-dependent prompts via attention pooling (2306.00618). The associated RepVerb verbalizer maps labels to continuous embeddings directly computed from support set features, improving prediction with no extra parameters.
- Diffusion-based and generative prompt refinement: Prompt Diffusion replaces fixed prompts with a generative process that iteratively transforms noise into a sample-adapted prompt using diffusion models, improving robustness to domain and distribution shifts and enabling fast ODE-based sampling (2410.20164).
- Prompt learning in foundation and multimodal models: For vision or segmentation models (e.g., SAM), prompt optimization across both spatial and semantic embedding spaces with adaptable weighting improves few-shot and domain-specific segmentation (2401.04651).
- Continual learning via direct prompting: Learning to Prompt for Continual Learning (L2P) uses a prompt pool and an instance-wise query mechanism as a lightweight memory, eliminating rehearsal buffers and mitigating catastrophic forgetting (2112.08654).
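A sketch of the pool-and-query mechanism (cosine matching and top-k selection follow the L2P description; dimensions and initialization are illustrative):

```python
import torch
import torch.nn as nn

class PromptPool(nn.Module):
    """Sketch of an L2P-style prompt pool with instance-wise key matching."""
    def __init__(self, pool_size: int, prompt_len: int, d: int, top_k: int = 3):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, d))
        self.prompts = nn.Parameter(torch.randn(pool_size, prompt_len, d) * 0.02)
        self.top_k = top_k

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (B, d) feature of the input from the frozen backbone
        sims = torch.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)
        idx = sims.topk(self.top_k, dim=-1).indices    # (B, k) selected prompt indices
        chosen = self.prompts[idx]                     # (B, k, l, d)
        return chosen.flatten(1, 2)                    # (B, k*l, d): prepend to input
```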
7. Applications, Limitations, and Future Directions
Direct prompt learning has demonstrated utility in diverse tasks:
- Few-shot and zero-shot classification: Achieving strong accuracy and F1 even with minimal annotated data, and surpassing fine-tuning in low-resource scenarios (2108.10604, 2202.04824).
- Domain-specific adaptation: Effective in clinical NLP, decision support, and specialized domains where resource efficiency and interpretability are paramount (2205.05535).
- Automated prompt engineering: Sequential optimal learning frameworks leverage Bayesian regression and forward-looking Knowledge-Gradient (KG) policies to select high-quality prompt features under evaluation constraints, scaling to large, constraint-based prompt spaces (2501.03508).
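For intuition, the sketch below gives a heavily simplified Monte Carlo knowledge-gradient step under independent Gaussian beliefs per candidate prompt; the cited framework uses Bayesian regression over prompt features, which this simplification omits:

```python
import numpy as np

def kg_choose(mu: np.ndarray, sigma2: np.ndarray, noise2: float,
              n_samples: int = 256, seed: int = 0) -> int:
    """Simplified Monte Carlo knowledge-gradient step: pick the candidate prompt
    whose next evaluation most raises the expected best posterior mean.
    mu, sigma2: per-prompt Gaussian beliefs; noise2: evaluation noise variance."""
    rng = np.random.default_rng(seed)
    best_now = mu.max()
    kg = np.empty_like(mu)
    for i in range(len(mu)):
        # std of the updated posterior mean after one noisy observation of prompt i
        tilde = np.sqrt(sigma2[i] ** 2 / (sigma2[i] + noise2))
        new_means = mu[i] + tilde * rng.standard_normal(n_samples)
        others = np.delete(mu, i).max() if len(mu) > 1 else -np.inf
        kg[i] = np.maximum(new_means, others).mean() - best_now
    return int(kg.argmax())
```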
Challenges include computational efficiency (added memory/latency from soft prompts or multi-prompt mixtures), instability (sensitivity to initialization and learning rates), and, for some methods, diminishing returns with prompt length or in complex tasks (2507.06085). Open avenues involve advanced meta-learning, adaptive and hierarchical prompt generation, deeper theoretical understanding, and generalization to multimodal and continuous lifelong learning contexts.
Table: Representative Direct Prompt Learning Methods
| Category | Representative Approaches | Parameterization/Formulation |
|---|---|---|
| General Opt. | Prompt Tuning, P-Tuning v2, XPrompt | $[P;\, e(X)]$; soft prompt insertion |
| Encoder-based | P-Tuning, RPT, Prefix-Tuning | LSTM/MLP encoder, residuals, layerwise prompts |
| Decomposition | DPT, DePT | $P = A B$; bottleneck/low-rank structure |
| Mixture-of-Experts | SMoP, PT-MoE | Multiple short prompts, dynamic routing |
| Instance Adaptive | IDPG, Prototype Prompt, AdaPrompt | Per-input generators, attention/task clusters |
| Black-box/Discrete | BDPL, PCRL, Metaheuristics | RL, policy gradient, metaheuristic search |
Conclusion
Direct prompt learning constitutes a versatile and rapidly evolving paradigm for efficient adaptation of large pretrained models. Its spectrum of methods—ranging from simple soft prompt optimization to advanced adaptive, robust, and black-box frameworks—enables practical solutions for supervised, zero/few-shot, and continual learning tasks. Ongoing research aims to enhance training stability, interpretability, and extensibility to new domains and modalities, solidifying direct prompt learning as a foundational technique in modern AI systems (2507.06085, 2201.07126, 2210.10841, 2306.00618, 2410.20164).