ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning (2501.03291v1)

Published 6 Jan 2025 in cs.CL

Abstract: Prompt Tuning (PT) enables the adaptation of Pre-trained LLMs (PLMs) to downstream tasks by optimizing a small number of soft virtual tokens, which are prepended to the input token embeddings. Recently, Decomposed Prompt Tuning (DePT) has demonstrated superior adaptation capabilities by decomposing the soft prompt into a shorter soft prompt and a pair of low-rank matrices. The product of the pair of low-rank matrices is added to the input token embeddings to offset them. Additionally, DePT achieves faster inference compared to PT due to the shorter soft prompt. However, in this paper, we find that the position-based token embedding offsets of DePT restrict its ability to generalize across diverse model inputs, and that the shared embedding offsets across many token embeddings result in sub-optimization. To tackle these issues, we introduce \textbf{A}daptive \textbf{De}composed \textbf{P}rompt \textbf{T}uning (ADePT), which is composed of a short soft prompt and a shallow token-shared feed-forward neural network. ADePT utilizes the token-shared feed-forward neural network to learn the embedding offsets for each token, enabling adaptive embedding offsets that vary according to the model input and better optimization of token embedding offsets. This enables ADePT to achieve superior adaptation performance without requiring more inference time or additional trainable parameters compared to vanilla PT and its variants. In comprehensive experiments across 23 NLP tasks and 4 typical PLMs of different scales, we show that ADePT consistently surpasses the leading parameter-efficient fine-tuning (PEFT) methods, and even outperforms the full fine-tuning baseline in certain scenarios. Code is available at \url{https://github.com/HungerPWAY/ADePT}.

Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning

In contemporary NLP, Pre-trained LLMs (PLMs) with parameter counts reaching into the billions have become a staple. While full fine-tuning (FT) adapts these models to specific downstream tasks, it is computationally expensive and resource-intensive. This has spurred the development of Parameter-Efficient Fine-Tuning (PEFT) methods that minimize the resource costs of adapting PLMs to diverse tasks without substantial performance trade-offs.

ADePT (Adaptive Decomposed Prompt Tuning) introduces a novel approach to prompt tuning that addresses inefficiencies in prior PEFT methods, particularly DePT (Decomposed Prompt Tuning). DePT augments the input token embeddings with learnable offsets formed by the product of a pair of low-rank matrices, alongside a shortened soft prompt. While effective, these offsets are tied to token positions rather than token content, which leaves the shared offsets sub-optimally trained and limits generalization across diverse model inputs.
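
To make the position-based mechanism concrete, the following PyTorch sketch shows how a DePT-style input could be assembled. The module name, dimensions, and initialization are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DePTStyleInput(nn.Module):
    """Sketch of DePT-style input construction (hypothetical names and sizes)."""
    def __init__(self, d_model=768, prompt_len=20, max_seq_len=256, rank=8):
        super().__init__()
        # Short soft prompt prepended to the input embeddings.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
        # Pair of low-rank matrices; their product yields one offset per
        # *position*, shared by whichever token happens to occupy it.
        self.A = nn.Parameter(torch.randn(max_seq_len, rank) * 0.02)
        self.B = nn.Parameter(torch.randn(rank, d_model) * 0.02)

    def forward(self, token_embeds):              # (batch, seq_len, d_model)
        seq_len = token_embeds.size(1)
        offsets = self.A[:seq_len] @ self.B       # (seq_len, d_model), position-based
        shifted = token_embeds + offsets          # same offsets for any input of this length
        prompt = self.soft_prompt.unsqueeze(0).expand(token_embeds.size(0), -1, -1)
        return torch.cat([prompt, shifted], dim=1)

x = torch.randn(2, 16, 768)                       # stand-in for input token embeddings
print(DePTStyleInput()(x).shape)                  # torch.Size([2, 36, 768])
```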

ADePT innovates on this by optimizing a combination of a short soft prompt and a shallow token-shared feed-forward neural network. The soft prompt, kept short for the same inference-speed benefit as DePT, works in tandem with the network, which computes a distinct offset for each token from that token's embedding. This dynamic computation addresses DePT's limitation by producing offsets that depend on the token's content rather than on static positional indices.
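
A minimal sketch of the ADePT idea, under the same assumptions as above (illustrative sizes and hypothetical names, not the released code), replaces the low-rank, position-indexed offsets with a small feed-forward network applied to every token embedding:

```python
import torch
import torch.nn as nn

class ADePTStyleInput(nn.Module):
    """Sketch of ADePT-style input construction (hypothetical names and sizes)."""
    def __init__(self, d_model=768, prompt_len=20, bottleneck=8):
        super().__init__()
        # Short soft prompt, as in DePT.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
        # Shallow token-shared feed-forward network: the same two layers are
        # applied to every token embedding, so each offset depends on the
        # token's content rather than its position.
        self.offset_net = nn.Sequential(
            nn.Linear(d_model, bottleneck),
            nn.ReLU(),
            nn.Linear(bottleneck, d_model),
        )

    def forward(self, token_embeds):                 # (batch, seq_len, d_model)
        offsets = self.offset_net(token_embeds)      # adaptive, per-token offsets
        shifted = token_embeds + offsets
        prompt = self.soft_prompt.unsqueeze(0).expand(token_embeds.size(0), -1, -1)
        return torch.cat([prompt, shifted], dim=1)

x = torch.randn(2, 16, 768)
print(ADePTStyleInput()(x).shape)                    # torch.Size([2, 36, 768])
```

Because the network is shared across all tokens and kept shallow, the trainable-parameter budget stays comparable to vanilla PT and DePT while the offsets adapt to the actual input.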

In a comprehensive set of experiments on 23 NLP tasks across four PLMs of different scales, ADePT consistently outperformed leading PEFT methods and, notably, even surpassed full fine-tuning in certain scenarios. This performance can be attributed to ADePT's adaptive mechanism, which tailors token embedding offsets to the model input, delivering superior adaptation without additional inference time or trainable parameters.

The implications of ADePT extend beyond immediate efficiency gains. The framework offers a unified scheme that can readily integrate with other existing PEFT methods. Its token-shared feed-forward neural network illustrates a paradigm in which embedding offsets are not bound to input token positions but adjust dynamically to the model input.

Future research trajectories may explore further integration of ADePT within expansive multi-task learning scenarios, where the robustness and flexibility of adaptation to various linguistic tasks can be examined in depth. This approach aligns with the overarching objective of creating generalized, resource-efficient PLMs applicable across diverse NLP applications while mitigating the computational expense generally associated with high-performance tuning.

In conclusion, ADePT represents a significant stride towards achieving parameter-efficient optimization in PLMs by leveraging adaptive prompt tuning facilitated by neural networks. Its contribution suggests new avenues for enhancing PLM adaptability with potential applications in both mainstream and niche NLP domains.

Authors (3)
  1. Pengwei Tang
  2. Xiaolin Hu
  3. Yong Liu