DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
The contemporary landscape of NLP and vision-language (VL) tasks has seen significant advances in the adaptability and efficiency of large language models (LLMs). Extending this progress, the Decomposed Prompt Tuning (DePT) framework introduces a novel approach to parameter-efficient fine-tuning (PEFT). The paper evaluates the proposed method against established PEFT paradigms by examining its performance and efficiency across a range of NLP and VL tasks, highlighting DePT's ability to improve task performance while conserving memory and compute.
Conceptual Foundation
DePT extends the established prompt tuning (PT) framework, which prepends trainable soft prompt vectors to the model input and updates only those vectors. PT achieves competitive task performance while keeping the number of trainable parameters small. However, the soft prompt lengthens the input sequence, and because self-attention cost grows quadratically with sequence length, PT inflates both memory use and training and inference time. DePT addresses this bottleneck by decomposing the soft prompt into a shorter soft prompt plus a pair of low-rank matrices whose product updates the frozen input word embeddings, with the two components optimized at different learning rates. This preserves performance while cutting memory and time costs, as the sketch below illustrates.
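To make the decomposition concrete, here is a minimal PyTorch sketch. The module and parameter names (`DecomposedPrompt`, `short_len`, `rank`) and the initialization choices are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class DecomposedPrompt(nn.Module):
    """Sketch of DePT's decomposition: a shorter soft prompt plus a
    low-rank (LoRA-style) update to the frozen input word embeddings."""

    def __init__(self, d_model: int, short_len: int = 40,
                 max_seq_len: int = 256, rank: int = 8):
        super().__init__()
        # Shorter soft prompt (short_len x d_model), replacing a longer full prompt.
        self.short_prompt = nn.Parameter(torch.randn(short_len, d_model) * 0.02)
        # Low-rank pair A (s x r) and B (r x d); A @ B updates the embeddings.
        self.lora_a = nn.Parameter(torch.randn(max_seq_len, rank) * 0.02)
        self.lora_b = nn.Parameter(torch.zeros(rank, d_model))  # zero init: no-op at start

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, d_model) from the frozen embedding layer.
        s = input_embeds.size(1)
        delta = self.lora_a[:s] @ self.lora_b                  # (s, d_model)
        updated = input_embeds + delta                         # broadcasts over batch
        batch = input_embeds.size(0)
        prompt = self.short_prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, updated], dim=1)             # prepend shorter prompt
```

Because only `short_prompt`, `lora_a`, and `lora_b` receive gradients, the trainable budget stays comparable to vanilla PT while the prompt contributes far fewer extra positions to the input sequence.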
Methodological Analysis
The paper investigates DePT through comprehensive experimentation on 23 NLP and VL tasks, drawn from benchmarks such as GLUE and SuperGLUE and from VL settings such as visual question answering (VQA) and MSCOCO image captioning. The experiments show DePT outperforming state-of-the-art PEFT methods, and in some scenarios even the full fine-tuning baseline, with the gains becoming more pronounced as model size grows. The decomposition also enables a dual-rate learning scheme in which the shorter prompt and the low-rank matrices are trained at distinct learning rates, aiding convergence and contributing to training and inference efficiency; a minimal sketch of this scheme follows.
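The dual-rate scheme amounts to placing the two components in separate optimizer parameter groups. The sketch below uses PyTorch's AdamW; the shapes and learning rates are assumptions, chosen to reflect the common practice of training soft prompts with a much larger rate than weight-style updates:

```python
import torch
from torch import nn
from torch.optim import AdamW

# Stand-in parameters for DePT's two components (shapes are assumptions).
short_prompt = nn.Parameter(torch.randn(40, 768) * 0.02)  # shorter soft prompt
lora_a = nn.Parameter(torch.randn(256, 8) * 0.02)         # low-rank factor A
lora_b = nn.Parameter(torch.zeros(8, 768))                # low-rank factor B

# Two parameter groups, each with its own learning rate (values are
# illustrative): the prompt gets the higher rate, the low-rank pair the lower.
optimizer = AdamW([
    {"params": [short_prompt], "lr": 3e-1},
    {"params": [lora_a, lora_b], "lr": 5e-4},
])
```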
Empirical Findings
- Task Performance: Across diverse benchmarks, DePT surpassed not only PT variants but also other state-of-the-art PEFT methods, achieving higher accuracy while keeping the trainable-parameter budget comparable.
- Time and Space Complexity: DePT reduced memory usage and training and inference time, with savings of up to roughly 20% over vanilla PT. Notably, these advantages grew with model size, suggesting scalability benefits for large-scale deployments (see the toy cost calculation after this list).
- Few-shot Adaptability: DePT's integration with parameter-efficient transfer learning (PETL) was empirically validated, showing robustness in few-shot learning settings. Its compatibility with various model architectures underscores its adaptability, remaining effective even when labeled data is scarce.
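Where do the time and memory savings come from? Per-layer self-attention cost grows roughly quadratically with total sequence length, so shortening the prompt pays off directly. The toy calculation below uses assumed lengths, not figures from the paper:

```python
# Toy illustration: attention cost scales ~quadratically with total length.
seq_len = 256
full_prompt_len, short_prompt_len = 100, 20  # vanilla PT vs. a shorter DePT prompt

def attention_cost(prompt_len: int) -> int:
    """Relative per-layer attention cost for input plus prompt."""
    return (seq_len + prompt_len) ** 2

saving = 1 - attention_cost(short_prompt_len) / attention_cost(full_prompt_len)
print(f"relative attention-cost reduction: {saving:.1%}")  # ~39.9% in this toy setting
```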
Practical and Theoretical Implications
From a practical standpoint, DePT is a promising option for deploying LLMs in real-time applications where computational resources are constrained. Theoretically, it sharpens our understanding of prompt-decomposition strategies in model tuning and may inform future algorithmic developments in both language and multimodal learning.
Potential Developments
While DePT already improves efficiency and effectiveness, exploring its integration with other PEFT methods, such as Adapters and LoRA, could broaden its applicability across diverse contexts. Evaluating it on tasks with very long input sequences, where attention cost dominates, and in other high-impact LLM applications would further delineate future research directions.
In summary, DePT is a judicious step toward efficient model training and inference. By balancing trainable-parameter count against computational workload, it paves the way for scalable and adaptable machine learning solutions in both academic and industrial settings.