Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models (2203.06904v2)

Published 14 Mar 2022 in cs.CL, cs.AI, and cs.LG

Abstract: Despite the success, the process of fine-tuning large-scale PLMs brings prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining separate instances for different tasks are practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, dubbed as delta tuning in this paper. In contrast with the standard fine-tuning, delta tuning only fine-tunes a small portion of the model parameters while keeping the rest untouched, largely reducing both the computation and storage costs. Recent studies have demonstrated that a series of delta tuning methods with distinct tuned parameter selection could achieve performance on a par with full-parameter fine-tuning, suggesting a new promising way of stimulating large-scale PLMs. In this paper, we first formally describe the problem of delta tuning and then comprehensively review recent delta tuning approaches. We also propose a unified categorization criterion that divides existing delta tuning methods into three groups: addition-based, specification-based, and reparameterization-based methods. Though initially proposed as an efficient method to steer large models, we believe that some of the fascinating evidence discovered along with delta tuning could help further reveal the mechanisms of PLMs and even deep neural networks. To this end, we discuss the theoretical principles underlying the effectiveness of delta tuning and propose frameworks to interpret delta tuning from the perspective of optimization and optimal control, respectively. Furthermore, we provide a holistic empirical study of representative methods, where results on over 100 NLP tasks demonstrate a comprehensive performance comparison of different approaches. The experimental results also cover the analysis of combinatorial, scaling and transferable properties of delta tuning.


Summary

The paper introduces delta tuning as a method that updates only a minimal set of parameters, making PLM adaptation more efficient.
It categorizes techniques into addition-based, specification-based, and reparameterization-based methods with theoretical insights from optimization and control perspectives.
Extensive experiments across 100+ NLP tasks confirm that delta tuning achieves similar performance to full fine-tuning while significantly lowering computational and storage demands.
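The storage saving comes from keeping one shared backbone plus a small per-task delta instead of one full model copy per task. A back-of-the-envelope sketch follows; the model size, delta fraction, and task count are hypothetical values chosen for illustration, not figures from the paper.

```python
# Compare parameters stored when serving many tasks.
# All numbers in __main__ are hypothetical, for illustration only.


def full_finetune_params(base_params: int, num_tasks: int) -> int:
    """Full fine-tuning keeps one complete model copy per task."""
    return base_params * num_tasks


def delta_tuning_params(base_params: int, delta_params: int, num_tasks: int) -> int:
    """Delta tuning keeps one frozen backbone plus one small delta per task."""
    return base_params + delta_params * num_tasks


if __name__ == "__main__":
    base = 1_000_000_000   # hypothetical 1B-parameter PLM
    delta = base // 1000   # hypothetical delta: 0.1% of the backbone
    tasks = 100
    full = full_finetune_params(base, tasks)
    lean = delta_tuning_params(base, delta, tasks)
    print(f"full fine-tuning: {full:,} parameters stored")
    print(f"delta tuning:     {lean:,} parameters stored")
    print(f"ratio: {full / lean:.1f}x")  # prints ratio: 90.9x
```

The saving grows with the number of tasks, because the backbone cost is paid once while each extra task adds only its delta.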

Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models

The paper presents a comprehensive study of "delta tuning," the authors' umbrella term for parameter-efficient methods of adapting large pre-trained language models (PLMs). As PLMs grow in scale, full fine-tuning becomes computationally prohibitive, and storing a separate fully fine-tuned copy for every task becomes impractical. Delta tuning addresses this by tuning only a small subset of the model parameters while keeping the rest frozen, which substantially reduces computation and storage costs while achieving performance comparable to full fine-tuning.
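The definition can be written compactly. The notation below is a sketch modeled on the paper's problem statement rather than a verbatim reproduction: Θ is the set of pre-trained parameters, Θ′ the adapted set, and ΔΘ the parameters that are actually trained.

```latex
\Theta = \{w_1, \dots, w_N\}, \qquad
\Theta' = \{w'_1, \dots, w'_M\}, \qquad
\Delta\Theta = \Theta' - \Theta, \qquad
|\Delta\Theta| \ll |\Theta| = N
```

Full fine-tuning is the extreme case |ΔΘ| = |Θ|. Because addition-based methods introduce new parameters, M can exceed N.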

Key Contributions

Definition and Categorization: Delta tuning is defined as tuning only a small portion of the model's parameters while keeping the rest frozen, in contrast with traditional fine-tuning, where all parameters are updated. The paper categorizes existing delta tuning methods into three groups:
- Addition-based Methods: These introduce new trainable parameters, such as adapter modules or prefix and prompt vectors, while the original model parameters stay frozen.
- Specification-based Methods: These update only a specified subset of the existing parameters (for example, only the bias terms), chosen by heuristics or learned criteria.
- Reparameterization-based Methods: These reparameterize existing parameters or their updates into a parameter-efficient, lower-dimensional or low-rank form, motivated by the hypothesis that adaptation has low intrinsic dimensionality or rank.
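To make the three families concrete, here is a toy sketch on a single frozen linear layer y = W x + b, written in plain Python. The sizes, initializations, and the specific choices (a bottleneck adapter with no adapter biases, bias-only tuning in the style of BitFit, and a low-rank update in the style of LoRA) are illustrative assumptions, not the paper's implementations.

```python
import random

random.seed(0)
D, R = 8, 2  # hypothetical hidden size and adapter bottleneck / low-rank size


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def vadd(u, v):
    return [a + b for a, b in zip(u, v)]


def gauss_matrix(rows, cols, scale=0.1):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def zeros_matrix(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Frozen "pre-trained" layer: none of the methods below modify W or b.
W = gauss_matrix(D, D)
b = [random.gauss(0.0, 0.1) for _ in range(D)]


def frozen(x):
    return vadd(matvec(W, x), b)


# Addition-based: a new bottleneck adapter applied after the frozen layer.
adapter_down = gauss_matrix(R, D)  # trainable, R x D
adapter_up = zeros_matrix(D, R)    # trainable, D x R; zero init leaves output unchanged


def with_adapter(x):
    h = frozen(x)
    z = [max(0.0, v) for v in matvec(adapter_down, h)]  # ReLU
    return vadd(h, matvec(adapter_up, z))


# Specification-based: train only an existing subset, here an offset to the bias.
bias_delta = [0.0] * D  # trainable, D


def with_bias_delta(x):
    return vadd(matvec(W, x), vadd(b, bias_delta))


# Reparameterization-based: express the weight update as a low-rank product B @ A.
lora_A = gauss_matrix(R, D)  # trainable, R x D
lora_B = zeros_matrix(D, R)  # trainable, D x R; zero init means no change at start


def with_low_rank(x):
    return vadd(frozen(x), matvec(lora_B, matvec(lora_A, x)))


def trainable_params(d, r):
    """Trainable-parameter count per method for one d x d layer with bias."""
    return {
        "full fine-tuning": d * d + d,
        "adapter (addition)": 2 * d * r,
        "bias only (specification)": d,
        "low-rank (reparameterization)": 2 * d * r,
    }


if __name__ == "__main__":
    x = [1.0] * D
    # With zero-initialized deltas, every variant reproduces the frozen layer.
    for f in (with_adapter, with_bias_delta, with_low_rank):
        assert all(abs(p - q) < 1e-12 for p, q in zip(f(x), frozen(x)))
    for name, n in trainable_params(4096, 8).items():
        print(f"{name:32s} {n:>12,}")
```

At d = 4096 and r = 8 the counts are 16,781,312 trainable parameters for fully tuning the layer, 65,536 for the adapter or low-rank variants, and 4,096 for bias-only tuning. Zero-initializing one factor is a common design choice so that training starts exactly from the pre-trained model's behavior.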
Theoretical Frameworks: The paper explores delta tuning from both optimization and optimal control perspectives:
- Optimization Perspective: The authors discuss how delta tuning can be seen as subspace optimization or functional approximation within a neural network, leveraging the low intrinsic dimensionality of adaptation.
- Optimal Control Perspective: It views delta tuning as an optimal control problem where the delta parameters act as controllers to steer the PLM towards desired outcomes.
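The subspace-optimization view can be illustrated with a toy sketch: only d coordinates z are trained, and the full parameters are recovered as theta0 + P z for a fixed random projection P. The quadratic loss, the sizes, and the assumption that the target lies exactly inside the subspace are hypothetical simplifications, not the paper's experimental setup.

```python
import random

random.seed(0)
D, d = 1000, 4  # hypothetical: full parameter count D, trainable subspace size d

# Fixed random projection P, stored as d basis vectors of length D.
# Entries have variance 1/D, so each basis vector has roughly unit norm.
P = [[random.gauss(0.0, 1.0 / D ** 0.5) for _ in range(D)] for _ in range(d)]
theta0 = [random.gauss(0.0, 1.0) for _ in range(D)]  # stand-in "pre-trained" weights


def expand(z):
    """Map the d trainable coordinates z to the full parameters theta0 + P z."""
    return [t + sum(P[k][i] * z[k] for k in range(d)) for i, t in enumerate(theta0)]


# Hypothetical target chosen to lie inside the subspace (low intrinsic dimension).
z_target = [0.5, -1.0, 2.0, 0.3]
theta_target = expand(z_target)


def loss(theta):
    """Squared distance to the target parameters (a toy quadratic loss)."""
    return sum((a - b) ** 2 for a, b in zip(theta, theta_target))


def grad_z(z):
    """Chain rule: dL/dz_k = sum_i 2 * (theta_i - target_i) * P[k][i]."""
    theta = expand(z)
    g_theta = [2.0 * (a - b) for a, b in zip(theta, theta_target)]
    return [sum(g * p for g, p in zip(g_theta, P[k])) for k in range(d)]


z = [0.0] * d  # only these d numbers are updated; theta0 itself stays frozen
lr = 0.2
initial_loss = loss(expand(z))
for _ in range(200):
    z = [zk - lr * gk for zk, gk in zip(z, grad_z(z))]

print(f"trained {d} of {D} parameters: loss {initial_loss:.3f} -> {loss(expand(z)):.2e}")
```

In real models the target is not guaranteed to lie in a random low-dimensional subspace; the low-intrinsic-dimension hypothesis is that a small d suffices to get close to it.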
Empirical Study: Extensive experiments across 100+ NLP tasks reveal the practical effectiveness of delta tuning. Key findings include:
- Performance comparable to full fine-tuning, with the gap narrowing as the scale of the PLM increases.
- Enhanced convergence rates and performance when combining multiple delta tuning methods.
- Noteworthy transferability of delta tuning methods across tasks, highlighting the potential for knowledge sharing through trained delta modules.
Applications: Delta tuning is particularly valuable in scenarios requiring efficient computation and storage, such as:
- Multi-task learning and the creation of shareable, task-specific checkpoints.
- Mitigating catastrophic forgetting in lifelong learning settings.
- Facilitating PLMs-as-a-service deployments, where many users can efficiently adapt a shared model to their own downstream tasks.
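The multi-task and service scenarios share one pattern: a single frozen backbone and a registry of small per-task deltas applied at request time. The sketch below assumes a hypothetical sparse-delta format ({weight index: offset}) and a toy "model" that is just a dot product; real delta modules would be adapters, prompts, low-rank updates, and so on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Backbone:
    """Stand-in for a shared, frozen PLM: here just a tuple of weights (hypothetical)."""
    weights: Tuple[float, ...]


@dataclass
class DeltaServer:
    """Serves many tasks from one backbone by applying a small per-task delta."""
    backbone: Backbone
    # task name -> sparse delta {weight index: offset}; a hypothetical format
    deltas: Dict[str, Dict[int, float]] = field(default_factory=dict)

    def register(self, task: str, delta: Dict[int, float]) -> None:
        # Adding a task never touches the backbone or other tasks' deltas,
        # so previously learned tasks are not overwritten.
        self.deltas[task] = dict(delta)

    def predict(self, task: str, x: List[float]) -> float:
        delta = self.deltas.get(task, {})  # unknown task falls back to the plain backbone
        return sum((w + delta.get(i, 0.0)) * xi
                   for i, (w, xi) in enumerate(zip(self.backbone.weights, x)))

    def stored_params(self) -> int:
        return len(self.backbone.weights) + sum(len(d) for d in self.deltas.values())


if __name__ == "__main__":
    server = DeltaServer(Backbone((0.5, -1.0, 2.0, 0.0)))
    server.register("sentiment", {3: 1.5})
    server.register("topic", {0: -0.5, 1: 1.0})
    x = [1.0, 1.0, 1.0, 1.0]
    print(server.predict("sentiment", x))  # 3.0
    print(server.predict("topic", x))      # 2.0
    print(server.predict("unseen", x))     # 1.5 (backbone only)
    print(server.stored_params())          # 7
```

Making the backbone an immutable (frozen) dataclass mirrors the idea that the shared weights are never modified; only the lightweight deltas differ between tasks or users.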

Implications and Future Directions

Delta tuning offers a promising approach to efficiently leverage the power of large PLMs, making them more accessible and deployable across different computational environments. As PLMs grow ever larger, methods like delta tuning will likely gain prominence in both academic research and practical industry applications. Future research may focus on further refining these methods, exploring additional theoretical frameworks, and broadening the range of applications in AI systems. This paper lays a foundation for ongoing innovations in the efficient deployment of PLMs, pointing towards a scalable approach to model adaptation in NLP and beyond.
