SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors
Abstract: The paper presents Singular Vectors guided Fine-Tuning (SVFT), a novel approach to parameter-efficient fine-tuning (PEFT) of large pre-trained language and vision models. SVFT diverges from existing PEFT methods by incorporating the structure of the pre-trained weight matrices into the fine-tuning process. The core idea is to update the pre-trained weights as sparse combinations of the singular vectors of the original matrix, with only the coefficients of these combinations being trainable. Empirical evaluations show that SVFT significantly narrows the gap with full fine-tuning, recovering up to 96% of full fine-tuning performance while training only 0.006% to 0.25% of the parameters.
Introduction
Parameter-efficient fine-tuning (PEFT) has become critical for adapting large pre-trained models to downstream tasks without the computational and storage costs of full fine-tuning. Conventional PEFT methods such as LoRA and its variants add low-rank updates to the frozen pre-trained weights and have proven effective, though their fixed, additive low-rank structure limits expressivity and, in turn, performance.
Methodology
SVFT proposes a fundamental shift: instead of arbitrary low-rank updates, the update is determined by the singular vectors of the pre-trained weights themselves. Specifically:
- For a weight matrix with singular value decomposition $W = U \Sigma V^\top$, the perturbation is parameterized as $\Delta W = U M V^\top$, so the fine-tuned weight becomes $W + \Delta W = U(\Sigma + M)V^\top$. Here $U$ and $V$ hold the left and right singular vectors of $W$, $\Sigma$ its singular values, and $M$ is a sparse trainable matrix.
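To make the parameterization concrete, here is a minimal PyTorch sketch (illustrative only, not the authors' reference implementation; the class name `SVFTLinear` and the mask-based handling of the trainable coefficients are assumptions of this sketch):

```python
import torch
import torch.nn as nn

class SVFTLinear(nn.Module):
    """Frozen weight W = U diag(S) V^T; the effective weight becomes
    U (diag(S) + mask * M) V^T, with only M trainable."""

    def __init__(self, weight: torch.Tensor, mask: torch.Tensor):
        super().__init__()
        # SVD of the frozen pre-trained weight; U, S, Vh are never trained.
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U)        # (m, r) left singular vectors
        self.register_buffer("S", S)        # (r,)   singular values
        self.register_buffer("Vh", Vh)      # (r, n) right singular vectors, transposed
        self.register_buffer("mask", mask)  # (r, r) binary sparsity pattern for M
        # Trainable coefficients, zero-initialized so training starts exactly
        # at the pre-trained weights. A practical implementation would store
        # only the non-zero entries selected by `mask` rather than a dense M.
        self.M = nn.Parameter(torch.zeros(S.numel(), S.numel()))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W + dW = U (diag(S) + mask * M) V^T
        weight = self.U @ (torch.diag(self.S) + self.mask * self.M) @ self.Vh
        return x @ weight.T
```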
Design Choices
SVFT allows flexibility through four variants in the structure of $M$ (sketched in code after this list):
- Plain (SVFT$^P$): $M$ is purely diagonal, so only the singular values are adjusted.
- Banded (SVFT$^B_d$): elements of $M$ within a band of width $d$ around the diagonal are learnable, allowing localized interactions among singular vectors.
- Random (SVFT$^R_d$): randomly selected elements of $M$ are made learnable.
- Top-$k$ (SVFT$^T_k$): the top-$k$ interactions between left and right singular vectors, ranked by their alignment, are made learnable.
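A sketch of how the four sparsity patterns for $M$ might be constructed (the helper names and exact selection rules are assumptions; in particular, the alignment-based top-$k$ rule shown here assumes a square weight matrix so that $u_i^\top v_j$ is well defined):

```python
import torch

def plain_mask(r: int) -> torch.Tensor:
    # SVFT^P: only the diagonal of M (the singular values) is trainable.
    return torch.eye(r)

def banded_mask(r: int, d: int) -> torch.Tensor:
    # SVFT^B_d: entries within half-bandwidth d of the diagonal are trainable.
    idx = torch.arange(r)
    return ((idx[:, None] - idx[None, :]).abs() <= d).float()

def random_mask(r: int, k: int, seed: int = 0) -> torch.Tensor:
    # SVFT^R_d: the diagonal plus k randomly chosen entries of M.
    g = torch.Generator().manual_seed(seed)
    mask = torch.eye(r)
    mask.view(-1)[torch.randperm(r * r, generator=g)[:k]] = 1.0
    return mask

def top_k_mask(U: torch.Tensor, V: torch.Tensor, k: int) -> torch.Tensor:
    # SVFT^T_k: keep the k pairs (i, j) with the largest alignment |u_i^T v_j|.
    scores = (U.T @ V).abs()
    mask = torch.zeros_like(scores)
    mask.view(-1)[scores.view(-1).topk(k).indices] = 1.0
    return mask
```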
Theoretical Analysis
Key properties elucidate SVFT's advantages:
- Structure: If $M$ is diagonal, the update $U(\Sigma + M)V^\top$ preserves the singular vector directions of $W$ and only rescales its spectrum, yielding minimally disruptive updates.
- Expressivity: With a fully dense $M$, SVFT can express any desired perturbation $\Delta W$, since the singular vectors in $U$ and $V$ form complete orthonormal bases.
- Rank: For a fixed number of non-zero entries in $M$ (i.e., a fixed parameter budget), the rank of the perturbation induced by SVFT can match or exceed that of prior PEFT methods, since the non-zeros of $M$ need not be confined to a low-rank factorization (verified numerically below).
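The rank property is easy to check numerically. The following sketch (with arbitrary illustrative sizes, not taken from the paper) compares a rank-1 LoRA update against an SVFT update with the same number of trainable parameters:

```python
import torch

m = n = 512
budget = m + n  # parameter cost of a rank-1 LoRA adapter on an m x n layer

# LoRA: dW = B @ A with B (m x 1) and A (1 x n) -> rank at most 1.
B, A = torch.randn(m, 1), torch.randn(1, n)
print(torch.linalg.matrix_rank(B @ A).item())  # 1

# SVFT: dW = U M V^T with `budget` random non-zeros scattered in M.
# Since U and Vh are orthogonal, rank(dW) = rank(M), which can be far above 1.
W = torch.randn(m, n)
U, _, Vh = torch.linalg.svd(W)
M = torch.zeros(m, n)
M.view(-1)[torch.randperm(m * n)[:budget]] = torch.randn(budget)
print(torch.linalg.matrix_rank(U @ M @ Vh).item())  # typically in the hundreds
```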
Empirical Results
SVFT delivers significant gains across both language and vision tasks:
- LLMs: Experiments with Gemma-2B and LLaMA-3-8B demonstrate that SVFT substantially closes the performance gap with full fine-tuning on tasks such as GSM-8K and general language understanding benchmarks, recovering up to 96% of full fine-tuning performance while training only 0.006% to 0.25% of the parameters.
- Vision Models: When applied to vision transformers such as ViT-B and ViT-L, SVFT significantly outperformed baselines on classification benchmarks like CIFAR-100 and Flowers102, indicating its efficacy in the vision domain.
Implications and Future Directions
SVFT's integration of weight-matrix structure into the fine-tuning process offers an innovative and effective approach to PEFT, with practical advantages in both storage and computational efficiency. Looking forward, further optimizing the sparsity pattern of $M$, as well as exploring quantization techniques to mitigate memory overhead, could broaden SVFT's applicability and efficiency.
Conclusion
SVFT represents a notable advance in parameter-efficient fine-tuning. By judiciously leveraging the singular vectors of the pre-trained weights, it delivers high performance with a remarkably small trainable-parameter footprint. This work not only sets a new benchmark in PEFT but also opens avenues for future research into more expressive and efficient fine-tuning techniques built on structured updates.