SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors (2405.19597v1)

Published 30 May 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze pre-trained model weights (W) and inject learnable matrices (\Delta W). These (\Delta W) matrices are structured for efficient parameterization, often using techniques like low-rank approximations or scaling vectors. However, these methods typically show a performance gap compared to full fine-tuning. Although recent PEFT methods have narrowed this gap, they do so at the cost of additional learnable parameters. We propose SVFT, a simple approach that fundamentally differs from existing methods: the structure imposed on (\Delta W) depends on the specific weight matrix (W). Specifically, SVFT updates (W) as a sparse combination of outer products of its singular vectors, training only the coefficients (scales) of these sparse combinations. This approach allows fine-grained control over expressivity through the number of coefficients. Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006 to 0.25% of parameters, outperforming existing methods that only recover up to 85% performance using 0.03 to 0.8% of the trainable parameter budget.

Overview: The paper presents a novel approach to parameter-efficient fine-tuning (PEFT) of large pre-trained language and vision models called Singular Vectors guided Fine-Tuning (SVFT). SVFT diverges from existing PEFT methods by incorporating the structure of the pre-trained weight matrices into the fine-tuning process. The core idea is to update the pre-trained weights as sparse combinations of outer products of the original matrix's singular vectors, with only the coefficients of these combinations being tunable. Empirical evaluations show that SVFT significantly narrows the gap with full fine-tuning, recovering up to 96% of its performance while training only 0.006 to 0.25% of the parameters.

Introduction

Parameter-efficient fine-tuning (PEFT) has become critical for adapting large-scale pre-trained models to specific downstream tasks without incurring the computational and storage costs of full fine-tuning. Conventional PEFT methods such as LoRA and its variants add low-rank updates to the frozen pre-trained weights and have proven effective, though their fixed, additive low-rank structure limits expressivity and performance.

Methodology

SVFT proposes a fundamental shift: instead of arbitrary low-rank updates, the update is determined by the singular vectors of the pre-trained weights themselves. Specifically:

  • For any weight matrix $W$, the perturbation $\Delta W$ is parameterized via $W + \Delta W = U(\Sigma + M)V^T$, where $U$ and $V$ hold the left and right singular vectors of $W$, $\Sigma$ contains its singular values, and $M$ is a sparse trainable matrix (see the sketch below).
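
A minimal PyTorch sketch of this parameterization follows. The class and variable names (SVFTLinear, mask) are illustrative assumptions rather than the authors' implementation: the SVD factors of the frozen weight are cached, and only the masked coefficient matrix $M$ is trained.

```python
# Sketch of the SVFT parameterization W + dW = U (Sigma + M) V^T (names assumed).
import torch
import torch.nn as nn


class SVFTLinear(nn.Module):
    """Wraps a frozen weight W and learns only a sparse coefficient matrix M."""

    def __init__(self, weight: torch.Tensor, mask: torch.Tensor):
        super().__init__()
        # Thin SVD of the frozen pre-trained weight: W = U diag(S) V^T.
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U)      # frozen left singular vectors
        self.register_buffer("S", S)      # frozen singular values
        self.register_buffer("Vh", Vh)    # frozen right singular vectors (transposed)
        self.register_buffer("mask", mask.to(weight.dtype))
        # Only the coefficients of M are trainable; the mask fixes its sparsity.
        self.M = nn.Parameter(torch.zeros_like(self.mask))

    def effective_weight(self) -> torch.Tensor:
        # W + dW = U (diag(S) + mask * M) V^T; M = 0 recovers W exactly.
        core = torch.diag(self.S) + self.mask * self.M
        return self.U @ core @ self.Vh

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.effective_weight().T


# Example: adapt a 64x32 weight with only its singular values trainable (diagonal mask).
W = torch.randn(64, 32)
layer = SVFTLinear(W, mask=torch.eye(min(W.shape)))
out = layer(torch.randn(4, 32))           # shape (4, 64)
```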

Design Choices

SVFT allows flexibility through four variants in the structure of $M$ (a construction sketch follows this list):

  1. Plain (SVFT$^P$): Here, $M$ is purely diagonal, meaning only the singular values are adjusted.
  2. Banded (SVFT$^B_d$): Sparse off-diagonal elements within a band of half-width $d$ around the diagonal are learnable, allowing localized interactions among singular vectors.
  3. Random (SVFT$^R_d$): Randomly selected elements of $M$ are made learnable.
  4. Top-$k$ (SVFT$^T_k$): Selects the top-$k$ interactions between left and right singular vectors based on their alignment.
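
A rough sketch of how these sparsity patterns for $M$ could be constructed is shown below. The function names are assumptions, and the top-$k$ variant takes a precomputed score matrix because the paper's exact selection rule is not reproduced here.

```python
# Illustrative constructors for the sparsity mask of M (r = number of singular values).
import torch


def plain_mask(r: int) -> torch.Tensor:
    # SVFT^P: only the diagonal (the singular values) is trainable.
    return torch.eye(r)


def banded_mask(r: int, d: int) -> torch.Tensor:
    # SVFT^B_d: entries within a band of half-width d around the diagonal are trainable.
    idx = torch.arange(r)
    return ((idx[:, None] - idx[None, :]).abs() <= d).float()


def random_mask(r: int, k: int, seed: int = 0) -> torch.Tensor:
    # SVFT^R_d: the diagonal plus k randomly chosen positions are trainable.
    g = torch.Generator().manual_seed(seed)
    mask = torch.eye(r)
    mask.view(-1)[torch.randperm(r * r, generator=g)[:k]] = 1.0
    return mask


def top_k_mask(scores: torch.Tensor, k: int) -> torch.Tensor:
    # SVFT^T_k: keep the k positions with the highest singular-vector alignment
    # scores (the score matrix is assumed to be computed elsewhere).
    mask = torch.zeros_like(scores)
    mask.view(-1)[scores.flatten().topk(k).indices] = 1.0
    return mask
```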

Theoretical Analysis

Key properties elucidate SVFT's advantages:

  1. Structure: If $M$ is diagonal, $W + \Delta W$ retains the singular vector directions of $W$, ensuring minimally disruptive updates.
  2. Expressivity: SVFT can represent any desired perturbation by appropriately tuning $M$.
  3. Rank: For a given number of non-zero entries in $M$, the rank of the perturbation induced by SVFT can match or exceed that of previous PEFT methods under the same parameter budget (a worked comparison follows this list).
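
As a back-of-the-envelope illustration of the rank-versus-budget point above (the matrix size and ranks are hypothetical, not the paper's configurations): with $k$ non-zero entries in $M$, the update $U M V^T$ can reach rank up to $k$ (e.g., when the entries lie on the diagonal), whereas LoRA needs $r(m+n)$ parameters for a rank-$r$ update.

```python
# Hypothetical parameter-budget comparison for a single 4096x4096 weight matrix.
m, n = 4096, 4096
r_lora = 8                                    # LoRA rank
lora_params = r_lora * (m + n)                # A is r x n, B is m x r
svft_plain_params = min(m, n)                 # one coefficient per singular value
svft_banded_params = min(m, n) * (2 * 2 + 1)  # band of half-width d = 2 (approximate)

print(f"LoRA (r=8):         {lora_params:,} trainable parameters")   # 65,536
print(f"SVFT plain:         {svft_plain_params:,}")                  # 4,096 (rank up to 4096)
print(f"SVFT banded (d=2):  {svft_banded_params:,}")                 # ~20,480
```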

Empirical Results

SVFT displays significant gains across both language and vision tasks:

  • LLMs: Experiments with large models like Gemma-2B and LLaMA-3-8B demonstrate that SVFT substantially closes the performance gap with full fine-tuning on tasks like GSM-8K and general language understanding benchmarks. Notably, SVFT recovered up to 96% of full fine-tuning performance while training only a fraction of the parameters (0.006 to 0.25%).
  • Vision Models: When applied to vision transformers such as ViT-B and ViT-L, SVFT significantly outperformed baselines on classification benchmarks like CIFAR-100 and Flowers102, indicating its efficacy in the vision domain.

Implications and Future Directions

SVFT's integration of the weight matrix's structure into the fine-tuning process brings an innovative and effective approach to PEFT, offering practical advantages in both storage and computational efficiency. Looking forward, further optimizing the sparsity patterns of $M$, as well as exploring quantization techniques to mitigate memory overhead, could enhance SVFT's applicability and efficiency.

Conclusion

SVFT represents a notable advancement in the arena of parameter-efficient fine-tuning. By judiciously leveraging the singular vectors of the pre-trained weights, SVFT delivers high performance with a remarkably small footprint in terms of trainable parameters. This work not only sets a new benchmark in PEFT but also opens avenues for future research into more expressive and efficient fine-tuning techniques leveraging structured updates.

Authors (10)
  1. Vijay Lingam
  2. Atula Tejaswi
  3. Aditya Vavre
  4. Aneesh Shetty
  5. Gautham Krishna Gudur
  6. Joydeep Ghosh
  7. Alex Dimakis
  8. Eunsol Choi
  9. Aleksandar Bojchevski
  10. Sujay Sanghavi