- The paper proposes Explained Variance Adaptation (EVA), a method that initializes LoRA matrices with right-singular vectors obtained from an SVD of downstream-task activations.
- The method adapts rank allocation across model layers based on explained variance ratios, enhancing fine-tuning efficiency across tasks.
- Empirical results show that EVA improves convergence and performance on language, vision, and reinforcement learning benchmarks.
Fine-Tuning Foundation Models with Explained Variance Adaptation
The paper "One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation" introduces a novel methodology for enhancing the fine-tuning process of foundation models (FMs) through a technique called Explained Variance Adaptation (EVA). The primary objective of EVA is to improve convergence speed and overall performance across various tasks by optimizing the initialization process of low-rank adaptation (LoRA) matrices using singular value decomposition (SVD) on activation vectors from downstream tasks.
Methodological Overview
The authors propose EVA as an extension of LoRA, a parameter-efficient fine-tuning (PEFT) method that reduces the computational demands of fine-tuning large-scale models by introducing low-rank matrices. EVA distinguishes itself by applying SVD to minibatch activation vectors and initializing the LoRA matrices with the directions that explain the most variance. This data-driven initialization also enables a more informed rank allocation across model layers, improving fine-tuning efficiency and effectiveness.
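As a rough illustration of this initialization step, the following PyTorch sketch computes an SVD of a minibatch of layer inputs and uses the top right-singular vectors as the LoRA A matrix. The function name `eva_init_A`, the centering of activations, and the dummy shapes are illustrative assumptions, not the authors' code.

```python
import torch

def eva_init_A(X: torch.Tensor, r: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Return a LoRA A matrix (r x in_features) and the explained-variance
    ratios of its components, computed from an SVD of minibatch activations X
    (batch x in_features). Centering X is an added assumption here, so the
    singular values relate directly to component variances."""
    X = X - X.mean(dim=0, keepdim=True)           # center activations
    U, S, Vh = torch.linalg.svd(X, full_matrices=False)
    A = Vh[:r]                                    # top-r right-singular vectors
    var = S**2 / (X.shape[0] - 1)                 # per-component variance
    explained_ratio = var[:r] / var.sum()         # share of variance explained
    return A, explained_ratio

# Usage: initialize A from one minibatch of layer inputs; B keeps the standard
# LoRA zero-init, so the adapted weight W + B @ A equals W at the start.
X = torch.randn(64, 768)                          # dummy minibatch of activations
A, ratios = eva_init_A(X, r=16)
B = torch.zeros(768, 16)
```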
Key Contributions
- Data-Driven Initialization: EVA enhances LoRA by initializing the low-rank matrices using right-singular vectors obtained through SVD on activations. This contrasts with the typical random initialization approach, aiming to better capture task-relevant information.
- Adaptive Rank Redistribution: Using the explained variance ratios from the SVD, EVA reallocates ranks across model layers to maximize representational efficiency during fine-tuning, guided by how much activation variance each layer's components explain (a minimal sketch of this allocation follows this list).
- Low Computational Overhead: The incremental computation of SVD during the early training phase ensures that EVA adds minimal overhead, making it scalable for large models.
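The rank-redistribution idea can be sketched as below, assuming per-layer explained-variance ratios are already available (e.g. from the SVD sketch above). The greedy budget allocation and the helper name `redistribute_ranks` are illustrative readings of the idea, not the authors' exact procedure.

```python
import torch

def redistribute_ranks(ratios_per_layer: dict[str, torch.Tensor],
                       total_budget: int) -> dict[str, int]:
    """Assign each layer a rank so the total equals `total_budget`,
    favoring layers whose components explain more activation variance."""
    # Pool all (layer, component-variance-ratio) pairs and sort them globally.
    pool = [(name, float(v)) for name, vals in ratios_per_layer.items() for v in vals]
    pool.sort(key=lambda t: t[1], reverse=True)

    ranks = {name: 0 for name in ratios_per_layer}
    for name, _ in pool[:total_budget]:           # keep the top-`budget` components
        ranks[name] += 1
    return ranks

# Example: a uniform rank of 8 over 4 layers (budget 32) gets redistributed
# toward layers whose components carry larger explained-variance ratios.
ratios = {f"layer_{i}": torch.rand(16).sort(descending=True).values for i in range(4)}
ranks = redistribute_ranks(ratios, total_budget=32)
print(ranks)
```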
Empirical Evaluation
EVA was evaluated on a range of tasks including language generation and understanding, image classification, and reinforcement learning. Across these domains, EVA demonstrated superior performance, often surpassing existing methods in average task scores.
- LLMs: EVA improved the convergence and performance on tasks such as common sense reasoning and mathematical problem solving when applied to models like Llama-2, Llama-3, and Gemma.
- Vision Tasks: In the VTAB-1K benchmark, EVA achieved the highest average accuracy, particularly excelling on in-domain datasets.
- Reinforcement Learning: On the Meta-World suite, EVA not only outperformed LoRA but also, in combination with DoRA, exceeded full fine-tuning performance in some scenarios.
Theoretical and Practical Implications
Theoretically, EVA underscores the value of a data-driven approach to model initialization, leveraging downstream task data to inform model adaptations. Practically, this method provides a scalable solution to refining FMs, opening avenues for more efficient adaptation in diverse application domains.
Speculation on Future Developments
Future improvements could involve integrating gradient information into the initialization phase or exploring quantization to further enhance efficiency. The success of EVA in optimizing initialization suggests potential for its principles to be extended to other adaptation mechanisms beyond LoRA.
In summary, Explained Variance Adaptation represents a significant methodological advancement in the fine-tuning of foundation models. By effectively marrying data-driven insights with adaptive rank allocation, EVA offers a promising route to optimizing model performance with reduced computational burden.