VeRA: Vector-based Random Matrix Adaptation (2310.11454v2)

Published 17 Oct 2023 in cs.CL

Abstract: Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning LLMs, but still faces acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which significantly reduces the number of trainable parameters compared to LoRA, yet maintains the same performance. It achieves this by using a single pair of low-rank matrices shared across all layers and learning small scaling vectors instead. We demonstrate its effectiveness on the GLUE and E2E benchmarks, image classification tasks, and show its application in instruction-tuning of 7B and 13B LLMs.

Citations (80)

Summary

  • The paper introduces VeRA, a method that leverages shared low-rank matrices and trainable scaling vectors to drastically reduce the number of parameters in model finetuning.
  • It demonstrates competitive results on benchmarks like GLUE, E2E, and image classification, achieving similar or superior metrics compared to LoRA with significantly fewer parameters.
  • Its practical implications include reduced memory usage and efficient deployment in both cloud and edge devices, while also inspiring future research into lower-dimensional adaptation spaces.

Vector-based Random Matrix Adaptation (VeRA): A Comprehensive Overview

Introduction

In the field of LLMs, the challenge of efficient model adaptation has become paramount. This paper introduces Vector-based Random Matrix Adaptation (VeRA), a novel finetuning method designed to minimize the number of trainable parameters while maintaining performance comparable to existing state-of-the-art approaches like Low-Rank Adaptation (LoRA). The primary innovation in VeRA is the use of a single pair of randomly initialized, low-rank matrices shared across all layers, reparameterized with trainable scaling vectors. This method results in significant memory savings, facilitates efficient model deployment, and is particularly suited for scalable applications in cloud-based AI services and edge devices.

Methodology

Core Mechanism of VeRA

VeRA uses a single pair of low-rank matrices, A and B, which are frozen and shared across all adapted layers of the model. Adaptation is achieved through trainable scaling vectors, d and b, that adjust the influence of these matrices on a per-layer basis. In mathematical terms, the adjustment to an initial weight matrix W_0 is:

h = W_0 x + \Delta W x = W_0 x + \Lambda_b B \Lambda_d A x

where Λ_b and Λ_d are diagonal matrices formed from the scaling vectors b and d, respectively.
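
The per-layer update can be expressed compactly in code. The following PyTorch sketch is purely illustrative (the class name, argument names, and the default d_init value are our assumptions, not the authors' reference implementation): it wraps a frozen pretrained linear layer, applies the shared frozen matrices A and B, and scales their outputs elementwise with the trainable vectors d and b.

```python
import torch
import torch.nn as nn

class VeRALinear(nn.Module):
    """Minimal sketch of a VeRA-adapted linear layer (illustrative only)."""

    def __init__(self, base_linear: nn.Linear, shared_A: torch.Tensor,
                 shared_B: torch.Tensor, d_init: float = 0.1):
        super().__init__()
        self.base = base_linear                       # pretrained layer holding W_0, kept frozen
        for p in self.base.parameters():
            p.requires_grad = False
        # Frozen, randomly initialized low-rank matrices shared by every adapted layer.
        self.register_buffer("A", shared_A)           # shape (r, in_features)
        self.register_buffer("B", shared_B)           # shape (out_features, r)
        r = shared_A.shape[0]
        # Trainable per-layer scaling vectors: the only parameters VeRA learns.
        self.d = nn.Parameter(torch.full((r,), d_init))          # scales the output of A x
        self.b = nn.Parameter(torch.zeros(shared_B.shape[0]))    # scales the output of B (.)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W_0 x + Lambda_b B Lambda_d A x, with the diagonal matrices
        # applied as elementwise scaling by d and b.
        hidden = (x @ self.A.T) * self.d       # (..., r)
        delta = (hidden @ self.B.T) * self.b   # (..., out_features)
        return self.base(x) + delta
```

Because A and B are registered as buffers rather than parameters, only d and b show up in the optimizer's parameter list, which is exactly the source of the parameter savings.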

Initialization Strategies

The shared matrices A and B are initialized with Kaiming initialization for numerical stability, while the scaling vectors are initialized so that b starts at zero and d at a small constant value. This ensures that the model's initial behavior remains close to the pretrained state, facilitating stable and gradual adaptation during finetuning.
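
As a concrete sketch of this scheme, the snippet below draws the shared matrices with Kaiming-uniform initialization and creates the per-layer vectors with b = 0 and d set to a small constant. The helper name and the specific constant 0.1 are our assumptions for illustration; the initial value of d is a tunable choice rather than something fixed by the description above.

```python
import math
import torch

def init_vera_state(in_features: int, out_features: int, r: int, d_init: float = 0.1):
    # Shared matrices: Kaiming-uniform initialization, then frozen and reused by every layer.
    A = torch.empty(r, in_features)
    B = torch.empty(out_features, r)
    torch.nn.init.kaiming_uniform_(A, a=math.sqrt(5))
    torch.nn.init.kaiming_uniform_(B, a=math.sqrt(5))
    A.requires_grad_(False)
    B.requires_grad_(False)
    # Scaling vectors: b = 0 means the adapted model initially reproduces the
    # pretrained one; d starts at a small constant and is learned per layer.
    b = torch.zeros(out_features, requires_grad=True)
    d = torch.full((r,), d_init, requires_grad=True)
    return A, B, b, d
```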

Experimental Evaluation

GLUE Benchmark Performance

Evaluation on the General Language Understanding Evaluation (GLUE) benchmark indicates that VeRA achieves performance comparable to LoRA on both RoBERTa-base and RoBERTa-large, but with a significantly reduced parameter count. For example, VeRA adapts RoBERTa-base with just 43K trainable parameters and reaches an average score of 85.2, close to LoRA's 86.6 with roughly 300K parameters. This demonstrates VeRA's superior parameter efficiency while maintaining high predictive accuracy.
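
The parameter counts above follow from simple arithmetic: LoRA trains two rank-r matrices per adapted weight, whereas VeRA trains only the two scaling vectors. The helper below reproduces the quoted totals under assumed settings (query and value projections in RoBERTa-base's 12 layers, d_model = 768, rank 8 for LoRA, rank 1024 for VeRA); the ranks are our assumptions chosen to match the figures, not values stated in this summary.

```python
def lora_trainable_params(n_adapted: int, d_model: int, r: int) -> int:
    # Each adapted weight trains A (r x d_model) and B (d_model x r).
    return n_adapted * 2 * d_model * r

def vera_trainable_params(n_adapted: int, d_model: int, r: int) -> int:
    # Each adapted weight trains only the vectors d (length r) and b (length d_model).
    return n_adapted * (d_model + r)

# 12 layers x (query, value) = 24 adapted matrices for RoBERTa-base.
print(lora_trainable_params(24, 768, 8))      # 294912  (~300K)
print(vera_trainable_params(24, 768, 1024))   # 43008   (~43K)
```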

E2E Benchmark and Instruction Tuning

On the E2E benchmark, VeRA outperforms LoRA for GPT-2 models, achieving better BLEU, NIST, METEOR, and ROUGE-L scores with a three- to four-fold reduction in trainable parameters. Similarly, in instruction tuning of LLaMA models, VeRA achieves MT-Bench scores comparable to LoRA while using 100 times fewer trainable parameters. These results underscore VeRA's capability across adaptation scenarios, including language generation and instruction tuning.

Image Classification Tasks

Further investigation on image classification tasks using Vision Transformers (ViT) affirms the versatility of VeRA. For instance, when finetuning ViT on CIFAR100, Food101, Flowers102, and RESISC45 datasets, VeRA retains high classification accuracy while reducing the number of trainable parameters by an order of magnitude compared to LoRA.

Implications

Practical Applications

VeRA's substantial reduction in trainable parameters has immediate practical implications. Primarily, it lowers memory requirements, which is critical for deploying multiple model instances on a single GPU, thereby enhancing serving efficiency in cloud environments. This is especially beneficial for personalized AI services, enabling context-specific adjustments without necessitating extensive additional storage.
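
One way to picture this serving benefit: since A and B are frozen, shared, and generated randomly, a per-user or per-task adapter reduces to its trained scaling vectors plus the seed that recreates the shared matrices. The sketch below is a hypothetical illustration of that storage pattern (function names and checkpoint layout are ours), using plain Gaussian draws as a simplified stand-in for the Kaiming initialization described earlier.

```python
import torch

def save_vera_adapter(path: str, b: torch.Tensor, d: torch.Tensor, seed: int) -> None:
    # Persist only the trained vectors and the seed for the shared matrices;
    # the frozen A and B never need to be written to disk.
    torch.save({"seed": seed, "b": b.detach().cpu(), "d": d.detach().cpu()}, path)

def regenerate_shared_matrices(seed: int, in_features: int, out_features: int, r: int):
    # Rebuild the shared frozen matrices deterministically from the stored seed
    # (simplified stand-in for the actual initialization scheme).
    gen = torch.Generator().manual_seed(seed)
    A = torch.randn(r, in_features, generator=gen)
    B = torch.randn(out_features, r, generator=gen)
    return A, B

def load_vera_adapter(path: str, in_features: int, out_features: int):
    ckpt = torch.load(path)
    r = ckpt["d"].numel()
    A, B = regenerate_shared_matrices(ckpt["seed"], in_features, out_features, r)
    return A, B, ckpt["b"], ckpt["d"]
```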

Theoretical Insights and Future Directions

The paper also raises intriguing questions about the latent structure and dimensionality of model adaptation spaces. VeRA’s efficiency suggests that pretrained models can be finetuned effectively within lower-dimensional subspaces than previously assumed. This points to potential future explorations in dynamic parameter budget allocations and novel initialization strategies to further optimize training efficiency and model performance.

Conclusion

VeRA introduces a significant advancement in the parameter-efficient adaptation of LLMs. It provides comparable performance to existing methods like LoRA with drastically fewer trainable parameters, thereby enhancing deployment feasibility and serving efficiency. The experimental results across different benchmarks and tasks validate its versatility and effectiveness, marking it as a promising direction for future research in model finetuning. This method holds considerable potential for widespread application, especially in environments where computational resources are at a premium.
