
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization (2311.06243v2)

Published 10 Nov 2023 in cs.LG, cs.AI, cs.CL, and cs.CV

Abstract: Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, LLMs, and text-to-image diffusion models to various downstream tasks in vision and language.

Citations (39)

Summary

  • The paper introduces BOFT, a novel framework that leverages butterfly factorization to achieve parameter-efficient orthogonal finetuning.
  • It represents dense orthogonal matrices as compositions of sparse matrices, reducing trainable parameters to O(d log d) without sacrificing performance.
  • Experiments across vision, language, and text-to-image tasks show that BOFT matches or outperforms leading methods such as LoRA and OFT with fewer trainable parameters.

An Expert Review on "Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization"

In the paper titled "Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization," the authors introduce an innovative methodology to efficiently adapt large foundation models for downstream tasks. The focal point of their approach, Orthogonal Butterfly (BOFT), represents a significant advance over existing finetuning paradigms by employing butterfly structures to achieve enhanced parameter efficiency.

Motivation and Framework

The growing ubiquity of large foundation models necessitates more efficient finetuning techniques. Training massive models like GPT-3 from scratch imposes prohibitive computational costs, so adapting pretrained models within a limited parameter budget becomes imperative. The researchers identify Orthogonal Finetuning (OFT) as a promising candidate due to its strong generalization capabilities, yet it remains parameter-intensive because of the high dimensionality of the orthogonal matrices involved.

The authors propose leveraging the butterfly factorization to enhance OFT's parameter efficiency. Their approach is inspired by the mathematical properties of fast Fourier transforms, where butterfly structures facilitate efficient information exchange. In the BOFT framework, a dense orthogonal matrix is represented as a composition of sparse orthogonal matrices, thereby significantly reducing the number of trainable parameters to O(d log d).
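
To make the construction concrete, below is a minimal NumPy sketch of a butterfly-structured orthogonal matrix. It parameterizes each 2x2 butterfly block with a plain Givens rotation rather than the Cayley-based block factors used in OFT/BOFT, so it illustrates the sparse-factor composition and the O(d log d) parameter count rather than the paper's exact implementation; all names here are illustrative.

```python
import numpy as np

def butterfly_orthogonal(thetas, d):
    """Compose a d x d orthogonal matrix from log2(d) sparse butterfly factors.

    thetas has shape (log2(d), d // 2): one rotation angle per butterfly pair
    per level, i.e. O(d log d) trainable parameters in total.
    """
    assert d & (d - 1) == 0, "d must be a power of two"
    levels = int(np.log2(d))
    B = np.eye(d)
    for level in range(levels):
        stride = 1 << level
        factor = np.eye(d)
        pair = 0
        for block_start in range(0, d, 2 * stride):
            for i in range(block_start, block_start + stride):
                j = i + stride
                c, s = np.cos(thetas[level, pair]), np.sin(thetas[level, pair])
                # Each factor has only two nonzeros per row: it is sparse.
                factor[i, i], factor[i, j] = c, -s
                factor[j, i], factor[j, j] = s, c
                pair += 1
        B = factor @ B  # a product of orthogonal factors stays orthogonal
    return B

d = 8
rng = np.random.default_rng(0)
thetas = rng.uniform(-np.pi, np.pi, size=(int(np.log2(d)), d // 2))
B = butterfly_orthogonal(thetas, d)
print(np.allclose(B @ B.T, np.eye(d)))  # True: B is orthogonal
print(thetas.size, "parameters vs", d * d, "entries in a dense matrix")
```

Setting all angles to zero makes every factor the identity, so a multiplicative adaptation of a frozen weight matrix (W_adapted = B @ W_pretrained) starts exactly from the pretrained model, the same initialization property that orthogonal finetuning relies on.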

Methodology

BOFT introduces a parameter-efficient orthogonal parameterization in which multiple sparse orthogonal matrices, arranged in a butterfly structure, are composed into a dense orthogonal matrix. This construction subsumes traditional OFT as a special case and yields a generalized orthogonal finetuning framework. Theoretically, the structure facilitates a smooth interpolation between expressiveness and strong regularization, promoting superior generalization.
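
As a rough back-of-the-envelope comparison of parameter budgets (assuming a hypothetical hidden dimension of 4096 and counting one rotation parameter per butterfly pair, which differs from the paper's block-wise Cayley parameterization), consider:

```python
import math

d = 4096  # hypothetical hidden dimension of one transformer layer
dense_orthogonal = d * (d - 1) // 2       # parameters of a dense d x d orthogonal
                                          # matrix (its skew-symmetric generator)
butterfly = (d // 2) * int(math.log2(d))  # one parameter per pair per butterfly level
print(f"dense: {dense_orthogonal:,}  butterfly: {butterfly:,}")
# dense: 8,386,560  butterfly: 24,576
```

Using fewer butterfly levels, or larger orthogonal blocks within each level, moves the parameterization back toward block-diagonal OFT; this is the knob through which BOFT trades parameter budget against expressiveness.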

The paper presents a comprehensive evaluation of BOFT across applications in computer vision, natural language processing, and text-to-image generation. Its efficacy is validated on numerous benchmarks spanning vision transformers, LLMs, and text-to-image diffusion models. Notably, BOFT consistently outperforms or matches state-of-the-art methods such as LoRA and OFT, often with fewer trainable parameters.

Implications and Future Prospects

BOFT's introduction delivers both theoretical and practical ramifications. Theoretically, it provides a framework to examine parameter efficiency through an information transmission perspective. Practically, it extends the applicability of orthogonal finetuning to a broader spectrum of tasks, demonstrating versatility and improved performance.

The potential of BOFT is far-reaching in AI and machine learning. By efficiently adapting models with lower computational demands, BOFT opens avenues for democratizing access to advanced AI capabilities. Future work could explore optimizing the butterfly matrix multiplication and potentially identifying alternative network topologies to enhance parameter efficiency even further.

By contributing a robust, scalable, and adaptable framework with BOFT, this research marks a pivotal step towards realizing more efficient and practical use cases for large-scale pretrained models. The insights gained from this work hold promise for continued improvements in reducing the computational overhead of model adaptation, thus accelerating the broader adoption and effectiveness of foundation models.
