Characterizing the Accuracy-Efficiency Trade-off of Low-rank Decomposition in Language Models (2405.06626v2)

Published 10 May 2024 in cs.LG and cs.CL

Abstract: Recent LLMs employ billions of parameters to enable broad problem-solving capabilities. Such LLMs also tend to be memory-bound because of the dominance of matrix-vector and matrix-matrix multiplications with low arithmetic intensity. Therefore, optimizing the memory footprint and traffic is an important optimization direction for LLMs today. Model compression methods such as quantization and parameter pruning have been actively explored to achieve memory footprint and traffic optimization. However, the accuracy-efficiency trade-off of rank pruning (i.e., low-rank decomposition) for LLMs is not well-understood yet. Therefore, in this work, we characterize the accuracy-efficiency trade-off of a low-rank decomposition method, specifically Tucker decomposition, on recent LLMs, including an open-source LLM, Llama 2. We formalize the low-rank decomposition design space and show that the decomposition design space is enormous (e.g., O($2^{39}$) for Llama2-7B). To navigate such a vast design space, we formulate it and perform thorough case studies of accuracy-efficiency trade-offs using six widely used LLM benchmarks on BERT and Llama 2 models. Our results show that we can achieve a 9% model size reduction with minimal accuracy drops, which range from 4%p (%p refers to "percentage point," which refers to the absolute difference between two percentage numbers; 74% -> 78% = 4%p increase) to 10%p, depending on the difficulty of the benchmark, without any retraining to recover accuracy after decomposition. The results show that low-rank decomposition can be a promising direction for LLM-based applications that require real-time service at scale (e.g., AI agent and real-time coding assistant), where the latency is as important as the model accuracy.
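
To make the idea of a size reduction from rank pruning concrete, the following is an illustrative parameter count. It assumes the two-dimensional Tucker form (two factor matrices and a core) with equal ranks $r$ on both sides. The symbols $W$, $U_1$, $U_2$, $G$, $m$, $n$, $r$ and the example dimensions are assumptions made for illustration; they are not figures or notation taken from the paper.

```latex
% Illustrative sketch: Tucker-style factorization of a 2-D weight matrix.
% All symbols and numbers here are assumptions, not the paper's notation.
\[
  W \in \mathbb{R}^{m \times n}, \qquad
  W \approx U_1 \, G \, U_2^{\top}, \qquad
  U_1 \in \mathbb{R}^{m \times r},\;
  G \in \mathbb{R}^{r \times r},\;
  U_2 \in \mathbb{R}^{n \times r}
\]
% Parameter count before and after decomposition
\[
  \underbrace{mn}_{\text{original}}
  \;\longrightarrow\;
  \underbrace{r(m+n) + r^2}_{\text{decomposed}}
\]
% Hypothetical example: m = n = 4096, r = 512
\[
  \frac{512 \cdot (4096 + 4096) + 512^2}{4096^2}
  = \frac{4{,}456{,}448}{16{,}777{,}216}
  \approx 0.266
\]
```

Under these assumptions the decomposed layer stores fewer parameters only when $r(m+n) + r^2 < mn$, so the achievable saving depends on how aggressively the rank is pruned and on how many layers are decomposed.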

Understanding the ACM Consolidated LaTeX Template

Introduction

The paper "The Name of the Title Is Hope" describes the ACM consolidated LaTeX template, which provides a standardized way of formatting documents for ACM publication. The guide is useful to both first-time and experienced ACM authors. The template aims to give all ACM publications a consistent style, which simplifies authoring and improves readability.

Key Template Features

Multiple Document Types

The "acmart" document class can be utilized for various types of ACM publications, such as:

Full-length conference papers
Two-page abstracts
Journal articles
Extended abstracts for conferences

This flexibility is achieved by selecting the appropriate template style and template parameters.

Template Styles

Here's a breakdown of some template styles available:

acmsmall: For most ACM journals.
acmlarge: For a small set of journals, including JOCCH and TAP.
acmtog: For TOG (Transactions on Graphics).

For conference proceedings, the dominant style is sigconf. Variants like sigchi and sigplan cater to their respective conferences.
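
As a minimal sketch of how a style is selected, the style name is passed as an option to the acmart document class. The two lines below are alternatives rather than lines to combine; which one applies depends on the target venue.

```latex
% Journal article in the acmsmall style
\documentclass[acmsmall]{acmart}

% Alternative: conference proceedings paper in the sigconf style
% \documentclass[sigconf]{acmart}
```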

Template Parameters

Template parameters allow customization of document settings:

anonymous, review: Used together for double-anonymous submissions; anonymous hides the author information and review adds line numbers.
authorversion: A version suitable for author posting.
screen: Enables colored hyperlinks.
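
Template parameters are given alongside the style in the same option list. The combinations below are a sketch of two plausible setups; the right combination for a given submission is an assumption that should be checked against the venue's instructions.

```latex
% Double-anonymous conference submission:
% hide author details and add line numbers for reviewers
\documentclass[sigconf,anonymous,review]{acmart}

% Alternative: version for author posting, with colored hyperlinks
% \documentclass[sigconf,authorversion,screen]{acmart}
```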

Format and Style Rules

Non-Modifiable Elements

Modifying the template, for example by adjusting margins, changing typeface sizes, or altering line spacing, is not allowed. Documents with such changes will be returned for revision.

Typeface

The template requires the "Libertine" typeface family, which should already be included in your TeX installation. Packages such as "lmodern" or "ltimes" should not be loaded, because they override the preset typeface families.

Structuring Your Document

Title and Authors Information

The title should use appropriate capitalization.
Authors should be listed with full names and affiliations.
Short titles and short author lists are necessary for header formatting.
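
The points above can be sketched with the top-matter commands of the acmart class. This is a fragment meant to sit before \maketitle; the title, author name, institution, and e-mail address are hypothetical placeholders.

```latex
% Optional argument: short title used in the page headers
\title[Short Title]{A Full Title with Appropriate Capitalization}

% Placeholder author; each author gets a separate \author block
\author{Jane Q. Author}
\affiliation{%
  \institution{Example University}
  \city{Example City}
  \country{Example Country}}
\email{jane.author@example.org}

% Short author list used in the page headers
\renewcommand{\shortauthors}{Author et al.}
```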

Rights Information

Upon paper approval, authors will need to complete a rights form. The completed form contains LaTeX commands that, once copied into the document source, add:

ACM Reference Format on the first page.
Rights management text.
Conference information in page headers.
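
A sketch of what such rights commands typically look like is shown below, adapted from the placeholders in the ACM sample files. Every value here is a placeholder; the real values come from the completed rights form and should be copied from it unchanged.

```latex
% Placeholder values; the real ones come from the completed rights form
\setcopyright{acmlicensed}
\copyrightyear{2018}
\acmYear{2018}
\acmDOI{XXXXXXX.XXXXXXX}

% Conference information shown in the page headers
\acmConference[Conference acronym 'XX]{Make sure to enter the correct
  conference title from your rights confirmation email}{June 03--05,
  2018}{Woodstock, NY}
\acmISBN{978-1-4503-XXXX-X/18/06}
```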

CCS Concepts and Keywords

To support classification and search:

Use the ACM Computing Classification System (accessible at https://dl.acm.org/ccs/ccs.cfm) for relevant classifiers and concepts.
Add user-defined keywords: a comma-separated list of words and phrases, chosen by the authors, that describe the research.
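
In the source, CCS concepts and keywords might look like the fragment below. The concept ID, concept description, and keywords are deliberate placeholders; the real CCSXML block and \ccsdesc lines are meant to be generated for the paper's actual topics with the CCS tool linked above.

```latex
% Placeholder CCS block; generate the real one for your paper's topics
\begin{CCSXML}
<ccs2012>
 <concept>
  <concept_id>00000000.0000000.0000000</concept_id>
  <concept_desc>Do Not Use This Code, Generate the Correct Terms for Your Paper</concept_desc>
  <concept_significance>500</concept_significance>
 </concept>
</ccs2012>
\end{CCSXML}

\ccsdesc[500]{Do Not Use This Code~Generate the Correct Terms for Your Paper}

% Comma-separated, author-chosen keywords (placeholders)
\keywords{first keyword, second keyword, third keyword}
```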

Numbered Section Commands

The document supports standard LaTeX sectioning commands like:

\section
\subsection
\subsubsection
\paragraph

Sections must keep their numbering, and simulating a sectioning command with bold or italic text is not allowed.
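
A brief sketch of the sectioning hierarchy follows; the heading texts are hypothetical.

```latex
\section{Introduction}
\subsection{Motivation}
\subsubsection{Scope}
\paragraph{Terminology}
Body text follows the paragraph heading.

% Not allowed: faking a heading with manual formatting, e.g.
% \textbf{Introduction} at the start of a paragraph
```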

Figures and Tables

Tables

The "acmart" class includes "booktabs" for high-quality tables:

Table captions are placed above the table.
Tables should float to the top of the page nearest their first citation in the text.
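
A minimal table in this style could look as follows. The caption, label, and cell contents are placeholders; the rule commands \toprule, \midrule, and \bottomrule come from booktabs.

```latex
\begin{table}
  % Caption goes above the tabular material
  \caption{Placeholder Table Caption}
  \label{tab:example}
  \begin{tabular}{lr}
    \toprule
    Item & Count \\
    \midrule
    First  & 1 \\
    Second & 2 \\
    \bottomrule
  \end{tabular}
\end{table}
```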

Figures

Figures should use the "figure" environment and:

Include a caption below the figure.
Provide a description for accessibility.
Optionally include a "teaser figure" before the body of the article.
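
The figure requirements can be sketched as below. The file name example-image is an assumption: it is a placeholder graphic shipped with the mwe package and stands in for the author's own image file. The caption and description texts are also placeholders.

```latex
\begin{figure}
  \centering
  % example-image is a placeholder graphic from the mwe package
  \includegraphics[width=\linewidth]{example-image}
  % Caption goes below the figure
  \caption{Placeholder figure caption.}
  % Text alternative for readers who cannot see the figure
  \Description{A short text description of what the figure shows.}
  \label{fig:example}
\end{figure}
```

In the ACM sample files, a teaser figure uses the same contents inside a teaserfigure environment placed before \maketitle.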

Equations

Mathematical equations come in three styles:

Inline Equations: Set within the text using the $...$ format.
Display Equations: Centered, numbered equations produced by the "equation" environment.
Non-Numbered Display Equations: Created using the "displaymath" environment.
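
The three styles can be illustrated with the short fragment below; the formulas themselves are arbitrary examples.

```latex
% Inline equation within running text
The limit $\lim_{n \rightarrow \infty} \frac{1}{n} = 0$ appears inline.

% Numbered display equation
\begin{equation}
  \lim_{n \rightarrow \infty} \frac{1}{n} = 0
\end{equation}

% Unnumbered display equation
\begin{displaymath}
  \sum_{i=0}^{\infty} x^i = \frac{1}{1 - x}, \quad |x| < 1
\end{displaymath}
```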

Citations and Bibliographies

The use of BibTeX for references is recommended:

Full author names and detailed referencing information are required.
Example command: \bibliography{bibfile}
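
In context, the bibliography command is normally paired with a style declaration. The sketch below assumes the ACM-Reference-Format style distributed with the template and a file named bibfile.bib in the same directory as the source.

```latex
% Placed where the reference list should appear, near the end of the document
\bibliographystyle{ACM-Reference-Format}
% Argument is the .bib file name without its extension
\bibliography{bibfile}
```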

Acknowledgments

The "acks" environment is used for acknowledgments, positioned just before the references.

\begin{acks}
...
\end{acks}

Conclusion

The ACM consolidated LaTeX template is an invaluable tool for authors looking to submit to ACM publications. This guide simplifies the use of the "acmart" document class and helps ensure consistent, high-quality submissions.

For further details, authors can refer to the full documentation available at the ACM website. Happy writing!

Authors

Chakshu Moar
Michael Pellauer
Hyoukjun Kwon
Faraz Tahmasebi
