HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs (2501.02625v2)

Published 5 Jan 2025 in cs.LG

Abstract: Quantized training of LLMs remains an open challenge, as maintaining accuracy while performing all matrix multiplications in low precision has proven difficult. This is particularly the case when fine-tuning pre-trained models, which can have large weight and activation outlier values that make lower-precision optimization difficult. To address this, we present HALO, a novel quantization-aware training approach for Transformers that enables accurate and efficient low-precision training by combining 1) strategic placement of Hadamard rotations in both forward and backward passes, which mitigate outliers, 2) high-performance kernel support, and 3) FSDP integration for low-precision communication. Our approach ensures that all large matrix multiplications during the forward and backward passes are executed in lower precision. Applied to LLAMA-family models, HALO achieves near-full-precision-equivalent results during fine-tuning on various tasks, while delivering up to 1.41x end-to-end speedup for full fine-tuning on RTX 4090 GPUs. HALO efficiently supports both standard and parameterefficient fine-tuning (PEFT). Our results demonstrate the first practical approach to fully quantized LLM fine-tuning that maintains accuracy in 8-bit precision, while delivering performance benefits. Code is available at \url{https://github.com/IST-DASLab/HALO}.

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Follow-up Questions

We haven't generated follow-up questions for this paper yet.

Generate Now

Authors (6)

Tweets

https://twitter.com/AshkboosSaleh/status/1888939629795008647

https://twitter.com/rush_tabesh/status/1891471123457442187

https://twitter.com/papers_anon/status/1876482655790977101

HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs (2501.02625v2)

Summary

Follow-up Questions

Related Papers

Authors (6)

Tweets