
Exploring Quantization for Efficient Pre-Training of Transformer Language Models (2407.11722v2)

Published 16 Jul 2024 in cs.LG

Abstract: The increasing scale of Transformer models has led to an increase in their pre-training computational requirements. While quantization has proven to be effective after pre-training and during fine-tuning, applying quantization in Transformers during pre-training has remained largely unexplored at scale for language modeling. This study aims to explore the impact of quantization for efficient pre-training of Transformers, with a focus on linear layer components. By systematically applying straightforward linear quantization to weights, activations, gradients, and optimizer states, we assess its effects on model efficiency, stability, and performance during training. By offering a comprehensive recipe of effective quantization strategies to be applied during the pre-training of Transformers, we promote high training efficiency from scratch while retaining language modeling ability. Code is available at https://github.com/chandar-lab/EfficientLLMs.
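
To make the abstract's "straightforward linear quantization" of linear layer components concrete, here is a minimal sketch, not the authors' actual implementation, of symmetric linear (uniform) quantization applied to the weights and input activations of a linear layer during training, using a straight-through estimator so gradients flow through the rounding step. The names `quantize_ste` and `QuantLinear`, the per-tensor scaling, and the 8-bit default are illustrative assumptions; see the paper's repository for the real recipe.

```python
# Sketch only: symmetric linear quantization with a straight-through
# estimator (STE), one common way to quantize linear layers during
# pre-training. Not the authors' code; names and defaults are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_ste(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Fake-quantize x on a symmetric linear grid; the backward pass
    treats the rounding as identity (straight-through estimator)."""
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 127 for 8 bits
    scale = x.abs().max().clamp(min=1e-8) / qmax   # per-tensor scale (assumed)
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    return x + (x_q - x).detach()                  # forward uses x_q, grad uses x


class QuantLinear(nn.Linear):
    """Linear layer whose weights and input activations are fake-quantized
    in the forward pass while training from scratch."""

    def __init__(self, in_features, out_features, bias=True, num_bits=8):
        super().__init__(in_features, out_features, bias)
        self.num_bits = num_bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = quantize_ste(self.weight, self.num_bits)
        x_q = quantize_ste(x, self.num_bits)
        return F.linear(x_q, w_q, self.bias)
```

In a setup like this, replacing the Transformer's `nn.Linear` modules with `QuantLinear` quantizes weights and activations; quantizing gradients and optimizer states, which the paper also studies, would require additional hooks in the backward pass and optimizer and is not shown here.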

