Towards Green AI in Fine-tuning Large Language Models via Adaptive Backpropagation (2309.13192v2)

Published 22 Sep 2023 in cs.LG and cs.AI

Abstract: Fine-tuning is the most effective way of adapting pre-trained LLMs to downstream applications. With the fast growth of LLM-enabled AI applications and the democratization of open-sourced LLMs, fine-tuning has become feasible for non-expert individuals, but intensive LLM fine-tuning performed worldwide could result in significant energy consumption and carbon footprint, with a potentially large environmental impact. Mitigating this impact towards Green AI directly correlates with reducing the FLOPs of fine-tuning, but existing techniques for efficient LLM fine-tuning achieve only limited FLOPs reduction because they ignore the backpropagation cost of fine-tuning. To address this limitation, this paper presents GreenTrainer, a new LLM fine-tuning technique that adaptively evaluates different tensors' backpropagation costs and contributions to the fine-tuned model's accuracy, and minimizes the fine-tuning cost by selecting the most appropriate set of tensors to train. This selection is made according to a given objective of FLOPs reduction, which can flexibly adapt to the carbon footprint of the energy supply and the needs of Green AI. Experimental results on multiple open-sourced LLMs and abstractive summarization datasets show that, compared to fine-tuning the whole model, GreenTrainer saves up to 64% of fine-tuning FLOPs without any noticeable loss of model accuracy. Compared to existing fine-tuning techniques such as LoRA, GreenTrainer achieves up to a 4% improvement in model accuracy with on-par FLOPs reduction.
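
To make the selection step concrete, the sketch below shows one way a FLOPs-budgeted tensor selection could be wired up in PyTorch. It is a minimal illustration, not the paper's method: the function names, the size-based FLOPs proxy, the first-order importance score, and the greedy solver are all assumptions standing in for GreenTrainer's actual cost model and selection procedure.

import torch

def estimate_tensor_stats(model, loss):
    # Estimate each trainable tensor's backward cost and importance.
    # The FLOPs figure is a crude size-based proxy, and the importance score
    # is a first-order (gradient x weight) proxy -- both are illustrative.
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, [p for _, p in named],
                                retain_graph=True, allow_unused=True)
    stats = []
    for (name, p), g in zip(named, grads):
        flops = 2 * p.numel()
        importance = 0.0 if g is None else (g * p).abs().sum().item()
        stats.append((name, flops, importance))
    return stats

def select_tensors(stats, flops_budget_ratio):
    # Greedily pick tensors by importance-per-FLOP until the budget is spent.
    total = sum(f for _, f, _ in stats)
    budget = flops_budget_ratio * total
    chosen, used = set(), 0
    for name, flops, imp in sorted(stats, key=lambda s: s[2] / max(s[1], 1),
                                   reverse=True):
        if used + flops <= budget:
            chosen.add(name)
            used += flops
    return chosen

def apply_selection(model, chosen):
    # Freeze every parameter tensor that was not selected for training.
    for name, p in model.named_parameters():
        p.requires_grad = name in chosen

In use, one would run a forward pass on a probe mini-batch to obtain a loss, call estimate_tensor_stats and select_tensors with the desired budget (e.g. 0.36 to target roughly 64% FLOPs savings), apply the selection, and then fine-tune as usual. The actual savings also depend on where the selected tensors sit in the network, since gradients must still propagate through frozen layers that feed later trainable ones; this backpropagation structure is what GreenTrainer's per-tensor cost evaluation is designed to account for.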

Authors (4)
  1. Kai Huang (146 papers)
  2. Hanyun Yin (1 paper)
  3. Heng Huang (189 papers)
  4. Wei Gao (203 papers)
Citations (5)