LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models (2403.13372v4)

Published 20 Mar 2024 in cs.CL and cs.AI

Abstract: Efficient fine-tuning is vital for adapting LLMs to downstream tasks. However, it requires non-trivial efforts to implement these methods on different models. We present LlamaFactory, a unified framework that integrates a suite of cutting-edge efficient training methods. It provides a solution for flexibly customizing the fine-tuning of 100+ LLMs without the need for coding through the built-in web UI LlamaBoard. We empirically validate the efficiency and effectiveness of our framework on language modeling and text generation tasks. It has been released at https://github.com/hiyouga/LLaMA-Factory and received over 25,000 stars and 3,000 forks.

LlamaFactory: Unified Efficient Fine-Tuning of 100+ LLMs

Introduction to LlamaFactory

LlamaFactory represents a notable advancement in NLP by providing a comprehensive framework for the efficient fine-tuning of over 100 different LLMs. It addresses the challenge of the significant computational and memory resources typically required to adapt these models to specific downstream tasks. By integrating a wide selection of efficient fine-tuning techniques, LlamaFactory substantially reduces training costs in both computation and memory usage. This is achieved without the need for extensive coding, thanks to its built-in web UI, LlamaBoard, which offers a user-friendly interface for customizing model fine-tuning. The framework has garnered substantial attention on GitHub, with over 25,000 stars and 3,000 forks.

Efficient Fine-Tuning Techniques

The LlamaFactory framework incorporates a variety of methods to optimize the process of fine-tuning LLMs:

  • Efficient Optimization: Techniques such as freeze-tuning, gradient low-rank projection (GaLore), low-rank adaptation (LoRA), quantized LoRA (QLoRA), and weight-decomposed low-rank adaptation (DoRA) are employed. These methods adjust the parameters of LLMs efficiently, minimizing overall fine-tuning costs.
  • Efficient Computation: This category includes methods such as mixed-precision training, activation checkpointing, FlashAttention, and S² attention (shifted sparse attention), which reduce computation time and memory usage during training.

By balancing these techniques, LlamaFactory significantly improves the efficiency of fine-tuning LLMs, reducing the memory footprint to as low as 0.6 bytes per parameter in some cases.
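
To make the combination of quantization and low-rank adaptation concrete, the sketch below shows a QLoRA-style setup using the Hugging Face transformers, bitsandbytes, and PEFT libraries rather than LlamaFactory's own internals; the model name, target modules, and hyperparameters are placeholder assumptions for illustration, not values prescribed by the paper.

```python
# Hedged sketch: QLoRA-style fine-tuning (4-bit frozen base weights + LoRA adapters)
# using Hugging Face transformers + peft. Model name and hyperparameters are
# illustrative assumptions, not settings taken from the LlamaFactory paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# Quantize the frozen base weights to 4-bit NF4; compute in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# Attach small trainable low-rank adapters to the attention projections.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights require gradients
```

In LlamaFactory itself, this kind of combination is exposed through configuration options and the LlamaBoard UI, which is what allows fine-tuning to be customized without writing such code by hand.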

Framework Overview

LlamaFactory is structured around three key modules:

  • Model Loader: Prepares various architectures for fine-tuning, supporting a vast array of LLMs.
  • Data Worker: Processes data from different tasks, transforming them into a unified format suitable for training.
  • Trainer: Utilizes efficient fine-tuning methods to adapt models to specific tasks and datasets.

Together, these components provide a flexible and scalable solution that significantly simplifies the process of LLM fine-tuning.
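
As a rough illustration of the Data Worker's role, the hypothetical function below normalizes instruction-tuning records into prompt-response pairs; the field names and prompt template are assumptions chosen for illustration, not LlamaFactory's actual schema.

```python
# Hypothetical sketch of a "data worker" step: normalize instruction-tuning
# records into a single prompt/response format. Field names and the prompt
# template are illustrative assumptions, not LlamaFactory's actual schema.
from typing import Dict, List

PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"

def to_unified_format(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    unified = []
    for rec in records:
        prompt = PROMPT_TEMPLATE.format(
            instruction=rec.get("instruction", ""),
            input=rec.get("input", ""),
        )
        unified.append({"prompt": prompt, "response": rec.get("output", "")})
    return unified

# Example usage with a single instruction-style record.
example = [{"instruction": "Summarize the text.", "input": "LLaMA is a family of open LLMs.", "output": "A short summary."}]
print(to_unified_format(example)[0]["prompt"])
```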

Empirical Validation

LlamaFactory's efficacy is empirically validated on language modeling and text generation tasks. It maintains, and in some cases improves upon, the performance of baseline methods while significantly reducing the computational and memory demands of fine-tuning LLMs. This is illustrated through comparisons of training efficiency and of the adaptation of various models to downstream tasks, showcasing the practical benefits of the integrated fine-tuning techniques.
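
For context, language modeling quality in such comparisons is commonly reported as perplexity alongside memory usage and throughput. The minimal sketch below computes perplexity for a causal LM with Hugging Face transformers; the model name and text are placeholders, and this is not the paper's evaluation harness.

```python
# Minimal sketch: perplexity of a causal LM on a piece of text, the kind of
# metric used to compare fine-tuning methods. Model name is a placeholder;
# this is not the evaluation code used in the paper.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "Efficient fine-tuning adapts large language models to downstream tasks."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"Perplexity: {math.exp(loss.item()):.2f}")
```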

Future Directions and Implications

The introduction of LlamaFactory represents a promising advancement in the field of natural language processing, especially in making efficient fine-tuning more accessible to the wider research community. Its modular design and integration with a user-friendly interface pave the way for further development and innovation in the fine-tuning of LLMs. As LlamaFactory continues to evolve, it is expected to incorporate more advanced training strategies and expand its capabilities to multimodal models, broadening its applicability and impact.

Concluding Thoughts

In conclusion, LlamaFactory makes a valuable contribution to the field of NLP by addressing the challenge of efficiently fine-tuning LLMs for a wide range of applications. Its design principles, focusing on efficiency and user accessibility, make it a powerful tool for experienced researchers and newcomers alike. By lowering the barriers to using advanced LLMs in research and practical applications, the framework marks an important step forward in the democratization of AI technology.

Authors (7)
  1. Yaowei Zheng (8 papers)
  2. Richong Zhang (47 papers)
  3. Junhao Zhang (24 papers)
  4. Yanhan Ye (2 papers)
  5. Zheyan Luo (2 papers)
  6. Yongqiang Ma (12 papers)
  7. Zhangchi Feng (6 papers)
Citations (152)