
Extending Llama-3's Context Ten-Fold Overnight (2404.19553v1)

Published 30 Apr 2024 in cs.CL

Abstract: We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is highly efficient, taking 8 hours on one 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad range of evaluation tasks, such as NIHS (Needle-In-a-HayStack), topic retrieval, and long-context language understanding; meanwhile, it preserves the original capability over short contexts well. The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4, which indicates LLMs' inherent (yet largely underestimated) potential to extend their original context length. In fact, the context length could be extended far beyond 80K with more computational resources. Therefore, the team will publicly release the entire set of resources (including data, model, data generation pipeline, and training code) to facilitate future research from the community: \url{https://github.com/FlagOpen/FlagEmbedding}.

Extending Context Length in LLMs with QLoRA: Efficient Training and Impressive Outcomes

Introduction to Context Extension

Recent advancements in LLMs have increasingly focused on enhancing their ability to handle long contexts, which is essential for tasks involving complex understanding and data integration across extensive content. Such extensions typically demand considerable computational resources and intricate data handling strategies. However, the use of GPT-4 to generate a small synthetic training set in this paper offers a remarkably efficient and effective pathway to raise the context length of Llama-3-8B-Instruct from 8K to 80K tokens.

Efficient Training Strategy

The pivotal strategy in this research is to fine-tune on a small, carefully constructed synthetic dataset: GPT-4 is used to generate 3.5K training examples spanning three types of tasks:

  1. Single-Detail QA: Focused on generating questions about specific details within a short excerpt of a longer text.
  2. Multi-Detail QA: Devised to test the model's capability to synthesize and reason over information drawn from multiple points within a text.
  3. Biography Summarization: Aims at summarizing biographical details of characters from books, assessing the model's summarization abilities in extensive contexts.

These tasks are significant as they directly relate to the everyday challenges faced in processing large documents and deriving coherent, context-aware outputs from them.
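
The authors release their data generation pipeline with the other resources; the sketch below is only an illustration of how a single-detail QA sample might be produced with GPT-4, assuming the OpenAI Python client. The chunk size, prompt wording, and output format are hypothetical choices, not the authors' templates.

```python
# Hypothetical sketch of single-detail QA data generation (not the authors' pipeline).
# Assumes the `openai` Python client and an OPENAI_API_KEY in the environment.
import random
from openai import OpenAI

client = OpenAI()

def make_single_detail_qa(long_text: str, chunk_chars: int = 4000) -> dict:
    """Ask GPT-4 to write a QA pair about one randomly chosen excerpt of a long text."""
    # Pick a random excerpt; the question targets this chunk, but the training
    # sample pairs the question with the *full* long context.
    start = random.randrange(0, max(1, len(long_text) - chunk_chars))
    chunk = long_text[start:start + chunk_chars]

    prompt = (
        "Read the following excerpt and write one question that can only be "
        "answered from a specific detail in it, followed by the answer.\n\n"
        f"Excerpt:\n{chunk}\n\n"
        "Format:\nQuestion: ...\nAnswer: ..."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    qa = response.choices[0].message.content
    # The fine-tuning sample keeps the entire long document as context.
    return {"context": long_text, "qa": qa}
```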

Key Contributions and Model Performance

  • Model Accessibility: The team has made significant strides not just in modifying the Llama-3-8B-Instruct model to handle longer texts (up to 80K tokens), but also in ensuring that these advancements are accessible. All resources, including training data and the model itself, are made available to the community.
  • Training Efficiency: Remarkably, the entire QLoRA fine-tuning run took only 8 hours on a single machine with 8xA800 (80G) GPUs, showcasing the efficiency of the approach; a sketch of such a setup follows this list.
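
For concreteness, here is a minimal sketch of a QLoRA setup in the spirit of the paper, using Hugging Face transformers, peft, and bitsandbytes. The LoRA rank, target modules, and enlarged RoPE base are assumed values for illustration, not the authors' exact configuration.

```python
# Illustrative QLoRA setup (assumed hyperparameters, not the paper's exact recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit NF4 quantization of the frozen base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    rope_theta=200_000_000.0,              # enlarged RoPE base for long contexts (assumed value)
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,                                  # assumed rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only the small LoRA adapters are trained
```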

Experimental Insights

Various tests were conducted to evaluate the model's performance, including:

  • Needle-In-A-Haystack
  • Topic Retrieval
  • LongBench benchmarks
  • InfBench for long-context questions and summarization tasks

The model not only outperforms its predecessors across many benchmarks but also generalizes robustly beyond its 80K training length, remaining effective at contexts up to 128K tokens.
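
As an illustration of the needle-in-a-haystack protocol (a sketch, not the benchmark's official harness), a probe can be built by planting a short "needle" fact at a chosen depth inside a long filler document and checking whether the model's answer recovers it:

```python
# Minimal needle-in-a-haystack probe (illustrative only).
def build_haystack(filler: str, needle: str, total_tokens: int, depth: float, tokenizer) -> str:
    """Insert `needle` at a relative `depth` (0.0 = start, 1.0 = end) of a long filler text."""
    filler_ids = tokenizer(filler, add_special_tokens=False)["input_ids"]
    # Repeat the filler until we have enough tokens, then truncate.
    while len(filler_ids) < total_tokens:
        filler_ids = filler_ids + filler_ids
    filler_ids = filler_ids[:total_tokens]

    insert_at = int(depth * total_tokens)
    haystack = (
        tokenizer.decode(filler_ids[:insert_at])
        + "\n" + needle + "\n"
        + tokenizer.decode(filler_ids[insert_at:])
    )
    question = "What is the secret passphrase mentioned in the document?"
    return haystack + "\n\n" + question

# Usage: sweep context lengths and insertion depths, prompt the model with the
# returned text, and score whether its answer contains the needle, e.g.
# needle = "The secret passphrase is 'blue-falcon-42'."
```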

Theoretical and Practical Implications

From a theoretical standpoint, these results underscore a crucial yet underappreciated point about the latent capability of LLMs to extend their operational context significantly with minimal data. They suggest that LLMs might effectively process even longer sequences than current standards indicate, provided efficient training methodologies are applied.

Practically, the ability to handle longer contexts without a loss in performance on standard benchmarks paves the way for LLM applications in fields requiring detailed analysis of large documents, such as legal document review, lengthy academic article summarization, and comprehensive book analysis for educational purposes.

Looking Ahead

While current results are promising, the journey to refine these models continues. Future research might explore even longer context lengths and investigate methods to further enhance the efficient training protocols used here. Additionally, integrating more varied data, particularly code, could improve performance in areas like code completion, which currently lags slightly behind.

In conclusion, the extended capabilities of Llama-3-8B-Instruct-80K-QLoRA mark a significant step towards more contextually aware and efficient LLMs, promising to broaden the horizons of what's achievable with AI in processing extensive textual information.

Authors (7)
  1. Peitian Zhang (23 papers)
  2. Ninglu Shao (9 papers)
  3. Zheng Liu (312 papers)
  4. Shitao Xiao (38 papers)
  5. Hongjin Qian (23 papers)
  6. Qiwei Ye (16 papers)
  7. Zhicheng Dou (113 papers)
Citations (9)
