
A Comprehensive Survey on Long Context Language Modeling (2503.17407v1)

Published 20 Mar 2025 in cs.CL and cs.LG

Abstract: Efficient processing of long contexts has been a persistent pursuit in Natural Language Processing. With the growing number of long documents, dialogues, and other textual data, it is important to develop Long Context Language Models (LCLMs) that can process and analyze extensive inputs in an effective and efficient way. In this paper, we present a comprehensive survey on recent advances in long-context modeling for LLMs. Our survey is structured around three key aspects: how to obtain effective and efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate and analyze LCLMs comprehensively. For the first aspect, we discuss data strategies, architectural designs, and workflow approaches oriented with long context processing. For the second aspect, we provide a detailed examination of the infrastructure required for LCLM training and inference. For the third aspect, we present evaluation paradigms for long-context comprehension and long-form generation, as well as behavioral analysis and mechanism interpretability of LCLMs. Beyond these three key aspects, we thoroughly explore the diverse application scenarios where existing LCLMs have been deployed and outline promising future development directions. This survey provides an up-to-date review of the literature on long-context LLMs, which we wish to serve as a valuable resource for both researchers and engineers. An associated GitHub repository collecting the latest papers and repos is available at: https://github.com/LCLM-Horizon/A-Comprehensive-Survey-For-Long-Context-Language-Modeling (LCLM-Horizon).

This paper, "A Comprehensive Survey on Long Context Language Modeling" (Liu et al., 20 Mar 2025), presents a thorough overview of the rapidly evolving field of Long Context Language Models (LCLMs). It acknowledges the historical challenge of processing long texts and highlights how recent LCLMs, capable of handling context windows from 128K up to 10M tokens, are revolutionizing AI by enabling tasks like long reasoning, complex agent workflows, enhanced in-context learning, efficient information retrieval, and advanced multimodal intelligence.

The survey structures its comprehensive review around three key research questions (RQs):

  1. RQ1: How to obtain effective and efficient LCLMs?
  2. RQ2: How to train and deploy LCLMs efficiently?
  3. RQ3: How to evaluate and analyze LCLMs comprehensively?

Obtaining Effective and Efficient LCLMs (RQ1)

To address RQ1, the survey explores three main areas: data strategies, architectural designs, and workflow approaches.
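
To make the architectural side concrete, one widely studied family of context-extension techniques in this area modifies rotary position embeddings (RoPE); for example, position interpolation rescales inference-time positions back into the trained range so that the rotation angles stay in distribution. The sketch below is an illustrative toy implementation under that assumption, not code from the survey; the lengths and function names are made up.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    """Rotary-embedding angles for each position and each channel pair."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))  # (dim/2,)
    return np.outer(positions, inv_freq)                      # (seq, dim/2)

def apply_rope(x, angles):
    """Rotate channel pairs of x with shape (seq, dim) by the given angles."""
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Position interpolation: squeeze inference-time positions (32K here, an
# assumed target length) back into the trained range (4K, also assumed),
# so RoPE never sees out-of-distribution rotation angles.
train_len, target_len, dim = 4096, 32768, 64
positions = np.arange(target_len)
scaled_positions = positions * (train_len / target_len)

q = np.random.randn(target_len, dim)
q_rotated = apply_rope(q, rope_angles(scaled_positions, dim))  # (32768, 64)
```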

Efficient Training and Deployment (RQ2)

Comprehensive Evaluation and Analysis (RQ3)

  • Evaluation (§6): Divides capabilities into Long Context Comprehension and Long-Form Generation.
    • Comprehension: Paradigms include Language Modeling (PPL trends), Retrieval (explicit/semantic, needle-in-a-haystack (NIAH) tasks; see the sketch after this list), Aggregation (statistical/semantic), Reasoning (parallel/iterative), and Real-World Adaptation (QA, Summarization, Reranking, RAG, ICL, Code tasks). Various synthetic (Table 4) and real-world (Table 5) benchmarks like RULER (Fu et al., 2 Feb 2024), LongBench (Bai et al., 2023), LOFT (Lee et al., 19 Jun 2024), etc., are summarized.
    • Generation: Focuses on generating long, coherent text. Benchmarks (Table 6) like ELI5 (Fan et al., 2019), LongWriter (Bai et al., 13 Aug 2024), HelloBench (Que et al., 24 Sep 2024) are discussed, along with data sources (web, user, synthetic, crowdsourced, PADs) and evaluation methods (automatic metrics like ROUGE/BLEU, human evaluation, LLM-as-a-Judge).
  • Analysis (§7): Examines LCLMs externally and internally.
    • Performance Analysis: Discusses the gap between claimed and effective context length ("Lost in the Middle" (He et al., 2023)), the relevance of long context PPL (potentially weak unless refined like LongPPL (Fang et al., 31 Oct 2024)), and the interplay between RAG and LCLMs (often complementary, e.g., LongRAG (Jiang et al., 21 Jun 2024)).
    • Model Structure Analysis: Investigates Positional Embeddings (RoPE extrapolation mechanisms), Attention/MLP modules (identifying specialized heads like retrieval heads (Tang et al., 22 Jul 2024), analyzing softmax limitations and attention sinks (Xiao et al., 2023)), and Layer Interaction (benefits of hybrid layer structures).
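
As a concrete example of the synthetic retrieval paradigm mentioned above, a needle-in-a-haystack (NIAH) test inserts a known fact at a controlled depth in long filler text and checks whether the model can recall it. The following is a minimal sketch of that idea, not the protocol of any specific benchmark; the `model_generate` callable, the filler sentences, and the sweep values are placeholders.

```python
import random

def build_niah_prompt(needle: str, filler_sentences: list[str],
                      context_tokens: int, depth: float) -> str:
    """Build a haystack of roughly `context_tokens` words with the needle
    inserted at a relative depth in [0, 1]."""
    haystack = []
    while sum(len(s.split()) for s in haystack) < context_tokens:
        haystack.append(random.choice(filler_sentences))
    haystack.insert(int(depth * len(haystack)), needle)
    context = " ".join(haystack)
    return (f"{context}\n\nQuestion: What is the magic number mentioned "
            f"in the text above? Answer with the number only.")

def score_niah(model_generate, needle_answer: str = "7482") -> float:
    """Sweep context lengths and needle depths; return the exact-recall rate.
    `model_generate` stands in for whatever inference API is used."""
    needle = f"The magic number is {needle_answer}."
    filler = ["The sky was a pale shade of grey that morning.",
              "Cities grow along rivers for many practical reasons."]
    hits, total = 0, 0
    for context_tokens in (1_000, 8_000, 32_000):
        for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
            prompt = build_niah_prompt(needle, filler, context_tokens, depth)
            hits += needle_answer in model_generate(prompt)
            total += 1
    return hits / total
```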

Applications (§8)

The survey highlights the broad applicability of LCLMs in:

  • Agents: Handling long interaction histories and complex observations (e.g., GUI agents, software engineering agents).
  • RAG: Processing larger chunks and enabling more complex retrieval strategies (e.g., Perplexity.ai, Deepsearch).
  • Chatbots: Maintaining long-term memory and coherence (e.g., ChatGPT Memory, Character.ai).
  • Code: Repository-level understanding and generation (e.g., GitHub Copilot, StarCoder2 (Lozhkov et al., 29 Feb 2024)).
  • Traditional NLP: Enhancing tasks like document summarization, long-text embedding (e.g., BGE-M3 (Chen et al., 5 Feb 2024)), and document-level machine translation.
  • Multimodal Tasks: Understanding long videos, image sequences (e.g., Gemini 1.5 (Reid et al., 8 Mar 2024), Qwen2.5-VL (Wang et al., 18 Sep 2024)).
  • Specific Domains: Medicine (MedOdyssey (Fan et al., 21 Jun 2024)), finance (LongFin (Masry et al., 26 Jan 2024)), biology (MegaDNA (Liu et al., 2 Mar 2024)).

Future Directions (§9)

Promising future research areas include:

  1. Developing LCLMs for complex, o1-like long reasoning.
  2. Further extending context windows and improving modeling capabilities within existing windows (via RL, better data recipes, distillation, architecture).
  3. Designing more efficient architectures and training/deployment infrastructure (e.g., linear attention, customized hardware); see the sketch after this list.
  4. Creating more reliable evaluation frameworks, especially for long-form generation and real-world/domain-specific comprehension.
  5. Advancing mechanistic interpretability to understand and improve LCLM internals related to long context processing.
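
The third direction above mentions linear attention as one route to more efficient architectures: replacing the softmax similarity with a kernel feature map lets causal attention be computed with running sums in O(n) time instead of the O(n^2) of full attention. The sketch below uses an ELU-based feature map, which is an assumption for illustration rather than a recipe from the survey.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = ELU(x) + 1 keeps features positive (a common kernel choice).
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V):
    """O(n) causal attention: out_t = phi(q_t)·S_t / (phi(q_t)·z_t), where
    S_t and z_t are running sums of outer(phi(k_i), v_i) and phi(k_i) for i <= t."""
    Q, K = elu_feature_map(Q), elu_feature_map(K)
    n, d = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d, d_v))  # running sum of outer(phi(k_i), v_i)
    z = np.zeros(d)         # running sum of phi(k_i)
    out = np.empty((n, d_v))
    for t in range(n):
        S += np.outer(K[t], V[t])
        z += K[t]
        out[t] = (Q[t] @ S) / (Q[t] @ z + 1e-6)
    return out

# Tiny usage example with random projections.
n, d = 16, 8
Q, K, V = (np.random.randn(n, d) for _ in range(3))
attn_out = causal_linear_attention(Q, K, V)  # shape (16, 8)
```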

In conclusion, this survey provides a detailed and structured examination of the current landscape of long context language modeling, covering data, architectures, workflows, infrastructure, evaluation, analysis, applications, and future challenges, serving as a valuable resource for the research and engineering community.

Authors (37)
  1. Jiaheng Liu
  2. Dawei Zhu
  3. Zhiqi Bai
  4. Yancheng He
  5. Huanxuan Liao
  6. Haoran Que
  7. Zekun Wang
  8. Chenchen Zhang
  9. Ge Zhang
  10. Jiebin Zhang
  11. Yuanxing Zhang
  12. Zhuo Chen
  13. Hangyu Guo
  14. Shilong Li
  15. Ziqiang Liu
  16. Yong Shan
  17. Yifan Song
  18. Jiayi Tian
  19. Wenhao Wu
  20. Zhejian Zhou