Parallel Context Windows for Large Language Models (2212.10947v3)

Published 21 Dec 2022 in cs.CL

Abstract: When applied to processing long text, LLMs are limited by their context window. Existing efforts to address this limitation involve training specialized architectures, and cannot be easily applied to off-the-shelf LLMs. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks ("windows"), restrict the attention mechanism to apply only within each window, and re-use the positional embeddings across the windows. Our main results test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents. Our results highlight Parallel Context Windows as a promising method for applying off-the-shelf LLMs in a range of settings that require long text sequences. We make our code publicly available at https://github.com/ai21labs/parallel-context-windows.

Parallel Context Windows for LLMs

The paper presents Parallel Context Windows (PCW), an approach that lets LLMs process text sequences longer than their native context window without any additional training. The method applies to any off-the-shelf LLM: a long input is split into several "windows" that are processed in parallel, with attention restricted to each window and positional embeddings reused across windows.

Methodology

PCW addresses the context window limit intrinsic to LLMs, whose attention mechanism incurs quadratic computational cost as the context grows. The method decomposes a long input into several segments, or windows, each of which is handled independently within the model's attention. The key innovations, illustrated in the code sketch that follows the list, are:

  1. Context Segmentation: The long input is divided into windows, each of which fits within the model's original context capacity.
  2. Attention Mask Modification: Tokens within a window attend only to tokens in the same window, keeping attention cost manageable as the number of windows grows.
  3. Reuse of Positional Embeddings: The same positional embeddings are reused across windows, so each window is treated as an independent sequence and the model never has to extrapolate beyond its trained positional range.
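
To make the mechanism concrete, the following is a minimal, illustrative sketch rather than the authors' released implementation. It assumes that context tokens attend causally only within their own window, that the trailing task tokens (the query and generated tokens) attend to all windows, and that position ids restart at zero in every window; the function name and the exact position layout for the task tokens are assumptions made for illustration.

    import numpy as np

    def pcw_mask_and_positions(window_lengths, n_task_tokens):
        """Toy construction of a PCW-style attention mask and position ids."""
        n_ctx = sum(window_lengths)
        total = n_ctx + n_task_tokens
        # allowed[i, j] is True when token i may attend to token j.
        allowed = np.zeros((total, total), dtype=bool)

        position_ids = []
        start = 0
        for w in window_lengths:
            # Causal attention inside this window only (block-diagonal overall).
            allowed[start:start + w, start:start + w] = np.tril(np.ones((w, w), dtype=bool))
            position_ids.extend(range(w))  # positions 0..w-1 reused in every window
            start += w

        # Task tokens attend to every context window and causally to each other.
        allowed[n_ctx:, :n_ctx] = True
        allowed[n_ctx:, n_ctx:] = np.tril(np.ones((n_task_tokens, n_task_tokens), dtype=bool))
        longest = max(window_lengths)
        position_ids.extend(range(longest, longest + n_task_tokens))

        return allowed, np.asarray(position_ids)

    # Example: three parallel windows of lengths 4, 4, 3 and a 2-token task suffix.
    mask, pos = pcw_mask_and_positions([4, 4, 3], n_task_tokens=2)
    print(mask.astype(int))
    print(pos)  # [0 1 2 3 0 1 2 3 0 1 2 4 5]

In practice, the boolean mask would be converted into the additive attention bias expected by the model, and the custom position ids would be passed alongside the input ids.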

Experimental Results

The PCW approach was tested on a range of in-context learning tasks using LLMs spanning 750 million to 178 billion parameters. The experiments demonstrate significant performance improvements in several applications:

  • Multi-Class Classification: PCW demonstrated notable improvements, particularly in tasks with numerous classification categories, achieving average performance gains of up to 8.7 percentage points for the largest models tested.
  • Information Extraction: For extractive question answering tasks, the method effectively leveraged extended context, showing enhanced F1 scores across standard benchmarks.
  • Retrieval-Augmented Generation: PCW showed benefits in aggregating information from multiple retrieved documents, supporting better question-answering performance; a simple document-packing sketch follows this list.
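
As an illustration of how multiple retrieved documents could be arranged for PCW, the sketch below greedily packs passages (or in-context demonstrations) into parallel windows that each respect the model's original context budget. The helper name, the whitespace-based token count, and the greedy packing strategy are assumptions made for illustration, not the paper's procedure.

    def pack_into_windows(texts, max_window_tokens, count_tokens=lambda t: len(t.split())):
        """Greedily group texts into windows that fit the per-window budget."""
        windows, current, used = [], [], 0
        for text in texts:
            n = count_tokens(text)  # stand-in for a real tokenizer's length
            if current and used + n > max_window_tokens:
                windows.append("\n\n".join(current))
                current, used = [], 0
            current.append(text)
            used += n
        if current:
            windows.append("\n\n".join(current))
        return windows

    # Example: five short passages with a budget of 6 whitespace tokens per window
    # end up in three parallel windows, each encoded independently under the PCW mask.
    passages = [f"Passage {i}: ..." for i in range(1, 6)]
    print(pack_into_windows(passages, max_window_tokens=6))

The shared question would then be appended after the windows as the task text that attends to all of them.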

Implications and Future Directions

The findings suggest that PCW broadens the applicability of LLMs to scenarios that require long text inputs, which would otherwise exceed the context window of standard LLM configurations or be computationally prohibitive. This approach has practical implications for tasks such as document summarization, large-scale information retrieval, and extended dialogue management.

Future work could extend PCW with model-specific adaptations or additional training to further improve performance and broaden applicability. Integrating PCW with existing in-context learning optimization techniques could also improve efficiency and effectiveness in downstream natural language processing tasks.

In summary, Parallel Context Windows represent a viable and impactful method for extending the functional utility of LLMs, facilitating improved processing of extended textual data without necessitating complex model retraining.

Authors (10)
  1. Nir Ratner (5 papers)
  2. Yoav Levine (24 papers)
  3. Yonatan Belinkov (111 papers)
  4. Ori Ram (14 papers)
  5. Inbal Magar (5 papers)
  6. Omri Abend (75 papers)
  7. Ehud Karpas (6 papers)
  8. Amnon Shashua (44 papers)
  9. Kevin Leyton-Brown (57 papers)
  10. Yoav Shoham (22 papers)
Citations (57)