Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective (2406.13282v3)

Published 19 Jun 2024 in cs.CL

Abstract: Enabling LLMs to handle lengthy context is currently a research hotspot. Most LLMs are built upon rotary position embedding (RoPE), a popular position encoding method, so a prominent path is to extrapolate RoPE trained on comparably short texts to far longer texts. Considerable effort has been devoted to boosting this extrapolation by extending the formulation of RoPE, yet few works have attempted to explain their inner workings comprehensively. In this paper, we offer a straightforward yet in-depth understanding of RoPE extensions from an attention perspective, evaluated on two benchmarking tasks. A broad array of experiments reveals several valuable findings: 1) maintaining attention patterns close to those at the pretrained length improves extrapolation; 2) large attention uncertainty leads to retrieval errors; 3) using longer continual pretraining lengths for RoPE extensions reduces attention uncertainty and significantly enhances extrapolation.
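
To make the setting concrete, below is a minimal NumPy sketch of standard RoPE together with position interpolation, one common RoPE extension for length extrapolation. It is an illustration of the general technique, not the authors' code; the function names, the `scale` parameter, and the 4096 → 8192 lengths are assumptions chosen for the example.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Standard RoPE rotation angles. A scale < 1 emulates position
    interpolation, which squeezes long positions into the pretrained range."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))   # (dim/2,)
    return np.outer(positions * scale, inv_freq)              # (seq, dim/2)

def apply_rope(x, positions, base=10000.0, scale=1.0):
    """Rotate each consecutive channel pair of x (seq, dim) by its
    position-dependent angle, as in rotary position embedding."""
    seq, dim = x.shape
    ang = rope_angles(positions, dim, base, scale)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Hypothetical example: a model pretrained at 4096 tokens applied to 8192
# tokens by interpolating positions (scale = 4096 / 8192 = 0.5).
q = np.random.randn(8192, 64)
q_rot = apply_rope(q, np.arange(8192), scale=4096 / 8192)
```

The paper's attention-perspective analysis then asks how extensions of this kind change the resulting attention patterns relative to the pretrained length.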

Authors (8)
  1. Meizhi Zhong (6 papers)
  2. Chen Zhang (403 papers)
  3. Yikun Lei (5 papers)
  4. Xikai Liu (3 papers)
  5. Yan Gao (157 papers)
  6. Yao Hu (106 papers)
  7. Kehai Chen (59 papers)
  8. Min Zhang (630 papers)
Citations (4)