
Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens (2406.10985v1)

Published 16 Jun 2024 in cs.CL

Abstract: LLMs have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer performance degradation when modeling long-term contexts because they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method to enable LLMs to take a deep breath, encouraging them to summarize the information contained within discrete text chunks. Specifically, we segment the text into multiple chunks and insert a special token <SR> at the end of each chunk. We then modify the attention mask to integrate the chunk's information into the corresponding <SR> token. This enables LLMs to interpret information not only from individual historical tokens but also from the <SR> token, which aggregates the chunk's semantic information. Experiments on language modeling and out-of-domain downstream tasks validate the superiority of our approach.
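The abstract describes two concrete steps: inserting an <SR> token after each text chunk, and modifying the attention mask so that chunk information flows into that token. The sketch below illustrates one plausible reading of this, assuming fixed-size chunks and a masking rule in which each <SR> position attends only within its own chunk while all other positions keep the standard causal mask. The helper names (`insert_sr_tokens`, `build_sr_causal_mask`), the chunking strategy, and the exact mask semantics are illustrative assumptions, not the paper's implementation.

```python
import torch

def insert_sr_tokens(token_ids, chunk_size, sr_id):
    """Split a token sequence into fixed-size chunks and append the
    special <SR> token id after each chunk.
    (Hypothetical helper; the paper may segment text differently.)"""
    out = []
    for i in range(0, len(token_ids), chunk_size):
        out.extend(token_ids[i:i + chunk_size])
        out.append(sr_id)
    return out

def build_sr_causal_mask(token_ids, sr_id):
    """Return a boolean attention mask (True = may attend) that is the
    usual causal mask, except each <SR> position is restricted to its
    own chunk, so the <SR> token aggregates only that chunk's content.
    (Assumed masking rule inferred from the abstract.)"""
    n = len(token_ids)
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool))  # causal baseline
    chunk_start = 0
    for pos, tok in enumerate(token_ids):
        if tok == sr_id:
            # The <SR> at `pos` may attend only to [chunk_start, pos].
            mask[pos, :chunk_start] = False
            chunk_start = pos + 1  # the next chunk begins after this <SR>
    return mask

# Example: two chunks of four tokens, with 999 as a stand-in <SR> id.
ids = insert_sr_tokens(list(range(8)), chunk_size=4, sr_id=999)
mask = build_sr_causal_mask(ids, sr_id=999)
print(ids)   # [0, 1, 2, 3, 999, 4, 5, 6, 7, 999]
print(mask)  # second <SR> (row 9) attends only to positions 5..9
```

Under this reading, ordinary tokens still attend causally to all earlier positions, including earlier <SR> tokens, which is how later context can draw on the aggregated chunk summaries.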

Authors (8)
  1. Weiyao Luo (4 papers)
  2. Suncong Zheng (10 papers)
  3. Heming Xia (22 papers)
  4. Weikang Wang (14 papers)
  5. Yan Lei (8 papers)
  6. Tianyu Liu (177 papers)
  7. Shuang Chen (46 papers)
  8. Zhifang Sui (89 papers)