Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens (2406.10985v1)
Abstract: LLMs have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based LLMs suffer performance degradation when modeling long-term contexts because they discard some information to reduce computational overhead. In this work, we propose a simple yet effective method to enable LLMs to take a deep breath, encouraging them to summarize information contained within discrete text chunks. Specifically, we segment the text into multiple chunks and insert a special token <SR> at the end of each chunk. We then modify the attention mask to integrate the chunk's information into the corresponding <SR> token. This allows LLMs to draw on information not only from historical individual tokens but also from the <SR> token, which aggregates the chunk's semantic information. Experiments on language modeling and out-of-domain downstream tasks validate the superiority of our approach.
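The abstract describes inserting an <SR> token after each chunk and adjusting the attention mask so that the <SR> token aggregates its chunk. The following is a minimal sketch of one plausible reading of that masking scheme, not the authors' implementation; the token ids, the <SR> id, and the exact masking rule are illustrative assumptions.

```python
import torch

def build_sr_attention_mask(token_ids, sr_token_id):
    """Sketch: causal mask where each <SR> token (hypothetical id) attends only
    to its own chunk, so it acts as a summary of that chunk; ordinary tokens
    keep standard causal attention and can also see earlier <SR> tokens."""
    n = len(token_ids)
    # Start from a standard lower-triangular causal mask.
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool))

    chunk_start = 0
    for i, tok in enumerate(token_ids):
        if tok == sr_token_id:
            # Restrict this <SR> position to its own chunk by blocking
            # attention to anything before the chunk's start.
            mask[i, :chunk_start] = False
            chunk_start = i + 1  # the next chunk begins after this <SR>
    return mask

# Usage: tokens with <SR> (id 3 here, purely illustrative) inserted after each chunk.
SR = 3
tokens = [5, 6, 7, SR, 8, 9, 10, SR, 11, 12]
attn_mask = build_sr_attention_mask(tokens, SR)
```

Under this reading, later tokens still attend causally to individual history tokens as well as to the preceding <SR> positions, matching the abstract's claim that the model can use both sources of information.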
- Weiyao Luo (4 papers)
- Suncong Zheng (10 papers)
- Heming Xia (22 papers)
- Weikang Wang (14 papers)
- Yan Lei (8 papers)
- Tianyu Liu (177 papers)
- Shuang Chen (46 papers)
- Zhifang Sui (89 papers)