LaMemo: Language Modeling with Look-Ahead Memory (2204.07341v2)

Published 15 Apr 2022 in cs.CL

Abstract: Although Transformers with fully connected self-attention are powerful at modeling long-term dependencies, they struggle to scale to long texts with thousands of words in language modeling. One solution is to equip the model with a recurrence memory. However, existing approaches directly reuse hidden states from the previous segment, which encode context in a uni-directional way. As a result, the memory cannot dynamically interact with the current context, which provides up-to-date information for token prediction. To remedy this issue, we propose Look-Ahead Memory (LaMemo), which enhances the recurrence memory by incrementally attending to the right-side tokens and interpolating with the old memory states to maintain long-term information in the history. LaMemo embraces bi-directional attention and segment recurrence with an additional computation overhead only linearly proportional to the memory length. Experiments on widely used language modeling benchmarks demonstrate its superiority over baselines equipped with different types of memory.
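
The sketch below illustrates the general idea described in the abstract: memory states from earlier segments are refreshed by attending to the current ("right-side") segment and then interpolated with the old memory. This is a minimal, hypothetical single-head implementation based only on the abstract, not the authors' code; the function name `look_ahead_memory_update`, the projection matrices, and the scalar interpolation weight `alpha` are illustrative assumptions (the paper derives its interpolation coefficients differently).

```python
# Minimal sketch of a LaMemo-style look-ahead memory update (an assumption based on
# the abstract, not the authors' released implementation).
import torch
import torch.nn.functional as F

def look_ahead_memory_update(memory, segment, w_q, w_k, w_v, alpha=0.5):
    """
    memory:  (mem_len, d_model) hidden states carried over from earlier segments
    segment: (seg_len, d_model) hidden states of the current segment (right-side tokens)
    w_q/w_k/w_v: (d_model, d_model) projections for a hypothetical single attention head
    alpha:   interpolation weight between the old memory and the look-ahead refresh
             (a fixed scalar here purely for illustration)
    """
    d_model = memory.size(-1)
    q = memory @ w_q                     # queries come from the memory slots
    k = segment @ w_k                    # keys/values come from the current segment
    v = segment @ w_v
    attn = F.softmax(q @ k.T / d_model ** 0.5, dim=-1)
    look_ahead = attn @ v                # memory refreshed with right-side context
    # Interpolate with the old states so long-range history is not overwritten.
    return alpha * memory + (1 - alpha) * look_ahead

# Toy usage: 16 memory slots, a 32-token segment, 64-dim hidden states.
d = 64
mem = torch.randn(16, d)
seg = torch.randn(32, d)
w_q, w_k, w_v = (torch.randn(d, d) * d ** -0.5 for _ in range(3))
new_mem = look_ahead_memory_update(mem, seg, w_q, w_k, w_v)
print(new_mem.shape)  # torch.Size([16, 64])
```

Because each memory slot attends only to the tokens of the current segment, the extra cost per update grows linearly with the memory length, which is the overhead the abstract claims.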

Authors (5)
  1. Haozhe Ji (11 papers)
  2. Rongsheng Zhang (36 papers)
  3. Zhenyu Yang (56 papers)
  4. Zhipeng Hu (38 papers)
  5. Minlie Huang (225 papers)
Citations (2)