Needle in the Haystack for Memory Based Large Language Models (2407.01437v2)

Published 1 Jul 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Current LLMs often perform poorly on simple fact retrieval tasks. Here we investigate if coupling a dynamically adaptable external memory to an LLM can alleviate this problem. For this purpose, we test Larimar, a recently proposed LLM architecture which uses an external associative memory, on long-context recall tasks including passkey and needle-in-the-haystack tests. We demonstrate that the external memory of Larimar, which allows fast write and read of an episode of text samples, can be used at test time to handle contexts much longer than those seen during training. We further show that the latent readouts from the memory (to which long contexts are written) control the decoder towards generating correct outputs, with the memory stored off the GPU. Compared to existing transformer-based LLM architectures for long-context recall tasks that use larger parameter counts or modified attention mechanisms, a relatively smaller Larimar is able to maintain strong performance without any task-specific training or training on longer contexts.
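
To make the write/read mechanism concrete, below is a minimal sketch of a pseudo-inverse-based associative memory of the kind described for Larimar, where an episode of latent encodings is written in one shot and later read back to condition the decoder. The class name, slot/latent dimensions, and the use of torch.linalg.pinv are illustrative assumptions for this sketch, not the authors' exact implementation.

```python
import torch

class EpisodicMemory:
    """Sketch of a one-shot write/read associative memory (Larimar-style)."""

    def __init__(self, num_slots: int = 512, latent_dim: int = 768):
        # Reference memory M0 used for addressing. The working memory M is
        # kept on CPU ("off the GPU"); only the small readouts need to be
        # moved to the accelerator for decoding.
        self.M0 = torch.randn(num_slots, latent_dim) / latent_dim ** 0.5
        self.M = self.M0.clone()

    def write(self, Z: torch.Tensor) -> None:
        # Z: (n_chunks x latent_dim) encodings of a long context episode.
        # Addressing weights are obtained against the reference memory,
        # then the memory is updated by a one-shot least-squares solve so
        # that reading with those weights approximately recovers Z.
        W0 = Z @ torch.linalg.pinv(self.M0)   # (n_chunks x num_slots)
        self.M = torch.linalg.pinv(W0) @ Z    # (num_slots x latent_dim)

    def read(self, Zq: torch.Tensor) -> torch.Tensor:
        # Zq: (m x latent_dim) query encodings (e.g. the question latent).
        # Address the updated memory and return latent readouts that can be
        # passed to the decoder as conditioning.
        Wr = Zq @ torch.linalg.pinv(self.M)   # (m x num_slots)
        return Wr @ self.M                    # (m x latent_dim)

# Usage sketch: chunk-encode a long context, write it once at test time,
# then read with the query encoding and feed the readout to the decoder.
memory = EpisodicMemory()
episode_latents = torch.randn(32, 768)   # stand-in for encoder outputs
memory.write(episode_latents)
readout = memory.read(torch.randn(1, 768))
```

Because the write and read are closed-form solves rather than gradient updates, contexts far longer than those seen during training can be handled at test time simply by writing more chunks into the memory.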

Authors (5)
  1. Subhajit Chaudhury (40 papers)
  2. Soham Dan (41 papers)
  3. Payel Das (104 papers)
  4. Georgios Kollias (17 papers)
  5. Elliot Nelson (15 papers)
Citations (3)