
Larimar: Large Language Models with Episodic Memory Control (2403.11901v1)

Published 18 Mar 2024 in cs.LG and cs.AI

Abstract: Efficient and accurate updating of knowledge stored in LLMs is one of the most pressing research challenges today. This paper presents Larimar - a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar attains accuracy comparable to most competitive baselines, even in the challenging sequential editing setup, but also excels in speed - yielding speed-ups of 4-10x depending on the base LLM - as well as flexibility due to the proposed architecture being simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting and input context length generalization with Larimar and show their effectiveness.

Enhancing LLMs with Episodic Memory for Dynamic Knowledge Updates

Introduction

Augmenting LLMs with external memory modules is a promising avenue for keeping model knowledge current without retraining. The paper introduces Larimar, a novel architecture that couples an LLM with a distributed episodic memory to enable on-the-fly knowledge updating, and reports gains in speed, flexibility, and scalability over existing knowledge-editing approaches.

Model Architecture

Larimar is premised on the observation that traditional LLMs, while powerful, encode knowledge statically in their parameters, so that knowledge cannot be updated without expensive re-training or fine-tuning. The proposed architecture counters this limitation by integrating an episodic memory inspired by the human hippocampus, a structure known for rapid, one-shot learning. Larimar's architecture features:

  • An encoder for converting data inputs into latent vectors.
  • A distributed associative memory for storing and dynamically updating these vectors.
  • A decoder that leverages both the static knowledge embedded in the LLM parameters and the dynamic updates stored in the memory.

The architecture supports efficient one-shot learning, enabling immediate memory updates without gradient descent, thus accelerating the updating process significantly.
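
To make this flow concrete, here is a minimal Python sketch of the encode-write-read loop under simplifying assumptions: the encoder is a placeholder, the memory is a single matrix, and the write uses a pseudo-inverse (least-squares) solve rather than the paper's exact update equations.

```python
import torch

# Illustrative sizes; the paper's actual memory size and latent dimension may differ.
K, D = 512, 768                      # memory slots, latent dimension
memory = 0.01 * torch.randn(K, D)    # episodic memory matrix M

def encode(texts):
    """Stand-in for the encoder that maps input text to latent vectors Z."""
    return torch.randn(len(texts), D)  # placeholder; a real text encoder goes here

def write_one_shot(M, Z):
    """Gradient-free write: compute addressing weights W for the new latents,
    then solve for an updated memory M' with W @ M' ~= Z (a least-squares /
    pseudo-inverse style update, used here instead of the paper's equations)."""
    W = Z @ torch.linalg.pinv(M)          # (N, K) addressing weights
    return torch.linalg.pinv(W) @ Z       # (K, D) updated memory

def read(M, Z_query):
    """Read: address the memory with encoded queries and return latents that
    would condition the decoder alongside the frozen LLM parameters."""
    W = Z_query @ torch.linalg.pinv(M)
    return W @ M                          # (N, D) retrieved latents

# One-shot edit: write a new (hypothetical) fact, then retrieve it for decoding.
memory = write_one_shot(memory, encode(["The CEO of Acme is Jane Doe."]))
retrieved = read(memory, encode(["Who is the CEO of Acme?"]))
print(retrieved.shape)                    # torch.Size([1, 768])
```

The point of the sketch is that the edit is a closed-form linear-algebra step over the memory matrix, not a gradient update to the LLM's weights.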

Memory Operations

The memory module supports write, read, and generate operations, enabling dynamic updates, retrieval, and use of stored knowledge to influence model outputs. The paper also details sequential writing and forgetting operations, showing how Larimar can revise its memory contents accurately as information needs evolve.
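
As a rough illustration of these operations (not Larimar's exact update rules), the sketch below implements write, read, and forget on a single memory matrix using pseudo-inverse addressing; "generate" would correspond to feeding the read-out latents to the decoder. The class name and overwrite-to-forget rule are assumptions made for clarity.

```python
import torch

class EpisodicMemory:
    """Toy associative memory supporting write, read, and forget.
    Pseudo-inverse addressing and overwrite-to-forget are simplifying
    assumptions, not the paper's exact equations."""

    def __init__(self, num_slots=512, dim=768):
        self.M = 0.01 * torch.randn(num_slots, dim)

    def _address(self, Z):
        # Addressing weights W such that W @ M approximates Z.
        return Z @ torch.linalg.pinv(self.M)

    def write(self, Z):
        # Sequential write: shift memory so the new latents reconstruct well,
        # without touching model weights or running gradient descent.
        W = self._address(Z)
        self.M = self.M + torch.linalg.pinv(W) @ (Z - W @ self.M)

    def read(self, Z_query):
        # Retrieve latents that condition the decoder during generation.
        return self._address(Z_query) @ self.M

    def forget(self, Z_old, Z_blank):
        # Selective forgetting: overwrite the content addressed by Z_old
        # with a neutral encoding so the old fact no longer decodes.
        W = self._address(Z_old)
        self.M = self.M + torch.linalg.pinv(W) @ (Z_blank - W @ self.M)

# Usage: a sequential edit followed by a targeted forget.
D = 768
mem = EpisodicMemory(dim=D)
fact, blank = torch.randn(1, D), torch.zeros(1, D)
mem.write(fact)
mem.forget(fact, blank)
```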

Experimental Results

Empirical evaluations show that Larimar performs knowledge editing with speed-ups of 4-10x over leading baselines while maintaining competitive accuracy. The architecture's flexibility is further evidenced through applications in sequential fact editing and selective fact forgetting. Notably, Larimar retains its performance as the number of stored edits grows, which points to practical, real-world applications where knowledge bases are continually updated.

Speculations on Future Developments in AI

The introduction of episodic memory into LLMs as explored by Larimar opens up exciting prospects for the future of AI. It is conceivable that as techniques for dynamic memory management and integration with LLMs evolve, we could witness the emergence of models that not only adapt to new information more swiftly but do so with an enhanced understanding of context and temporality. This could pave the way for AI systems capable of more nuanced and human-like reasoning and interaction.

Conclusion

Larimar represents a significant step forward in the effort to create more dynamic and adaptable LLMs. By successfully integrating an episodic memory that enables real-time knowledge updates, Larimar addresses a critical pain point in the use of LLMs, particularly in applications requiring up-to-date information. As future work builds on and refines this approach, the goal of developing AI systems with the ability to learn and forget as efficiently as humans do appears increasingly attainable.

Authors (12)
  1. Payel Das (104 papers)
  2. Subhajit Chaudhury (40 papers)
  3. Elliot Nelson (15 papers)
  4. Igor Melnyk (28 papers)
  5. Sarath Swaminathan (3 papers)
  6. Sihui Dai (12 papers)
  7. Aurélie Lozano (20 papers)
  8. Georgios Kollias (17 papers)
  9. Vijil Chenthamarakshan (36 papers)
  10. Jiří Navrátil (1 paper)
  11. Soham Dan (41 papers)
  12. Pin-Yu Chen (311 papers)
Citations (10)