
Mass-Editing Memory in a Transformer (2210.07229v2)

Published 13 Oct 2022 in cs.CL and cs.LG

Abstract: Recent work has shown exciting promise in updating LLMs with new memories, so as to replace obsolete information or add specialized knowledge. However, this line of work is predominantly limited to updating single associations. We develop MEMIT, a method for directly updating a LLM with many memories, demonstrating experimentally that it can scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by orders of magnitude. Our code and data are at https://memit.baulab.info.

Authors (5)
  1. Kevin Meng (6 papers)
  2. Arnab Sen Sharma (13 papers)
  3. Alex Andonian (16 papers)
  4. Yonatan Belinkov (111 papers)
  5. David Bau (62 papers)
Citations (422)

Summary

Mass-Editing Memory in a Transformer: A Methodological Advancement in LLM Knowledge Updates

Introduction

Large language models (LMs) have become a fundamental part of the modern AI landscape, especially with the advent of large autoregressive models such as the GPT series, which demonstrate remarkable abilities in generating human-like text from a given prompt. A significant utility of these models lies in their capacity to function as knowledge bases, albeit with a critical shortcoming: because they are static after training, the information they store eventually becomes outdated or incomplete. Addressing this issue, the paper introduces MEMIT (Mass-Editing Memory In Transformer), a method that enables mass updates of factual knowledge directly in the parameters of transformer-based LMs.

Background

The paper situates MEMIT within the continuum of efforts to embed fresh, specific knowledge into pre-trained models without extensive retraining. This endeavor responds to the need for models to stay current with evolving facts and domain-specific knowledge, which is particularly relevant for applications in question answering, content generation, and knowledge retrieval. Prior approaches, while innovative, demonstrated limited scalability and efficacy, typically supporting updates of only a small number of associations.

MEMIT: Core Contributions

MEMIT distinguishes itself by scaling direct memory editing in transformers by several orders of magnitude beyond previous methods. At its core, MEMIT builds on direct-editing techniques but advances the concept by computing parameter updates that are spread across multiple transformer layers rather than concentrated in one. The method inserts new factual associations at scale, with experiments validating it on transformer models of substantial size, specifically GPT-J (6B parameters) and GPT-NeoX (20B parameters).
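As a rough illustration of the underlying idea, and not the authors' actual implementation, a batched edit to one MLP projection can be sketched as a regularized least-squares update: given key vectors for the facts to insert and the values the layer should emit for them, the weight change fits the new associations while a covariance term protects previously learned behavior. All names and shapes below are illustrative assumptions.

```python
import numpy as np

def memit_style_update(W, K, V_target, C):
    """One-layer sketch of a MEMIT-style batched edit (illustrative, not the paper's code).

    W        : (d_out, d_in) MLP projection matrix being edited
    K        : (d_in, n) key vectors, one column per fact to insert
    V_target : (d_out, n) values the edited layer should emit for those keys
    C        : (d_in, d_in) covariance of previously seen keys (preserves old behavior)
    """
    R = V_target - W @ K                        # what each fact still needs
    delta = R @ K.T @ np.linalg.inv(C + K @ K.T)
    return W + delta

# Toy demo: insert 100 associations into a random 64x128 projection.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))
K = rng.normal(size=(128, 100))
V = rng.normal(size=(64, 100))
C = 0.1 * np.eye(128)                           # stand-in for the key-covariance statistic

W_new = memit_style_update(W, K, V, C)
before = np.linalg.norm(V - W @ K)
after = np.linalg.norm(V - W_new @ K)
assert after < before                           # edited weights fit the new facts better
```

In the full method the residual is additionally distributed over a range of critical layers rather than applied at a single one, which is what allows thousands of edits without overloading any one weight matrix.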

Scalability and Performance

Experiments demonstrate MEMIT's scalability: it can update thousands of factual memories while sustaining efficacy, specificity, and fluency. For instance, tests on the zsRE benchmark showed MEMIT outperforming both naive fine-tuning and prior memory-editing approaches, a significant advance in both the number of memories updated and the preservation of overall model performance.
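Two of the evaluation criteria named above can be made concrete with a small sketch. The metric names follow the paper, but the scoring functions here are simplified stand-ins (fluency, typically measured on generated text, is omitted).

```python
# Simplified sketch of two editing metrics (illustrative, not the paper's exact protocol).

def efficacy(edited_probs):
    """Fraction of edited prompts where the new target now outranks the old answer.

    edited_probs: list of (p_new, p_old) model probabilities after editing.
    """
    return sum(p_new > p_old for p_new, p_old in edited_probs) / len(edited_probs)

def specificity(neighbor_probs):
    """Fraction of unrelated 'neighborhood' prompts where the old answer is preserved."""
    return sum(p_old > p_new for p_new, p_old in neighbor_probs) / len(neighbor_probs)

# Toy numbers: 3 edited prompts, 2 neighborhood prompts.
eff = efficacy([(0.9, 0.1), (0.6, 0.3), (0.2, 0.5)])
spec = specificity([(0.1, 0.8), (0.4, 0.3)])
print(eff, spec)
```

The tension between the two is the crux of scalable editing: a method can trivially maximize efficacy by overwriting the model wholesale, so specificity is what distinguishes a targeted edit from damage.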

Theoretical and Practical Implications

The introduction of MEMIT carries several critical implications for AI and language modeling. Theoretically, it pushes the frontier on understanding how knowledge is stored and can be manipulated within transformer networks. Practically, it opens new pathways for keeping LMs relevant in rapidly changing information landscapes, offering a cost-effective alternative to retraining. MEMIT's proficiency in mass-updating model knowledge with minimal degradation in performance positions it as a strategically valuable tool for customizing LMs for specific domains or rapidly integrating new factual information.

Forward Look

The paper speculates on the future trajectory of AI and language modeling, suggesting that interpretability-based methodologies akin to MEMIT could transform how models are updated, controlled, and audited. This direction not only aligns with the practical needs of AI applications but also contributes to the ongoing dialogue on model transparency and ethical AI development.

Concluding Remarks

MEMIT sets a new benchmark for direct memory editing in LLMs, providing a scalable solution to one of the most pressing challenges in the utility of LMs as knowledge bases. By demonstrating the feasibility of mass memory updates, MEMIT not only heralds a significant technological leap but also accentuates the importance of adaptive models in the evolving landscape of AI and machine learning.
