Language models are improved by learning from 'canonical examples': simple, single instances that exemplify a desired or undesired behavior, with the goal of refining performance while strictly limiting deviation from the model's original behavior.
Each canonical example is paired with a loss function indicating the preferred direction of change; the edited model is evaluated on more complex instances of the same behavior, so it must generalize from the simple examples without drifting far from its original behavior.
Extensive experiments show that LoRA outperforms full finetuning and MEMIT on Pythia models, while on Backpack models 'sense finetuning', which updates only a few (~10) sense vectors per canonical example, outperforms all other methods.
These gains transfer to a much larger model, GPT-J-6B, via an inference-time ensemble that combines GPT-J's logits with those of a pretrained and a sense-finetuned Backpack model, improving the larger model without altering its weights.
Language models have revolutionized natural language processing, offering capabilities ranging from simple text generation to complex reasoning. However, these models are not without flaws: they harbor social biases, propagate incorrect information, and struggle with edge cases in syntax. Addressing these issues traditionally involves either retraining the entire model, a computationally expensive undertaking, or applying patches that can have unintended consequences. This article explores an alternative: model editing with canonical examples, a method that refines models by learning from minimal but significant examples while strictly limiting deviation from the original model's behavior.
The approach revolves around "canonical examples": single instances that exemplify a desired or undesired behavior. These examples serve as the basis for refinement, with two aims: improving the model's performance on more complex tasks derived from the examples, and curbing the model's deviation from its initial state so that it retains its broad capabilities while correcting the specific issue.
Each canonical example is coupled with a loss function indicating the preferred direction of change, and success is measured by the model's performance on an evaluation set that is distinct from, and more complex than, the training examples. This setup requires models to generalize from simple canonical instances to more intricate scenarios without straying far from their original behavior.
Extensive experiments with Pythia language models evaluate how well standard finetuning algorithms handle this setting. Among them, LoRA (Low-Rank Adaptation) outperforms both full finetuning and MEMIT, a dedicated model-editing technique. The setting also motivates "sense finetuning" for the Backpack language model architecture: for each canonical example, only a small number (around 10) of sense vectors are selected and finetuned, and this method substantially outperforms the alternatives.
A noteworthy extension of this work leverages the improvements from sense finetuning on smaller Backpack models to enhance much larger pre-existing models such as GPT-J-6B. This is done with an inference-time ensemble that combines GPT-J's logits with those of a pretrained and a sense-finetuned Backpack model, imbuing the larger model with the edits without altering its weights. In stringent evaluation settings, this ensemble even outperforms finetuning GPT-J directly, underscoring the potential of small, adaptable models to correct much larger ones.
The study has several theoretical implications for model architecture and the pursuit of model editability. The success of sense finetuning underscores the utility of incorporating architectural features that facilitate targeted improvement post hoc. This suggests a fruitful direction for future research: designing models not just for performance but also for their amenability to precise, post-training corrections.
Model editing with canonical examples emerges as a promising methodology for rectifying specific deficiencies in language models without necessitating comprehensive retraining. By focusing on minimal yet representative examples and employing techniques like sense finetuning, it is possible to achieve targeted improvements while preserving the model's original integrity. This approach not only enhances the model's functionality but also furnishes a blueprint for constructing models that are inherently more adaptable and correctable, paving the way for the next generation of more reliable and robust language models.