Efficiently Quantifying and Mitigating Ripple Effects in Model Editing (2403.07825v3)
Abstract: LLMs have revolutionized numerous tasks with their remarkable efficacy. However, editing these models, which is crucial for correcting outdated or erroneous information, often leads to a complex issue known as the ripple effect in the hidden space. While difficult to detect, this effect can significantly impede model editing and degrade overall model performance. This paper addresses the challenge by proposing a novel evaluation methodology, Graphical Impact Evaluation (GIE), which quantitatively measures how the model adapts to an edit and the subsequent impact of that edit. Furthermore, we introduce Selective Impact Revision (SIR), a model editing method designed to mitigate the ripple effect. Our comprehensive evaluations reveal that the hidden-space ripple effect is a significant issue in all current model editing methods, and that GIE and SIR effectively identify and alleviate it, contributing to the advancement of LLM editing techniques.
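To make the notion of a hidden-space ripple effect concrete, the sketch below shows one simple way such drift could be quantified: compare hidden states of an edited model against the original on probe prompts about facts that are *related to*, but not targeted by, the edit. This is a minimal, hypothetical illustration; the abstract does not specify GIE's actual metric, so the function names, probe prompts, and cosine-distance measure are assumptions.

```python
# Hypothetical sketch: quantifying hidden-space "ripple" after a model edit.
# Assumes a HuggingFace causal LM; this is NOT the paper's GIE procedure,
# only an illustration of measuring hidden-state drift on related facts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def last_hidden_state(model, tokenizer, prompt):
    """Return the final-layer hidden state of the last token for a prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[-1][0, -1]  # shape: (hidden_dim,)

def ripple_scores(model_before, model_after, tokenizer, probe_prompts):
    """Cosine distance between pre- and post-edit hidden states on probe
    prompts about neighboring facts; larger values suggest a stronger ripple."""
    scores = {}
    for prompt in probe_prompts:
        h_before = last_hidden_state(model_before, tokenizer, prompt)
        h_after = last_hidden_state(model_after, tokenizer, prompt)
        cos = torch.nn.functional.cosine_similarity(h_before, h_after, dim=0)
        scores[prompt] = 1.0 - cos.item()
    return scores

# Example usage (model paths are placeholders):
# tok = AutoTokenizer.from_pretrained("gpt2")
# base = AutoModelForCausalLM.from_pretrained("gpt2")
# edited = AutoModelForCausalLM.from_pretrained("path/to/edited-gpt2")
# print(ripple_scores(base, edited, tok, ["The Eiffel Tower is located in"]))
```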