Should We Really Edit Language Models? On the Evaluation of Edited Language Models (2410.18785v1)
Abstract: Model editing has become an increasingly popular alternative for efficiently updating knowledge within LLMs. Current methods mainly focus on reliability, generalization, and locality, and many methods excel across these criteria. Some recent works have disclosed pitfalls of these editing methods, such as knowledge distortion or conflict, but the general abilities of post-edited LLMs remain unexplored. In this paper, we perform a comprehensive evaluation of various editing methods on different LLMs and report the following findings. (1) Existing editing methods cause inevitable performance deterioration on general benchmarks: they preserve the model's general abilities for only a few dozen edits, and once the number of edits grows beyond that, the model's intrinsic knowledge structure is disrupted or even completely destroyed. (2) Instruction-tuned models are more robust to editing, showing a smaller drop in general knowledge after editing. (3) Large-scale LLMs are more resistant to editing than smaller models. (4) The safety of edited models is significantly weakened, even for safety-aligned models. Our findings indicate that current editing methods are suitable only for small-scale knowledge updates within LLMs, which motivates further research on more practical and reliable editing methods. Code and reproduction details are available at https://github.com/lqinfdim/EditingEvaluation.
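To make the evaluation protocol concrete, below is a minimal, hypothetical sketch in Python. It applies a sequence of edits to a small HuggingFace causal LM and then re-scores the model on unrelated general-knowledge probes, mirroring the edit-then-benchmark setup described above. The naive gradient-descent edit is a stand-in for the editing methods the paper actually evaluates (e.g., ROME or MEMIT); the model name, the `edits` and `benchmark` lists, and all hyperparameters are illustrative assumptions, not the paper's setup.

```python
# Sketch: sequentially edit a causal LM, then measure degradation on
# unrelated general-knowledge probes. Assumes the HuggingFace
# `transformers` and `torch` libraries.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative placeholder; the paper edits larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical edit requests (new "facts") and general-knowledge probes.
edits = [("The capital of France is", " Rome")]
benchmark = [("The chemical symbol for gold is", " Au")]

def nll(prompt: str, target: str) -> torch.Tensor:
    """Negative log-likelihood of prompt+target under the current model."""
    ids = tok(prompt + target, return_tensors="pt").input_ids
    return model(ids, labels=ids).loss

# Sequentially apply each edit; plain fine-tuning on the new fact stands
# in for the paper's editing methods (ROME, MEMIT, MEND, ...).
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
model.train()
for prompt, target in edits:
    for _ in range(10):  # a few optimization steps per edit
        opt.zero_grad()
        nll(prompt, target).backward()
        opt.step()

# Re-score general knowledge after editing: rising loss on these
# unrelated probes is the degradation the paper reports as the
# number of edits grows.
model.eval()
with torch.no_grad():
    for prompt, target in benchmark:
        print(f"{prompt!r} -> NLL after editing: {nll(prompt, target).item():.3f}")
```

In a full evaluation one would replace the toy probes with standard benchmarks such as MMLU or BBH and compare scores before and after each batch of edits; this sketch only illustrates the shape of that loop.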