
"Flex Tape Can't Fix That": Bias and Misinformation in Edited Language Models (2403.00180v3)

Published 29 Feb 2024 in cs.CL

Abstract: Model editing has emerged as a cost-effective strategy to update knowledge stored in LLMs. However, model editing can have unintended consequences after edits are applied: information unrelated to the edits can also be changed, and other general behaviors of the model can be wrongly altered. In this work, we investigate how model editing methods unexpectedly amplify model biases post-edit. We introduce a novel benchmark dataset, Seesaw-CF, for measuring bias-related harms of model editing and conduct the first in-depth investigation of how different weight-editing methods impact model bias. Specifically, we focus on biases with respect to demographic attributes such as race, geographic origin, and gender, as well as qualitative flaws in long-form texts generated by edited LLMs. We find that edited models exhibit, to various degrees, more biased behavior as they become less confident in attributes for Asian, African, and South American subjects. Furthermore, edited models amplify sexism and xenophobia in text generations while remaining seemingly coherent and logical. Finally, editing facts about place of birth, country of citizenship, or gender has particularly negative effects on the model's knowledge about unrelated features like field of work.
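The abstract's central measurement — that edited models "become less confident in attributes" for certain demographic groups — can be illustrated with a minimal sketch. The function below is hypothetical (not the actual Seesaw-CF evaluation code): it assumes you already have the model's probability of the correct attribute completion for each subject, before and after an edit, and aggregates the confidence drop per demographic group.

```python
# Hypothetical sketch of a per-group confidence-shift metric, in the
# spirit of the evaluation the paper describes. The function names,
# data format, and numbers are illustrative, not the paper's actual code.
from collections import defaultdict

def confidence_shift(pre, post, groups):
    """Average drop in attribute probability per demographic group.

    pre/post: dict mapping subject -> P(correct attribute | prompt)
    groups:   dict mapping subject -> demographic group label
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for subject, p_before in pre.items():
        g = groups[subject]
        totals[g] += p_before - post[subject]  # positive = lost confidence
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in totals}

# Toy numbers: an edit whose collateral damage falls mostly on one group.
pre = {"A": 0.80, "B": 0.75, "C": 0.70}
post = {"A": 0.78, "B": 0.40, "C": 0.35}
groups = {"A": "group_1", "B": "group_2", "C": "group_2"}

print(confidence_shift(pre, post, groups))
```

In a real setting, `pre` and `post` would come from scoring the model's token probabilities on attribute prompts (e.g., "X works in the field of ___") before and after applying a weight-editing method such as those studied in the paper; a larger group-level shift signals the disparate-impact pattern the authors report.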

Authors (5)
  1. Karina Halevy (4 papers)
  2. Anna Sotnikova (6 papers)
  3. Badr AlKhamissi (24 papers)
  4. Syrielle Montariol (22 papers)
  5. Antoine Bosselut (85 papers)
Citations (2)