Untying the Reversal Curse via Bidirectional Language Model Editing (2310.10322v2)
Abstract: Recent studies have demonstrated that LLMs store vast amounts of factual knowledge within their parameters. However, existing LLMs are prone to hallucinating unintended text due to false or outdated knowledge. Since retraining LLMs is resource-intensive, there has been growing interest in model editing. Despite the emergence of benchmarks and approaches, editing and its evaluation have remained unidirectional and have left the reversal curse unexplored. Intuitively, if "The capital of France is" is edited to the counterfact "London" within a model, then the model should also be able to reason about and recall the reverse fact, i.e., "London is the capital of" followed by "France" rather than "England". In this paper, we study bidirectional LLM editing, aiming to provide a rigorous model editing evaluation that assesses whether edited LLMs can recall the edited knowledge bidirectionally. A new evaluation metric of reversibility is introduced, and a benchmark dubbed Bidirectional Assessment for Knowledge Editing (BAKE) is constructed to evaluate the reversibility of edited models in recalling knowledge in the reverse direction of editing. We surprisingly observe that while current editing methods and LLMs can effectively recall edited facts in the direction of editing, they suffer serious deficiencies when evaluated in the reverse direction. To mitigate the reversal curse, a method named Bidirectionally Inversible Relationship moDeling (BIRD) is proposed. It designs a set of editing objectives that incorporate bidirectional relationships between subject and object into the updated model weights. Experiments show that BIRD improves the performance of four representative LLMs of different sizes, as measured by question answering and judgement.
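To make the evaluation setup concrete, below is a minimal sketch (not the paper's released code) of how one might probe an edited model in both directions, mirroring the "capital of France → London" example from the abstract. The model name, the `recalls` helper, and the assumption that an edit has already been applied to `edited_model` are all illustrative; the actual BAKE benchmark and reversibility metric are defined in the paper.

```python
# Hedged sketch: probe forward (efficacy) and reverse (reversibility) recall of an edited LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def recalls(model, tokenizer, prompt: str, target: str, max_new_tokens: int = 5) -> bool:
    """Greedy-decode a short continuation and check whether it mentions the target object."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    continuation = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return target.lower() in continuation.lower()

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # placeholder LLM
edited_model = AutoModelForCausalLM.from_pretrained("gpt2")  # assume the edit ("capital of France" -> "London") was already applied

# Forward direction: the direction in which the edit was made.
forward_ok = recalls(edited_model, tokenizer, "The capital of France is", "London")
# Reverse direction: what the reversibility evaluation checks.
reverse_ok = recalls(edited_model, tokenizer, "London is the capital of", "France")
print(f"forward recall: {forward_ok}, reverse recall: {reverse_ok}")
```

The paper's central observation is that edited models typically pass the forward check but fail the reverse one, which is what the BAKE benchmark quantifies and BIRD aims to mitigate.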
Authors: Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu