The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A" (2309.12288v4)

Published 21 Sep 2023 in cs.CL, cs.AI, and cs.LG

Abstract: We expose a surprising failure of generalization in auto-regressive LLMs. If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Valentina Tereshkova was the first woman to travel to space", it will not automatically be able to answer the question, "Who was the first woman to travel to space?". Moreover, the likelihood of the correct answer ("Valentina Tereshkova") will not be higher than for a random name. Thus, models do not generalize a prevalent pattern in their training set: if "A is B" occurs, "B is A" is more likely to occur. It is worth noting, however, that if "A is B" appears in-context, models can deduce the reverse relationship. We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showing that they fail to correctly answer "Who composed Abyssal Melodies?". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. Code available at: https://github.com/lukasberglund/reversal_curse.
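
The abstract's likelihood claim can be made concrete with a short probe. The following is a minimal sketch, not the authors' code: it uses a Hugging Face causal LM (here `gpt2`, an illustrative stand-in for the fine-tuned GPT-3/Llama-1 models) to score the correct name against random names given a reversed prompt.

```python
# Minimal sketch (not the paper's code) of the likelihood test: given a
# reversed prompt, does the model prefer the correct name over random names?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Total log-probability the model assigns to `completion` after `prompt`.

    Assumes the prompt's tokenization is a prefix of the full string's
    tokenization, which holds here because completions start with a space.
    """
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    for pos in range(prompt_len, full_ids.shape[1]):
        # Each token is predicted from the logits at the previous position.
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total

reverse_prompt = "The first woman to travel to space was"
candidates = [" Valentina Tereshkova", " Mary Lee Pfeiffer", " Uriah Hawthorne"]
for name in candidates:
    print(f"{name!r}: {completion_logprob(reverse_prompt, name):.2f}")
# The Reversal Curse predicts the correct name scores no better than random
# names unless the fact appeared in the "B is A" direction during training.
```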

Overview of "The Reversal Curse: LLMs trained on A is B fail to learn B is A"

The paper "The Reversal Curse: LLMs trained on A is B fail to learn B is A" by Lukas Berglund et al. investigates a fundamental shortcoming in how auto-regressive LLMs generalize patterns from their training data. The authors identify the phenomenon termed the "Reversal Curse," wherein models trained on statements such as "A is B" do not generalize to the reversed form "B is A." This deficiency is examined through a series of experiments and is consistently observed across varied model sizes and types, including GPT-3 and Llama-1.

Key Findings

  1. Reversal Curse Identification: The paper establishes that LLMs fail to deduce the reverse of learned facts. For instance, a model trained on "Olaf Scholz was the ninth Chancellor of Germany" cannot answer "Who was the ninth Chancellor of Germany?"; it assigns the correct name no higher likelihood than a random one. This reveals a shortfall in the logical symmetry one would expect such systems to learn.
  2. Experimental Validation: Utilizing both fictitious and real-world data, the researchers confirm the robustness of the Reversal Curse:

    1. Fictitious Data: In one set of experiments, models were fine-tuned on synthetic facts (e.g., "Uriah Hawthorne is the composer of Abyssal Melodies") and then tested on the reversed facts. While models responded correctly when queried in the fine-tuned order, they performed no better than random guessing on the reversed queries (a minimal sketch of this setup follows this list).
    2. Real-World Data: Further testing with real-world questions about celebrities, such as "Who is Tom Cruise's mother?" and "Who is Mary Lee Pfeiffer's son?", yielded similar outcomes. While GPT-4 answered the former correctly 79% of the time, it could only answer the reversed query correctly 33% of the time.
  3. Ineffectiveness of Data Augmentation: The paper also explored whether various training setups could mitigate the Reversal Curse. This involved different hyperparameters, inclusion of auxiliary examples, paraphrases, and altered data formats (e.g., converting statements into question-answer pairs). None of these interventions successfully alleviated the curse.
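
To make the fictitious-data setup concrete, here is a minimal sketch of how such a dataset can be constructed. The templates and JSONL field names are illustrative assumptions, not the paper's actual format (that lives in the linked repository): training states each fact in one direction only, while evaluation probes both directions.

```python
# Sketch of the fictitious-facts experiment: fine-tuning data states each fact
# in one fixed direction ("A is B"); evaluation probes both directions.
# Templates and field names are illustrative, not taken from the paper's code.
import json

facts = [
    ("Uriah Hawthorne", "the composer of Abyssal Melodies"),
    ("Daphne Barrington", "the director of 'A Journey Through Time'"),
]

train_rows, eval_rows = [], []
for name, description in facts:
    # Seen during fine-tuning: name -> description, one direction only.
    train_rows.append({"prompt": "", "completion": f"{name} is {description}."})
    # Same direction as training (models answer this well)...
    eval_rows.append({"prompt": f"{name} is", "expected": description})
    # ...and the reversed direction (models perform at chance here).
    eval_rows.append({"prompt": f"Who is {description}? It is", "expected": name})

with open("train.jsonl", "w") as f:
    for row in train_rows:
        f.write(json.dumps(row) + "\n")
with open("eval.jsonl", "w") as f:
    for row in eval_rows:
        f.write(json.dumps(row) + "\n")
```

A model fine-tuned on `train.jsonl` would then be scored by exact match on `eval.jsonl`; the Reversal Curse predicts near-perfect accuracy on the same-direction prompts and chance-level accuracy on the reversed ones.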

Implications

The findings have several far-reaching implications:

  • Logical Deduction in LLMs: The Reversal Curse underscores a significant gap in the logical reasoning capabilities of current LLMs. This has profound implications for their reliability and efficacy in applications requiring logical consistency.
  • Meta-Learning: The curse also points to limitations in the meta-learning abilities of LLMs. Despite the prevalence of reversed fact patterns in training data, models fail to adjust their probabilities appropriately, suggesting a fundamental flaw in how these models internalize and generalize information.
  • Model Design and Training Paradigms: The persistence of the curse across models and configurations suggests that current architectures or training paradigms need rethinking. Methods that let models recognize and exploit the symmetry of logical relations will be crucial for advancing these systems.

Future Directions

  1. Further Investigation into Reversal of Relations: The paper suggests exploring whether the Reversal Curse extends to other types of logical relations beyond identity, such as implications or spatial relationships.
  2. Analysis of Training Data: Applying entity-linking techniques to pretraining corpora could identify facts that appear in only one direction, providing insight into where the problem originates and how to mitigate it (a toy order-count sketch follows this list).
  3. Alternative Learning Models: Non-auto-regressive models or alternative paradigms for knowledge representation and learning might avoid the Reversal Curse, warranting further research in these areas.
  4. Practical Impact Assessment: Investigating the practical effects of the Reversal Curse on real-world deployments of LLMs can guide optimizations in training regimes, particularly for facts that appear rarely, or in only one direction, in the training data.
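
As a toy illustration of the entity-linking direction (item 2 above), the sketch below counts the order in which entity pairs co-occur in a corpus; pairs that never appear in the reverse order would flag facts a model has only seen one way. The naive substring matching is a stand-in for a real entity linker and is purely illustrative.

```python
# Toy sketch: flag entity pairs that co-occur in only one order in a corpus.
# Substring matching is a naive stand-in for a real entity linker.
from collections import Counter
from itertools import combinations

def mentions_in_order(sentence, entities):
    """Entities found in `sentence`, sorted by position of first occurrence."""
    found = [(sentence.index(e), e) for e in entities if e in sentence]
    return [e for _, e in sorted(found)]

corpus = [
    "Mary Lee Pfeiffer is the mother of Tom Cruise.",
    "Tom Cruise was born in Syracuse, New York.",
]
entities = {"Tom Cruise", "Mary Lee Pfeiffer", "Syracuse"}

counts = Counter()
for sentence in corpus:
    for a, b in combinations(mentions_in_order(sentence, entities), 2):
        counts[(a, b)] += 1

for (a, b), n in counts.items():
    if counts[(b, a)] == 0:
        print(f"'{a}' precedes '{b}' {n}x; the reverse order never occurs")
```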

Conclusion

The Reversal Curse highlights a critical area of improvement for LLMs in terms of logical reasoning and generalized learning. Addressing this issue will be pivotal in enhancing the robustness and reliability of AI systems, particularly as their applications continue to expand into more complex and critical domains. Future research must probe the underpinnings of this phenomenon and explore innovative approaches to overcome it.

Authors: Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans