Delving into the Reversal Curse: How Far Can Large Language Models Generalize? (2410.18808v2)
Abstract: While LLMs showcase unprecedented capabilities, they also exhibit certain inherent limitations when facing seemingly trivial tasks. A prime example is the recently debated "reversal curse", which surfaces when models, having been trained on the fact "A is B", struggle to generalize this knowledge to infer that "B is A". In this paper, we examine the manifestation of the reversal curse across various tasks and delve into both the generalization abilities and the problem-solving mechanisms of LLMs. This investigation leads to a series of significant insights: (1) LLMs are able to generalize to "B is A" when both A and B are presented in the context, as in a multiple-choice question. (2) This generalization ability is highly correlated with the structure of the fact "A is B" in the training documents. For example, the generalization only holds for biographies structured as "[Name] is [Description]" but not for "[Description] is [Name]". (3) We propose and verify the hypothesis that LLMs possess an inherent bias in fact recall during knowledge application, which explains and underscores the importance of document structure to successful learning. (4) The negative impact of this bias on the downstream performance of LLMs can hardly be mitigated through training alone. These findings offer a novel perspective on interpreting LLMs' generalization through their intrinsic mechanisms and provide insights for developing more effective learning methods. Our code and data are available at https://github.com/alibaba/thinking_bias.git.
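To make the experimental contrast in the abstract concrete, below is a minimal sketch (not the authors' released code) of how synthetic biographies in the two document structures and their reverse-direction probes might be constructed; all names, descriptions, and prompt templates are hypothetical illustrations.

```python
# Minimal sketch of the data construction the abstract describes:
# biographies written as "[Name] is [Description]" (name-to-description)
# or "[Description] is [Name]" (description-to-name), each probed in the
# reverse direction ("B is A") either as open QA or as a multiple-choice
# question. Names, descriptions, and templates below are hypothetical.

import random


def make_training_doc(name: str, description: str, order: str) -> str:
    """Build one synthetic biography in the requested surface order."""
    if order == "name_to_description":    # "[Name] is [Description]"
        return f"{name} is {description}."
    if order == "description_to_name":    # "[Description] is [Name]"
        return f"{description} is {name}."
    raise ValueError(f"unknown order: {order}")


def make_reverse_probes(name: str, description: str, distractors: list[str]) -> dict:
    """Probe the reverse direction in two formats: open QA and multiple choice."""
    open_qa = f"Question: Who is {description}?\nAnswer:"

    options = distractors + [name]
    random.shuffle(options)
    letters = "ABCD"[: len(options)]
    multiple_choice = (
        f"Question: Who is {description}?\n"
        + "\n".join(f"{letter}. {option}" for letter, option in zip(letters, options))
        + "\nAnswer:"
    )
    return {"open_qa": open_qa, "multiple_choice": multiple_choice, "gold": name}


if __name__ == "__main__":
    name, desc = "Alice Zephyr", "the first violinist to perform on the Moon"
    print(make_training_doc(name, desc, "name_to_description"))
    probes = make_reverse_probes(name, desc, ["Bob Quill", "Carol Vane", "Dan Moss"])
    print(probes["multiple_choice"])
```

In this setup, the multiple-choice probe places both A and B in the context, which is the condition under which the paper reports successful reverse generalization, whereas the open-QA probe requires the model to recall the name from parameters alone.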
Authors: Zhengkai Lin, Zhihang Fu, Kai Liu, Liang Xie, Binbin Lin, Wenxiao Wang, Deng Cai, Yue Wu, Jieping Ye