Large Language Models Cannot Self-Correct Reasoning Yet (2310.01798v2)
Abstract: Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues. Building upon this premise, this paper critically examines the role and efficacy of self-correction within LLMs, shedding light on its true potential and limitations. Central to our investigation is the notion of intrinsic self-correction, whereby an LLM attempts to correct its initial responses based solely on its inherent capabilities, without the crutch of external feedback. In the context of reasoning, our research indicates that LLMs struggle to self-correct their responses without external feedback, and at times their performance even degrades after self-correction. Drawing from these insights, we offer suggestions for future research and practical applications in this field.
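To make the intrinsic self-correction setup concrete, below is a minimal sketch of the generate-critique-revise loop the abstract describes: the model reviews and revises its own answer with no external feedback (no gold labels, tools, or verifiers). The `llm` callable, the function name `intrinsic_self_correct`, and the exact prompt wording are illustrative assumptions, not the paper's verbatim prompts.

```python
from typing import Callable, List, Tuple


def intrinsic_self_correct(
    llm: Callable[[str], str],
    question: str,
    rounds: int = 1,
) -> Tuple[str, List[str]]:
    """Generate an initial answer, then ask the model to review and revise it.

    `llm` is any prompt -> completion callable (hypothetical interface);
    no external feedback is provided at any step.
    """
    history: List[str] = []

    # Step 1: initial answer (chain-of-thought style prompt).
    answer = llm(f"Q: {question}\nA: Let's think step by step.")
    history.append(answer)

    for _ in range(rounds):
        # Step 2: the model critiques its own previous answer.
        critique = llm(
            f"Q: {question}\nYour previous answer:\n{answer}\n"
            "Review your previous answer and find problems with it."
        )
        # Step 3: the model revises its answer based on its own critique.
        answer = llm(
            f"Q: {question}\nYour previous answer:\n{answer}\n"
            f"Identified problems:\n{critique}\n"
            "Based on the problems you found, improve your answer. "
            "Give the final answer only."
        )
        history.append(answer)

    return answer, history


if __name__ == "__main__":
    # Stand-in model for demonstration only; replace with a real LLM call.
    def dummy_llm(prompt: str) -> str:
        return "42"

    final, trace = intrinsic_self_correct(dummy_llm, "What is 6 * 7?")
    print(final, trace)
```

The paper's finding is that, under this feedback-free setup, the revised answer is often no better (and sometimes worse) than the initial one, since the model has no new signal to tell a correct answer from an incorrect critique.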