Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future (2309.15402v3)

Published 27 Sep 2023 in cs.CL and cs.AI

Abstract: Reasoning, a fundamental cognitive process integral to human intelligence, has garnered substantial interest within artificial intelligence. Notably, recent studies have revealed that chain-of-thought prompting significantly enhances LLMs' reasoning capabilities, which attracts widespread attention from both academics and industry. In this paper, we systematically investigate relevant research, summarizing advanced methods through a meticulous taxonomy that offers novel perspectives. Moreover, we delve into the current frontiers and delineate the challenges and future directions, thereby shedding light on future research. Furthermore, we engage in a discussion about open questions. We hope this paper serves as an introduction for beginners and fosters future research. Resources have been made publicly available at https://github.com/zchuz/CoT-Reasoning-Survey

A Comprehensive Survey of Chain-of-Thought Reasoning

The paper "A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future" serves as a detailed examination of the development and application of chain-of-thought (CoT) reasoning in the context of artificial intelligence and natural language processing. Chain-of-thought reasoning methods leverage the cognitive processes foundational to human intelligence for enhancing the reasoning capabilities of pre-trained LLMs (PLMs) and LLMs. This survey is notable for its extensive coverage of various methodologies, applications, and potential future directions in this rapidly developing field.

The core of this survey categorizes chain-of-thought reasoning methods into three main approaches: manual, automatic, and semi-automatic. Manual CoT methods involve explicitly annotating demonstrations with thought processes, thus allowing LLMs to mimic complex reasoning steps. Automatic CoT eliminates human intervention by utilizing zero-shot prompt engineering or sampling to generate reasoning paths, albeit with the risk of diminished quality due to the lack of human alignment. The intermediate semi-automatic approach blends elements of both manual and automatic CoT to balance cost, quality, and task generalization.
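
To make the manual/automatic distinction concrete, the sketch below contrasts a hand-written few-shot CoT prompt with a zero-shot trigger prompt. This is a minimal illustration in Python, not code from the survey: the `llm` callable, the demonstration text, and the trigger wording are all assumptions.

```python
# Minimal sketch contrasting manual (few-shot) and automatic (zero-shot) CoT prompting.
# `llm` is a stand-in for any text-generation callable (prompt -> str); it is an
# assumption, not an API from the surveyed paper.

MANUAL_DEMO = (
    "Q: A farm has 3 pens with 4 sheep each. 2 sheep are sold. How many remain?\n"
    "A: There are 3 * 4 = 12 sheep. After selling 2, 12 - 2 = 10 remain. The answer is 10.\n"
)

def manual_cot_prompt(question: str) -> str:
    # Manual CoT: prepend hand-written demonstrations whose answers spell out
    # the intermediate reasoning steps for the model to imitate.
    return f"{MANUAL_DEMO}\nQ: {question}\nA:"

def zero_shot_cot_prompt(question: str) -> str:
    # Automatic (zero-shot) CoT: no demonstrations, just a trigger phrase that
    # elicits step-by-step reasoning.
    return f"Q: {question}\nA: Let's think step by step."

def answer(question: str, llm) -> str:
    return llm(manual_cot_prompt(question))

if __name__ == "__main__":
    question = "If a train travels 60 km in 1.5 hours, what is its speed?"
    print(zero_shot_cot_prompt(question))
    echo_llm = lambda prompt: prompt  # placeholder so the sketch runs without a real model
    print(answer(question, echo_llm))
```

Semi-automatic approaches sit between these two extremes, for example by selecting or lightly editing automatically generated chains with limited human effort.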

A significant portion of the survey is dedicated to structural variants of chain-of-thought reasoning, in which the linear chain is generalized into tree and graph structures. Tree-of-Thought (ToT) and Graph-of-Thought (GoT) illustrate how reasoning can mirror human exploration and backtracking when navigating complex decision spaces. While these structures make reasoning more flexible and capable, they also add complexity, which limits their general applicability across diverse tasks.
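
The sketch below conveys the general idea behind tree-structured reasoning as a beam search over partial thought sequences. It is a simplified illustration rather than the algorithm of any particular ToT or GoT paper; the `propose` and `score` callables stand in for LLM calls and are assumptions.

```python
# Simplified Tree-of-Thought style search: expand several candidate "thoughts"
# per step, score them, and keep only the best few (beam search over reasoning states).
from typing import Callable, List, Tuple

def tree_of_thought(
    question: str,
    propose: Callable[[str], List[str]],   # current state -> candidate next thoughts
    score: Callable[[str], float],         # state -> heuristic value, higher is better
    depth: int = 3,
    beam_width: int = 2,
) -> str:
    frontier: List[str] = [question]
    for _ in range(depth):
        candidates: List[Tuple[float, str]] = []
        for state in frontier:
            for thought in propose(state):
                new_state = state + "\n" + thought
                candidates.append((score(new_state), new_state))
        if not candidates:
            break
        # Keeping only the top-scoring partial paths is the pruning/backtracking
        # that a single linear chain of thought cannot perform.
        candidates.sort(key=lambda c: c[0], reverse=True)
        frontier = [state for _, state in candidates[:beam_width]]
    return frontier[0]
```

Graph-structured variants additionally allow merging and revisiting intermediate states, at the cost of further bookkeeping.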

Additionally, the survey discusses techniques that enhance CoT reasoning, such as verification and refinement, which reduce errors through feedback mechanisms and iterative revision. Question decomposition provides a structured approach to complex problems by breaking them into manageable sub-questions. Incorporating external knowledge mitigates the limitations of LLMs with respect to up-to-date information and factual accuracy. Voting, ranking, and efficiency techniques further strengthen existing CoT frameworks by improving decision-making consistency and reducing computational overhead.
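
As a concrete example of answer-level voting, the following sketch implements self-consistency style majority voting over sampled reasoning chains. The `sample_chain` callable and the answer-extraction pattern are illustrative assumptions, not code from the survey.

```python
# Self-consistency style voting: sample several reasoning chains, extract the
# final answer from each, and return the majority answer.
import re
from collections import Counter
from typing import Callable, List

def extract_answer(chain: str) -> str:
    # Assumes chains end with "The answer is X." as in the few-shot demo above.
    match = re.search(r"answer is\s*([^\.\n]+)", chain, flags=re.IGNORECASE)
    return match.group(1).strip() if match else chain.strip()

def self_consistency(question: str, sample_chain: Callable[[str], str], n: int = 5) -> str:
    answers: List[str] = [extract_answer(sample_chain(question)) for _ in range(n)]
    # Majority vote over final answers smooths out individual faulty chains.
    return Counter(answers).most_common(1)[0][0]
```

Verification and refinement methods replace this simple vote with a learned or prompted critic that scores, checks, or edits each chain before aggregation.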

Importantly, the paper situates CoT reasoning within broader applications. Tool use enriches models with external resources and interfaces, boosting reasoning power and providing access to additional data. Planning decomposes tasks into achievable steps, improving problem-solving on intricate tasks. CoT distillation is explored as a means of transferring reasoning capabilities from LLMs to smaller, more deployable models, broadening access to advanced capabilities.
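
The toy loop below gives a picture of tool use within a reasoning trace: the model emits an action line such as `CALC: <expr>`, the controller executes the tool, and the observation is appended back to the context. The action format and `llm` callable are assumptions, loosely in the spirit of the ReAct- and PAL-style methods the survey covers, not an implementation from the paper.

```python
# Toy tool-augmented reasoning loop with a calculator tool.
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str) -> float:
    # Evaluate a pure arithmetic expression without using eval().
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression: {expr}")
    return walk(ast.parse(expr, mode="eval"))

def run_with_calculator(question: str, llm, max_turns: int = 4) -> str:
    transcript = f"Q: {question}\nThink step by step. Use 'CALC: <expr>' for arithmetic.\n"
    for _ in range(max_turns):
        step = llm(transcript)          # model proposes the next thought or action
        transcript += step + "\n"
        if step.startswith("CALC:"):
            result = safe_eval(step[len("CALC:"):].strip())
            transcript += f"OBSERVATION: {result}\n"   # tool result fed back to the model
        else:
            return step                  # anything else is treated as the final answer
    return transcript
```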

To chart future progress, the paper emphasizes extending CoT reasoning to multi-modal contexts, increasing faithfulness by reducing hallucinated outputs, and grounding CoT methods theoretically. Intermediate reasoning structures that can handle multi-modal data are poised to significantly expand the contextual reasoning capabilities of AI models. Addressing faithfulness remains crucial for improving the factual correctness of AI-generated CoT outputs. Understanding CoT from both empirical and theoretical perspectives is an area that calls for further exploration and offers promise for refining reasoning frameworks.

In summary, this survey not only serves as a comprehensive guide to current advancements in chain-of-thought reasoning but also encourages future research to overcome existing limitations and expand the application scope of CoT-enabled PLMs and LLMs. As AI systems increasingly adopt reasoning models, the insights offered in this paper provide a solid foundation for advancing intelligent systems capable of human-like reasoning.

Authors (10)
  1. Zheng Chu (49 papers)
  2. Jingchang Chen (10 papers)
  3. Qianglong Chen (25 papers)
  4. Weijiang Yu (23 papers)
  5. Tao He (62 papers)
  6. Haotian Wang (60 papers)
  7. Weihua Peng (12 papers)
  8. Ming Liu (421 papers)
  9. Bing Qin (186 papers)
  10. Ting Liu (329 papers)
Citations (97)