
How Likely Do LLMs with CoT Mimic Human Reasoning?

Published 25 Feb 2024 in cs.CL, cs.AI, and cs.LG | (2402.16048v3)

Abstract: Chain-of-thought emerges as a promising technique for eliciting reasoning capabilities from LLMs. However, it does not always improve task performance or accurately represent reasoning processes, leaving unresolved questions about its usage. In this paper, we diagnose the underlying mechanism by comparing the reasoning process of LLMs with humans, using causal analysis to understand the relationships between the problem instruction, reasoning, and the answer in LLMs. Our empirical study reveals that LLMs often deviate from the ideal causal chain, resulting in spurious correlations and potential consistency errors (inconsistent reasoning and answers). We also examine various factors influencing the causal structure, finding that in-context learning with examples strengthens it, while post-training techniques like supervised fine-tuning and reinforcement learning on human feedback weaken it. To our surprise, the causal structure cannot be strengthened by enlarging the model size only, urging research on new techniques. We hope that this preliminary study will shed light on understanding and improving the reasoning process in LLM.


Summary

  • The paper demonstrates that CoT inconsistently influences LLM performance, impairing basic tasks while aiding complex reasoning.
  • The study employs causal analysis and structural causal models to compare LLM reasoning with human logical processes.
  • Findings highlight that training methods like RLHF can partially align model reasoning with causal logic, suggesting paths for future improvement.

Introduction

The paper "How Likely Do LLMs with CoT Mimic Human Reasoning?" examines the efficacy of the Chain of Thought (CoT) approach in LLMs, particularly focusing on its reasoning capabilities. The authors investigate whether CoT reasoning accurately reflects human-like logical processes, emphasizing the effects of CoT on LLM performance across different tasks. The study aims to explore discrepancies in reasoning patterns between LLMs and humans potentially introduced by CoT methodologies.

Methodology

The authors leverage causal analysis to assess how CoT prompts influence reasoning in LLMs. They examine the Structural Causal Models (SCMs) implied by these models during reasoning tasks, comparing them against typical human reasoning patterns. The relationships among the problem instruction, the CoT, and the resulting answer were tested using interventions and causal inference methodology. The authors also analyze how common training techniques, such as in-context learning (ICL), supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF), affect these causal structures.

Figure 1: CoT and Answer do not fully align.
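To make the intervention idea concrete, the following is a minimal sketch of how the strength of the CoT-to-answer link could be probed; the `generate` wrapper, the perturbed-CoT intervention, and the flip-rate metric are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch of an intervention-style probe of the CoT -> Answer link.
# `generate` is a hypothetical wrapper around an LLM call; the perturbation
# and metric below are illustrative, not the paper's exact procedure.

def generate(prompt: str) -> str:
    """Placeholder for a call to an LLM (API or local model)."""
    raise NotImplementedError

def answer_with_cot(question: str, cot: str) -> str:
    # Condition the final answer on a fixed chain of thought.
    prompt = f"Question: {question}\nReasoning: {cot}\nAnswer:"
    return generate(prompt).strip()

def cot_intervention_flips_answer(question: str, original_cot: str, perturbed_cot: str) -> bool:
    """Return True if intervening on the CoT changes the answer.

    If the answer rarely changes under CoT interventions, the CoT -> Answer
    edge in the implied structural causal model is effectively weak or absent.
    """
    return answer_with_cot(question, original_cot) != answer_with_cot(question, perturbed_cot)

def causal_strength(examples: list[tuple[str, str, str]]) -> float:
    """Estimate causal strength as the fraction of (question, cot, perturbed_cot)
    triples whose answer flips under the CoT intervention."""
    flips = [cot_intervention_flips_answer(q, cot, p_cot) for q, cot, p_cot in examples]
    return sum(flips) / max(len(flips), 1)
```

The same probe, applied to interventions on the instruction while the CoT is held fixed, would expose any direct instruction-to-answer path that bypasses the stated reasoning.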

Findings and Analysis

Unstable Effectiveness of CoT: CoT did not consistently improve performance across tasks. On basic arithmetic tasks it impaired performance, whereas on complex reasoning problems it was beneficial, suggesting that simple tasks gain little from the structured reasoning steps CoT imposes.

Incongruency Between CoT and Answers: Across the evaluations, incorrect CoT could still lead to correct answers, and correct CoT to incorrect answers, raising questions about the causal relation between the models' reasoning steps and their final outputs.
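As a rough illustration, such consistency errors can be tallied as the fraction of examples where the judged correctness of the CoT disagrees with the correctness of the answer; the field names below are hypothetical, and judging CoT correctness would in practice require human or automated evaluation.

```python
# Illustrative tally of consistency errors: cases where the chain of thought
# and the final answer disagree in correctness. Field names are hypothetical.

from dataclasses import dataclass

@dataclass
class Example:
    cot_is_correct: bool     # judged correctness of the reasoning chain
    answer_is_correct: bool  # judged correctness of the final answer

def consistency_error_rate(examples: list[Example]) -> float:
    """Fraction of examples with inconsistent CoT/answer correctness:
    wrong reasoning with a right answer, or right reasoning with a wrong answer."""
    if not examples:
        return 0.0
    errors = sum(ex.cot_is_correct != ex.answer_is_correct for ex in examples)
    return errors / len(examples)

# Toy usage with made-up judgments:
toy = [Example(True, True), Example(False, True), Example(True, False)]
print(consistency_error_rate(toy))  # two of the three are consistency errors
```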

Impact of Model Size and Training Techniques: Larger LLMs sometimes exhibited structures closer to effective causal reasoning, but this was not uniform across tasks, and enlarging the model size alone did not strengthen the causal structure. Moreover, SFT frequently led models to pick up extraneous features, misaligning the CoT with the final answer, while RLHF partially mitigated this issue by tying answers more closely to the correct causal pathway.

Figure 2: LLMs with CoT exhibit inconsistent effects, where CoT shows inferior performance to Direct in some tasks, superior in others.

Causal Structures and Errors

The study identifies several SCM types based on the strength of the relationships among the instruction, the CoT, and the answer. Many LLMs exhibited SCM types that allow spurious dependencies to influence the relation between the CoT and the answer. Crucially, these spurious correlations manifested as logical inconsistencies between reasoning and answers, an issue far less prevalent under the ideal causal chain, where the answer strictly follows from the reasoning.

Figure 3: Potential structural causal models (SCMs) for chain-of-thought (CoT) in question-answering.
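The sketch below shows one way such SCM types could be categorized from estimated edge strengths; the thresholds and type labels are illustrative assumptions rather than the paper's exact taxonomy.

```python
# Illustrative classification of an LLM's implied SCM over
# (Instruction, CoT, Answer) from estimated edge strengths.
# Thresholds and labels are assumptions, not the paper's exact taxonomy.

def classify_scm(instr_to_answer: float, cot_to_answer: float, threshold: float = 0.5) -> str:
    """Label the implied causal structure from two estimated edge strengths.

    instr_to_answer: how strongly the answer responds to interventions on the
        instruction while the CoT is held fixed (a direct path bypassing the CoT).
    cot_to_answer: how strongly the answer responds to interventions on the CoT.
    """
    cot_drives_answer = cot_to_answer >= threshold
    instr_bypasses_cot = instr_to_answer >= threshold

    if cot_drives_answer and not instr_bypasses_cot:
        # Instruction -> CoT -> Answer: the ideal causal chain.
        return "causal chain"
    if cot_drives_answer and instr_bypasses_cot:
        # The instruction is a common cause: it shapes both the CoT and the
        # answer directly, opening the door to spurious correlations.
        return "common cause / mixed structure"
    if not cot_drives_answer and instr_bypasses_cot:
        # The CoT is post-hoc rationalization: the answer ignores it.
        return "direct instruction -> answer (CoT bypassed)"
    return "weakly connected (neither edge is strong)"
```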

Practical and Theoretical Implications

The findings carry significant implications for developing models that reason faithfully, in a manner closer to human logical processes. Changes to training methods may help align CoT mechanisms more closely with human reasoning. Greater emphasis on causal consistency could improve model reliability, especially when reasoning through complex, context-rich problems. Further research could investigate more refined causal structures or integrate counterfactual reasoning to bridge existing gaps in LLM reasoning fidelity.

Conclusion

This research provides insights into the cognitive alignment challenges of LLMs when adopting CoT strategies. By uncovering inconsistencies between CoT reasoning and model decision-making processes, it suggests that current CoT implementations may not suffice for faithfully mirroring human logic. Future directions could explore more causally sound approaches for LLM training, or develop novel CoT paradigms that reinforce genuine causal dependencies akin to human reasoning mechanisms.
