How to think step-by-step: A mechanistic understanding of chain-of-thought reasoning (2402.18312v2)

Published 28 Feb 2024 in cs.CL and cs.LG

Abstract: Despite superior reasoning prowess demonstrated by LLMs with Chain-of-Thought (CoT) prompting, a lack of understanding prevails around the internal mechanisms of the models that facilitate CoT generation. This work investigates the neural sub-structures within LLMs that manifest CoT reasoning from a mechanistic point of view. From an analysis of Llama-2 7B applied to multistep reasoning over fictional ontologies, we demonstrate that LLMs deploy multiple parallel pathways of answer generation for step-by-step reasoning. These parallel pathways provide sequential answers from the input question context as well as the generated CoT. We observe a functional rift in the middle layers of the LLM. Token representations in the initial half remain strongly biased towards the pretraining prior, with the in-context prior taking over in the later half. This internal phase shift manifests in different functional components: attention heads that write the answer token appear in the later half, attention heads that move information along ontological relationships appear in the initial half, and so on. To the best of our knowledge, this is the first attempt towards mechanistic investigation of CoT reasoning in LLMs.

Mechanistic Insights Into Chain-of-Thought Reasoning in LLMs

The paper undertakes a mechanistic dissection of Chain-of-Thought (CoT) reasoning within LLMs, focusing on a detailed investigation of Llama-2 7B. By studying CoT reasoning on multi-step tasks over fictional ontologies, the paper moves beyond purely empirical and theoretical accounts and turns attention to the neural sub-structures that LLMs deploy during CoT generation.

Key Observations and Methodologies

The paper introduces a novel perspective on CoT by dissecting the reasoning process into subtasks: decision-making, copying, and induction. The authors argue that this fine granularity provides an effective lens for examining the internal workings of LLMs. Using the PrOntoQA dataset of multi-step reasoning problems over fictional ontologies, they show how LLMs generate CoT responses by composing these subtasks.
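
To make the task concrete, here is a minimal, hypothetical sketch of what a PrOntoQA-style prompt over a fictional ontology looks like; the made-up category names, entity name, and exact phrasing are illustrative stand-ins rather than actual dataset items.

```python
# Illustrative PrOntoQA-style prompt over a fictional ontology.
# The fact wording, entity name, and category words are invented here
# for illustration; the real dataset is generated programmatically.
facts = [
    "Every wumpus is a tumpus.",
    "Every tumpus is a vumpus.",
    "Vumpuses are bright.",
]
context = "Alex is a wumpus."
question = "True or false: Alex is bright."

cot_prompt = (
    " ".join(facts) + " " + context + "\n"
    + f"Question: {question}\n"
    + "Answer: Let's think step by step.\n"
)
print(cot_prompt)
# A correct chain of thought must repeatedly (i) decide which fact applies,
# (ii) copy the relevant entity, and (iii) induce the next step, e.g.
# "Alex is a wumpus. Every wumpus is a tumpus, so Alex is a tumpus. ..."
```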

The application of interpretative techniques, including activation patching, probing classifiers, and the logit lens, reveals a substantial overlap in the attention heads recruited for the various subtasks, suggesting intertwined circuits resembling induction-head mechanisms. Rather than distinct reasoning circuits for each subtask, the work finds a shared collection of algorithmic pathways that simultaneously serve different reasoning needs.
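
As a rough illustration of one of these techniques, the sketch below performs activation patching over individual attention heads using the open-source TransformerLens library. GPT-2 small and the France/Italy prompt pair are placeholders (the paper works with Llama-2 7B and PrOntoQA prompts), and the measurement shown is a generic patching recipe rather than the authors' exact protocol.

```python
# Activation-patching sketch with TransformerLens (not the paper's own code).
import torch
from transformer_lens import HookedTransformer, utils

torch.set_grad_enabled(False)
model = HookedTransformer.from_pretrained("gpt2")   # stand-in for Llama-2 7B

clean_tokens = model.to_tokens("The capital of France is")
corrupt_tokens = model.to_tokens("The capital of Italy is")
answer_id = model.to_single_token(" Paris")

# Cache all activations from the corrupted run.
_, corrupt_cache = model.run_with_cache(corrupt_tokens)
clean_logit = model(clean_tokens)[0, -1, answer_id]

def patch_head(value, hook, head):
    # value: [batch, seq, n_heads, d_head] -- per-head attention outputs (hook_z)
    value[:, :, head, :] = corrupt_cache[hook.name][:, :, head, :]
    return value

effects = torch.zeros(model.cfg.n_layers, model.cfg.n_heads)
for layer in range(model.cfg.n_layers):
    hook_name = utils.get_act_name("z", layer)
    for head in range(model.cfg.n_heads):
        patched = model.run_with_hooks(
            clean_tokens,
            fwd_hooks=[(hook_name, lambda v, hook, h=head: patch_head(v, hook, h))],
        )
        # Drop in the answer logit when this head is overwritten with the
        # corrupted activation, i.e. how much the head matters on the clean run.
        effects[layer, head] = clean_logit - patched[0, -1, answer_id]
print(effects)
```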

Mechanistically, the investigation employs a recursive strategy to trace information flow backward through the model's layers, pinpointing the relevant residual streams and answer-writing attention heads. Notably, it shows multiple pathways operating in parallel to collect the answer from different segments of the input, with the model drawing on both its pretraining prior and the in-context information (the question and the generated CoT). These pathways emerge from a specialized distribution of attention heads rather than from a single dedicated reasoning circuit.
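
A minimal sketch of how answer-writing heads can be surfaced, again assuming TransformerLens and a toy prompt: each head's output at the final position is mapped into the residual stream and projected onto the answer token's unembedding direction. This is a generic direct-logit-attribution recipe, not necessarily the authors' exact procedure.

```python
# Direct-logit-attribution sketch for locating answer-writing heads
# (illustrative; GPT-2 small and the prompt are stand-ins).
import torch
from transformer_lens import HookedTransformer

torch.set_grad_enabled(False)
model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The capital of France is")
answer_id = model.to_single_token(" Paris")

_, cache = model.run_with_cache(tokens)
answer_dir = model.W_U[:, answer_id]              # unembedding direction of the answer token

scores = torch.zeros(model.cfg.n_layers, model.cfg.n_heads)
for layer in range(model.cfg.n_layers):
    z = cache["z", layer][0, -1]                  # [n_heads, d_head] at the last position
    # Each head writes z @ W_O into the residual stream; project onto the answer direction.
    writes = torch.einsum("hd,hdm->hm", z, model.W_O[layer])
    scores[layer] = writes @ answer_dir           # final-LayerNorm scaling is ignored here

top = torch.topk(scores.flatten(), k=5)
for val, idx in zip(top.values, top.indices):
    layer, head = divmod(int(idx), model.cfg.n_heads)
    print(f"L{layer}H{head}: {float(val):.3f}")   # heads that most push the answer logit up
```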

Functional Rift and Information Movement

An intriguing contribution of this paper is the identification of a functional rift within the Llama-2 7B model: the reasoning mechanism exhibits a phase shift after the 16th decoder block, marking a demarcation between reliance on the pretraining prior and in-context reasoning. This finding is significant because it suggests a systematic progression from embedding to unembedding, with token representations in the earlier layers remaining biased toward the pretraining prior and in-context, ontology-specific associations consolidating only after the transition zone.
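
This layer-wise phase shift can be visualized with a standard logit-lens pass: decode the residual stream after every decoder block through the final norm and unembedding, and observe where the in-context answer starts to win out. The sketch below uses Hugging Face transformers; the prompt is a made-up ontology example, and access to the gated Llama-2 weights is assumed.

```python
# Logit-lens sketch over Llama-2 7B with Hugging Face transformers
# (illustrative prompt; gated model access is assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

prompt = "Every wumpus is a tumpus. Alex is a wumpus. Therefore, Alex is a"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; hidden_states[i] follows decoder block i.
for layer, h in enumerate(out.hidden_states[1:], start=1):
    logits = model.lm_head(model.model.norm(h[:, -1]))   # decode the residual stream directly
    print(f"block {layer:2d}: {tok.decode(logits.argmax(dim=-1))!r}")
# The paper reports that in-context information starts to dominate the decoded
# token roughly after the 16th block, while earlier blocks track the pretraining prior.
```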

Probing further elucidates how entities involved in ontological relationships yield distinguishable residual-stream pairings, with noticeable token mixing across layers. In particular, entity-relatedness is gradually encoded by non-linear operations that capture ontological relations absent from the pretraining data. These mechanisms amplify in-context learning dynamics without premature contextual encoding, delineating an emergent neural circuitry responsive to CoT tasks.
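
A probing classifier of this kind can be approximated with a simple linear probe over residual-stream activations. The sketch below (scikit-learn, with random placeholder features standing in for cached activations and labels) illustrates the general recipe rather than the paper's exact probing setup.

```python
# Linear-probe sketch for entity relatedness in residual streams
# (placeholder features and labels; real runs would use cached activations).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assume `pair_features` stacks [resid(entity_a); resid(entity_b)] from one layer,
# and `related` marks whether the prompt's ontology links the two entities.
rng = np.random.default_rng(0)
pair_features = rng.normal(size=(500, 2 * 4096))   # placeholder for real activations
related = rng.integers(0, 2, size=500)             # placeholder labels

X_train, X_test, y_train, y_test = train_test_split(
    pair_features, related, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
# Repeating this per layer shows where entity-relatedness becomes linearly decodable.
```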

Implications and Prospective Research Directions

The paper's findings have profound implications for future advancements in mechanistic interpretability and LLM design. By explicating the inherent structural dynamics and the interaction between pretraining biases and contextual adaptations, this research paves the way for more nuanced understandings of LLM functionality across variable reasoning demands. Its identification of multiple neural pathways for answer procurement offers crucial insights into developing mechanisms to modulate CoT reliability, robustness, and adaptability.

Looking ahead, the paper underscores the need to extend such mechanistic insights beyond constrained ontology tasks to free-form reasoning. The observation of heads shared across subtasks also calls for closer attention to the role of MLP layers in reasoning, beyond their established role in factual recall. Moreover, replicating these findings in larger-scale models and exploring cross-layer emergent properties could improve the predictability of training dynamics and circuit-level interpretability.

In summary, this paper offers a detailed mechanistic perspective on CoT reasoning, contributing to the field by bridging theoretical, empirical, and structural analyses of LLMs. The focused study of Llama-2 7B provides a robust framework for analyzing how LLMs adapt their reasoning when confronted with incomplete or novel information.

Authors (4)
  1. Subhabrata Dutta (24 papers)
  2. Joykirat Singh (8 papers)
  3. Soumen Chakrabarti (52 papers)
  4. Tanmoy Chakraborty (224 papers)
Citations (14)