A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning (2411.04105v4)

Published 6 Nov 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Due to the size and complexity of modern LLMs, it has proven challenging to uncover the underlying mechanisms that models use to solve reasoning problems. For instance, is their reasoning for a specific problem localized to certain parts of the network? Do they break down the reasoning problem into modular components that are then executed as sequential steps as we go deeper in the model? To better understand the reasoning capability of LLMs, we study a minimal propositional logic problem that requires combining multiple facts to arrive at a solution. By studying this problem on Mistral and Gemma models, up to 27B parameters, we illuminate the core components the models use to solve such logic problems. From a mechanistic interpretability point of view, we use causal mediation analysis to uncover the pathways and components of the LLMs' reasoning processes. Then, we offer fine-grained insights into the functions of attention heads in different layers. We not only find a sparse circuit that computes the answer, but we decompose it into sub-circuits that have four distinct and modular uses. Finally, we reveal that three distinct models -- Mistral-7B, Gemma-2-9B and Gemma-2-27B -- contain analogous but not identical mechanisms.

Summary

  • The paper reveals that small transformer models employ 'routing embeddings' and coordinated attention layers to navigate logical reasoning tasks.
  • It categorizes Mistral-7B’s attention heads into distinct functional groups, outlining a clear QUERY-to-Decision reasoning pathway.
  • The research uses activation patching to isolate model components, offering actionable insights for creating interpretable reasoning architectures.

Mechanistic Insights into Transformer Reasoning on Propositional Logic Problems

The paper presents a thorough investigation into the mechanisms that allow transformer models to solve propositional logic problems, aiming to elucidate the roles of individual components within these models. By utilizing a synthetic propositional logic dataset, the authors conduct mechanistic analyses on both small GPT-2-like transformers and larger, pre-trained models like Mistral-7B. This research provides crucial insights into how transformers internalize and operationalize reasoning pathways for logical tasks.

The paper begins by constructing a synthetic dataset of minimal propositional logic problems, each expressed as a small set of logical rules and facts that must be combined to reach an answer. Through a well-defined experimental setup, the authors analyze the internal mechanisms that support logical reasoning within transformers.
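
To make the setup concrete, below is a minimal sketch of what one instance in such a synthetic dataset might look like; the exact templates, vocabulary, and distractor structure used in the paper may differ.

```python
# Hypothetical sketch of a synthetic propositional-logic instance: two rules
# (one of them a distractor), one fact, and a query whose answer requires
# chaining the matching rule with the fact. Templates are illustrative only.
import random

def make_instance(symbols=("A", "B", "C", "D")):
    a, b, c, d = random.sample(symbols, 4)
    rules = [f"If {a} then {b}.", f"If {c} then {d}."]  # second rule is a distractor
    random.shuffle(rules)
    fact = f"{a} is true."
    query = f"Question: is {b} true? Answer:"
    prompt = " ".join(rules + [fact, query])
    return prompt, " yes"

prompt, answer = make_instance()
print(prompt)  # e.g. "If C then D. If A then B. A is true. Question: is B true? Answer:"
print(answer)
```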

Key Findings

  1. Small Transformer Analysis:
    • Within the small transformer models, particularly those resembling GPT-2 architectures, the authors identify specific embeddings, referred to as "routing embeddings," which notably influence information flow through the network layers. The paper reveals that the task type (e.g., querying a logical OR vs. a linear causal chain) alters the effect of these embeddings on model behavior.
    • The model exhibits defined reasoning pathways where the interaction of attention layers plays a pivotal role, indicating that logical tasks often necessitate coordination across layers.
  2. Mistral-7B Analysis:
    • In the larger Mistral-7B model, attention heads take on specialized roles when solving the minimal reasoning problems. The paper delineates four distinct families of attention heads: queried-rule locating heads, queried-rule mover heads, fact-processing heads, and decision heads.
    • The attention within the model follows a logical reasoning pathway of "QUERY→Relevant Rule→Relevant Fact(s)→Decision," demonstrating a coherent reasoning circuit ingrained within the architecture.

Methodological Approach

The researchers employ activation patching, a causal-intervention technique in which activations cached from one forward pass are substituted into another, to isolate the contribution of specific components within the transformer network. This approach is central to understanding how interventions at particular layers and token positions affect the models' overall reasoning behavior.
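
As a concrete illustration, the sketch below shows one common way to implement residual-stream activation patching with forward hooks on a small HuggingFace model. It is not the authors' code: the choice of GPT-2, the prompts, and the patched layer and position are assumptions made purely for demonstration.

```python
# Minimal activation-patching sketch (assumed setup, not the paper's code):
# cache hidden states from a "clean" run, then overwrite one (layer, position)
# of the residual stream during a "corrupted" run and measure the logit change.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Hypothetical prompts; they must tokenize to the same length so positions align.
clean_prompt = "If A then B. A is true. Therefore,"
corrupt_prompt = "If A then B. A is false. Therefore,"
clean_ids = tok(clean_prompt, return_tensors="pt").input_ids
corrupt_ids = tok(corrupt_prompt, return_tensors="pt").input_ids
assert clean_ids.shape == corrupt_ids.shape

# 1) Cache the clean run's hidden states (index 0 is the embedding output).
with torch.no_grad():
    clean_hidden = model(clean_ids, output_hidden_states=True).hidden_states

def make_patch_hook(layer_idx, position):
    """Forward hook that overwrites one residual-stream position with the
    cached clean activation from the same layer and position."""
    def hook(module, inputs, output):
        hidden = output[0].clone()  # GPT-2 blocks return a tuple; [0] is hidden states
        hidden[:, position, :] = clean_hidden[layer_idx + 1][:, position, :]
        return (hidden,) + output[1:]
    return hook

# 2) Re-run the corrupted prompt while patching one layer at the final position.
layer, pos = 5, corrupt_ids.shape[1] - 1
handle = model.transformer.h[layer].register_forward_hook(make_patch_hook(layer, pos))
with torch.no_grad():
    patched_logits = model(corrupt_ids).logits[0, -1]
handle.remove()

# 3) Recovery of the clean answer's logit measures this site's causal effect.
answer_id = tok(" B").input_ids[0]
print("patched logit for ' B':", patched_logits[answer_id].item())
```

Sweeping this intervention over all layers and token positions yields the kind of layer-by-position effect maps used to localize the circuit.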

By categorizing attention heads into functional groups, the research uncovers significant roles of particular heads at various stages of the reasoning process, from identifying rules and facts to making final reasoning decisions. This fine-grained analysis provides empirical backing for theoretical claims about hierarchical processing within transformer models.
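
A hedged sketch of how such a grouping could be derived is shown below: score each (layer, head) pair by the strength of its patching effect at different token spans and assign strongly affected heads to the span where their effect peaks. The grid size, effect values, and threshold are illustrative assumptions, not the paper's measurements.

```python
# Illustrative grouping of attention heads by where their patching effect peaks.
# effects[l, h, s] stands in for a measured causal effect of head (l, h) at
# token span s; here it is filled with random numbers purely for demonstration.
import numpy as np

n_layers, n_heads = 32, 32                        # Mistral-7B-sized grid (assumed)
span_names = ["queried rule", "facts", "decision position"]
rng = np.random.default_rng(0)
effects = rng.random((n_layers, n_heads, len(span_names)))

groups = {name: [] for name in span_names}
for layer in range(n_layers):
    for head in range(n_heads):
        peak = int(effects[layer, head].argmax())
        if effects[layer, head, peak] > 0.95:     # illustrative threshold
            groups[span_names[peak]].append((layer, head))

for name, heads in groups.items():
    print(f"{name}: {len(heads)} heads, e.g. {heads[:3]}")
```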

Implications and Future Research

The findings have profound implications for both theoretical and practical applications of transformers. The delineation of specific reasoning pathways and component roles enhances our understanding of how transformers process complex reasoning tasks. It provides a foundation for designing interpretable models tailored for logical reasoning and complex problem-solving.

Future research directions may include expanding the analysis to more complex reasoning problems and exploring whether similar reasoning circuits are employed in a wider variety of tasks, such as mathematical problem-solving (GSM-like problems). An additional avenue could involve assessing whether these insights carry over to alternative architectures, such as mixture-of-experts models like Mixtral 8x7B.

In conclusion, this paper significantly advances the mechanistic interpretability of transformers on logic-based tasks. The insights gained open promising avenues for researchers and engineers seeking to optimize, design, and interpret models for sophisticated reasoning capabilities.