Mechanistic Insights into Transformer Reasoning on Propositional Logic Problems
The paper investigates the mechanisms by which transformer models solve propositional logic problems, aiming to pin down the roles that individual components play. Using a synthetic propositional logic dataset, the authors conduct mechanistic analyses on both small GPT-2-style transformers and a larger pretrained model, Mistral-7B. The work offers concrete insight into how transformers internalize and execute reasoning pathways for logical tasks.
The paper begins by constructing a synthetic dataset of minimal propositional logic problems that require reasoning and planning, each expressed as a set of logical rules and facts followed by a query. With this controlled experimental setup, the authors analyze the internal mechanisms that support logical reasoning in transformers.
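To make the setup concrete, the following is a minimal sketch of how such rule/fact/query problems can be generated. The predicate names, prompt wording, and answer format are illustrative assumptions, not the paper's exact dataset specification.

```python
import random

def make_problem(kind="or"):
    """Build a minimal propositional logic prompt from rules, facts, and a query.

    `kind` chooses between a rule whose antecedent is a logical OR and a
    two-step linear causal chain; both end with a query whose answer is True.
    """
    a, b, c = random.sample(list("ABCDEFGH"), 3)
    if kind == "or":
        rules = [f"{a} or {b} implies {c}."]
        facts = [f"{random.choice([a, b])} is true."]
    else:  # linear chain: a -> b -> c
        rules = [f"{a} implies {b}.", f"{b} implies {c}."]
        facts = [f"{a} is true."]
    prompt = " ".join(rules + facts) + f" Question: is {c} true? Answer:"
    return prompt, "True"

print(make_problem("or")[0])
print(make_problem("chain")[0])
```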
Key Findings
- Small Transformer Analysis:
  - In the small, GPT-2-style transformers, the authors identify specific embeddings, termed "routing embeddings," that shape how information flows through the network's layers. The effect of these embeddings on model behavior depends on the task type (e.g., querying a rule with a logical OR versus a linear causal chain).
  - The models exhibit well-defined reasoning pathways in which attention layers interact across depth, indicating that these logical tasks require coordination across layers.
- Mistral-7B Analysis:
  - The larger Mistral-7B model exhibits specialized roles for attention heads when solving these minimal reasoning problems. The paper identifies four families of attention heads: queried-rule locating heads, queried-rule mover heads, fact-processing heads, and decision heads.
  - Attention in the model traces a reasoning pathway of "QUERY → Relevant Rule → Relevant Fact(s) → Decision," pointing to a coherent reasoning circuit embedded in the model (a coarse attention-pattern sketch of how such head roles might be profiled follows this list).
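As a rough illustration of how such head families can be surfaced, the sketch below scores each attention head by how much attention the final (decision) position places on the rule, fact, and query segments of a prompt. The span bookkeeping and the commented-out model loading are assumptions; the paper's own analysis is causal (via patching), not purely attention-pattern based.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def head_attention_profile(model, tokenizer, prompt, spans):
    """For each (layer, head), measure how much attention the final token
    position pays to each labelled span of the prompt.

    `spans` maps a label (e.g., "rule", "fact", "query") to a list of token
    indices; these indices are assumed to come from the dataset construction.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_attentions=True)
    profiles = {}
    for layer, attn in enumerate(out.attentions):  # (batch, heads, query, key)
        from_last = attn[0, :, -1, :]              # attention from the final position
        for head in range(from_last.shape[0]):
            profiles[(layer, head)] = {
                label: from_last[head, idx].sum().item()
                for label, idx in spans.items()
            }
    return profiles

# Usage (illustrative): load an eager-attention model so attention maps are returned.
# tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
# model = AutoModelForCausalLM.from_pretrained(
#     "mistralai/Mistral-7B-v0.1", torch_dtype=torch.bfloat16, attn_implementation="eager"
# )
```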
Methodological Approach
The researchers employ activation patching, a causal-intervention technique in which activations recorded from one forward pass are substituted into another, to manipulate and observe specific components within the transformer. This approach is central to understanding how interventions at particular layers and token positions change the models' reasoning behavior.
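A minimal sketch of this kind of intervention is shown below, assuming a HuggingFace-style decoder model (module paths such as `model.model.layers` hold for Mistral-7B): the residual-stream output of one layer is cached on a source prompt and written into the same layer and token position on a second run. It illustrates the general technique rather than the paper's exact experimental protocol.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
model.eval()

def cache_residual(prompt, layer):
    """Cache the residual-stream output of one decoder layer on `prompt`."""
    cache = {}
    def hook(module, inputs, output):
        cache["resid"] = output[0].detach()  # decoder layers return a tuple
    handle = model.model.layers[layer].register_forward_hook(hook)
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    handle.remove()
    return cache["resid"]

def patched_logits(prompt, patch_resid, layer, position):
    """Re-run `prompt`, overwriting one token position of one layer's output
    with a cached activation, and return the logits at the final position.
    Both prompts are assumed to tokenize to the same length."""
    def hook(module, inputs, output):
        hidden = output[0].clone()
        hidden[:, position] = patch_resid[:, position]
        return (hidden,) + output[1:]
    handle = model.model.layers[layer].register_forward_hook(hook)
    with torch.no_grad():
        logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
    handle.remove()
    return logits
```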
By categorizing attention heads into functional groups, the analysis pins down the roles that particular heads play at each stage of the reasoning process, from locating rules and facts to producing the final decision. This fine-grained analysis provides empirical backing for claims about hierarchical processing within transformer models.
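Patching experiments of this kind are usually scored with a simple output metric; a common choice (not necessarily the paper's exact metric) is the logit difference between the correct and incorrect answer tokens, compared before and after patching a given head or layer.

```python
def logit_diff(logits, tokenizer, correct=" True", incorrect=" False"):
    """Logit of the correct answer token minus that of the incorrect one.

    A large drop in this quantity when a component is patched suggests that
    the component matters for the decision. The answer strings here are
    hypothetical single-token labels, not the paper's exact vocabulary.
    """
    c = tokenizer(correct, add_special_tokens=False).input_ids[-1]
    i = tokenizer(incorrect, add_special_tokens=False).input_ids[-1]
    return (logits[c] - logits[i]).item()
```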
Implications and Future Research
The findings have implications for both the theory and practice of transformers. Delineating specific reasoning pathways and component roles sharpens our understanding of how transformers carry out multi-step reasoning tasks, and provides a foundation for designing interpretable models for logical reasoning and complex problem-solving.
Future research directions include extending the analysis to more complex reasoning problems and testing whether similar reasoning circuits arise in a wider variety of tasks, such as GSM-style mathematical word problems. Another avenue is assessing whether these insights transfer to alternative architectures, such as mixture-of-experts models like Mixtral 8x7B.
In conclusion, the paper advances the mechanistic interpretability of transformers on logic-based tasks. Its insights open promising avenues for researchers and engineers seeking to interpret, design, and optimize models for sophisticated reasoning capabilities.