Assessing Compositional Reasoning Abilities of Transformers in FTCT-based Tasks
The research presented in "Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?" investigates whether Transformer models can perform compositional reasoning, that is, integrate separate fragments of knowledge observed during training into coherent reasoning chains at test time. Human cognition routinely exhibits this skill, which the researchers aim to emulate in artificial models. The authors introduce a synthetic benchmark called FTCT (Fragmented at Training, Chained at Testing) specifically designed to probe this capability in Transformers.
Research Questions and Dataset
The research addresses three primary questions: (1) When can Transformers connect fragmented knowledge seen during training into a complete reasoning chain during testing? (2) How do training factors affect the development of compositional reasoning? (3) Which internal mechanisms enable such reasoning in Transformers? To explore these questions, the authors propose FTCT. The dataset is built on causal graph structures in which knowledge points are vertices and their relations are edges. The key challenge is to compose the fragmented training data into a coherent chain at test time, even though the model is never exposed to these full target sequences during training.
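To make the setup concrete, the following Python sketch generates FTCT-style training sequences under stated assumptions: a single causal chain, fixed-length contiguous child chains as fragments, and values obtained by adding an edge-specific offset to the parent's value. The chain, the offsets, and the function names are illustrative only, and the noise vertices used in the actual benchmark are omitted.

```python
import random

# Illustrative sketch of FTCT-style data generation (not the authors' code).
CHAIN = ["A", "B", "C", "D", "E"]           # full chain required at test time
OFFSETS = {"B": 1, "C": 2, "D": 3, "E": 4}  # hypothetical parent-to-child value rules

def child_chain_fragments(chain, frag_len):
    """Enumerate contiguous child chains of a fixed length from the full chain."""
    return [chain[i:i + frag_len] for i in range(len(chain) - frag_len + 1)]

def render_sequence(fragment, root_value):
    """Render one training sequence as vertex=value pairs along a fragment."""
    value, tokens = root_value, []
    for j, vertex in enumerate(fragment):
        if j > 0:
            value += OFFSETS[vertex]        # child value depends on its parent's value
        tokens.append(f"{vertex}={value}")
    return " ".join(tokens)

# Training sees only short fragments (e.g. length 3); testing asks for the full chain A..E.
for frag in child_chain_fragments(CHAIN, 3):
    print(render_sequence(frag, root_value=random.randint(0, 9)))
```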
Methodology and Findings
The paper trains Transformer models on FTCT and uses few-shot Chain-of-Thought (CoT) prompts to test and enhance compositional reasoning. Experiments show that few-shot CoT prompts substantially improve reasoning performance compared to zero-shot prompting. Values accuracy at test time remains consistently high, showing that Transformers can infer correct contextual values along reasoning sequences never observed during training.
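The sketch below illustrates how a few-shot CoT prompt for such a task might be assembled: a handful of complete vertex=value chains serve as exemplars, and the model must continue a new chain step by step from its root value. The prompt format and helper name are assumptions, not the paper's exact template; the values follow the illustrative offsets from the earlier sketch.

```python
def few_shot_cot_prompt(exemplars, query_root):
    """Assemble a few-shot CoT prompt: several full reasoning chains, then a new query.

    `exemplars` are complete vertex=value chains (in-context demonstrations);
    the model is expected to continue the final, partially given chain step by step.
    """
    shots = "\n".join(exemplars)
    return f"{shots}\n{query_root}"

prompt = few_shot_cot_prompt(
    exemplars=["A=2 B=3 C=5 D=8 E=12", "A=7 B=8 C=10 D=13 E=17"],
    query_root="A=4",
)
print(prompt)   # the model should continue with B=5 C=7 D=10 E=14
```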
A critical finding is that compositional reasoning emerges when the relative knowledge ratio, defined as the length of the child chains seen in training divided by the length of the full chain required at testing, exceeds roughly 0.3. Furthermore, the analysis suggests that at least two Transformer layers and heads are required to achieve substantial compositional reasoning, pointing to the significance of both model architecture and data distribution.
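As a quick illustration of the threshold, the relative knowledge ratio can be computed directly from the two chain lengths (the function name below is ours, not the paper's):

```python
def relative_knowledge_ratio(child_chain_len, full_chain_len):
    """Training fragment length over the full test-time chain length."""
    return child_chain_len / full_chain_len

# Fragments of length 3 over a full chain of length 5 give 0.6,
# comfortably above the ~0.3 threshold reported for emergence.
print(relative_knowledge_ratio(3, 5))   # 0.6
print(relative_knowledge_ratio(2, 10))  # 0.2, below the threshold
```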
Theoretical Justification and Mechanisms
From a theoretical standpoint, the authors show how Transformers can simulate an underlying program that attains low training and testing loss, building on generalization and expressivity properties of these models. The program combines in-context learning with parent retrieval. The empirical investigation shows that induction heads, pairs of attention heads operating across layers to capture positional and contextual information, play a central role in realizing this capability. Attention patterns reveal that Transformers learn the in-context vertex ordering from the exemplars in CoT prompts and then generalize it to form novel chained reasoning at test time.
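A simple diagnostic for such behavior is to measure how much attention a head assigns to the token that followed an earlier occurrence of the current token, the signature pattern of an induction head. The sketch below assumes a single head's attention matrix is available; it is a generic diagnostic under that assumption, not the paper's analysis code.

```python
import torch

def induction_score(attn, tokens):
    """Score how strongly one attention head behaves like an induction head.

    For each position t, sum the attention paid to positions j+1 where
    tokens[j] == tokens[t] for some earlier j.  `attn` is a (seq_len, seq_len)
    attention matrix for a single head; `tokens` is the corresponding token list.
    """
    seq_len = attn.shape[0]
    score = 0.0
    for t in range(1, seq_len):
        prev_next = [j + 1 for j in range(t - 1) if tokens[j] == tokens[t]]
        if prev_next:
            score += attn[t, prev_next].sum().item()
    return score / (seq_len - 1)

# Higher scores suggest the head copies "what followed this token last time",
# the mechanism the paper links to learning in-context ordering from exemplars.
```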
Additionally, linear probing reveals that parent retrieval is supported by precise attention assignment. This finding supports the hypothesis that Transformers encode contextual information about vertices and use it to attend to the relevant parent vertices, even though training only ever exposes fragmented chains.
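A linear probe of the kind described can be sketched as follows, assuming hidden states and parent labels have already been collected from the model; the scikit-learn setup and function name are illustrative rather than the authors' exact pipeline.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def fit_parent_probe(hidden_states, parent_labels):
    """Fit a linear probe from hidden states (n_samples, d_model) to parent identity.

    High held-out accuracy is evidence that parent information is linearly
    decodable from the model's intermediate representations.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        hidden_states, parent_labels, test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return probe, probe.score(X_test, y_test)
```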
Conclusion and Implications
The paper concludes that few-shot CoT prompting significantly enhances a Transformer's ability to perform compositional reasoning in this synthetically controlled setting. The results offer insight into the internal mechanisms and architectural requirements of Transformers on reasoning tasks analogous to human deductive processes. While the FTCT benchmark evaluates compositional reasoning only in a controlled environment, the findings lay groundwork for understanding how LLMs might be trained to carry out complex, systematic reasoning in real-world scenarios.