Overview
This paper introduces a new perspective on the reasoning abilities of pre-trained language models (LMs), focusing on how they can perform complex reasoning tasks without explicit fine-tuning. The authors propose viewing LMs as systems that aggregate indirect reasoning paths seen during pre-training, and apply this perspective to two areas: logical reasoning over knowledge graphs (KGs) and mathematical reasoning on math word problems (MWPs).
Reasoning Paths Aggregation
The core hypothesis is that LMs can derive new conclusions by aggregating reasoning paths encountered during pre-training. The authors test this hypothesis in the KG and MWP settings by formalizing reasoning paths as random walks on knowledge/reasoning graphs. The paper demonstrates that a weighted sum of the probabilities of relevant random-walk paths can explain how LMs reason, suggesting that training on random-walk paths improves real-world multi-step reasoning performance.
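To make the aggregation idea concrete, the following is a minimal sketch (not the paper's implementation): a toy KG, an exact dynamic program for the probability that a uniform random walk of a given length reaches a target entity, and a weighted sum over path lengths. The entities, relations, and weights are all invented for illustration.

```python
from collections import defaultdict

# A toy knowledge graph as (head, relation, tail) triples -- the
# entity and relation names here are hypothetical, not from the paper.
triples = [
    ("alice", "mother_of", "bob"),
    ("bob", "father_of", "carol"),
    ("alice", "grandmother_of", "carol"),
]

# Adjacency list: for each entity, its outgoing (relation, tail) edges.
adj = defaultdict(list)
for h, r, t in triples:
    adj[h].append((r, t))

def path_prob(start, end, length):
    """Exact probability that a uniform random walk of `length` steps
    starting at `start` ends at `end` (computed by propagating the
    walk's distribution over entities step by step)."""
    dist = {start: 1.0}
    for _ in range(length):
        nxt = defaultdict(float)
        for node, p in dist.items():
            edges = adj[node]
            for _, t in edges:
                nxt[t] += p / len(edges)
        dist = nxt
    return dist.get(end, 0.0)

# The hypothesis: the LM's conclusion score behaves like a weighted
# sum of random-walk path probabilities. The length weights below
# are made up for the example.
weights = {1: 0.2, 2: 0.8}
score = sum(w * path_prob("alice", "carol", L) for L, w in weights.items())
print(score)
```

In this toy graph the direct one-hop path and the two-hop path through "bob" both contribute, and the weights determine how much each path length matters, which is the quantity the paper compares against the LM's output distribution.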
Logical Reasoning Analysis
The investigation begins with logical reasoning over KGs. The paper trains a small Transformer model on random-walk paths sampled from a KG and finds that the trained LM's output distribution closely resembles a weighted aggregation of the probabilities of possible random-walk paths, indicating that the LM effectively reasons by weighting logical rules. Further analysis identifies an optimal random-walk path length for effective reasoning, and additional experiments show that augmenting training with unlabeled random-walk reasoning paths improves reasoning performance.
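A sketch of how such training data might be generated, under the assumption that each sampled walk is serialized as an alternating entity/relation token sequence (the KG triples and the serialization format below are illustrative, not the paper's actual dataset):

```python
import random
from collections import defaultdict

# Toy KG triples; names are hypothetical, for illustration only.
triples = [
    ("paris", "capital_of", "france"),
    ("france", "member_of", "eu"),
    ("berlin", "capital_of", "germany"),
    ("germany", "member_of", "eu"),
]

adj = defaultdict(list)
for h, r, t in triples:
    adj[h].append((r, t))

def sample_walk(start, max_len, rng):
    """Sample one random-walk path from `start` and serialize it as
    a space-separated token sequence: entity relation entity ..."""
    tokens = [start]
    node = start
    for _ in range(max_len):
        if not adj[node]:
            break
        r, t = rng.choice(adj[node])
        tokens += [r, t]
        node = t
    return " ".join(tokens)

# A tiny "pre-training corpus" of serialized walks.
rng = random.Random(0)
corpus = [sample_walk(rng.choice(sorted(adj)), max_len=2, rng=rng)
          for _ in range(4)]
for line in corpus:
    print(line)
```

Each line of the corpus is one random-walk path; a small Transformer trained with a next-token objective on many such lines is the setup the paper analyzes.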
Mathematical Reasoning Expansion
The paper extends these findings to mathematical reasoning, focusing on the challenge of solving MWPs. The methodology continues the training of a pre-trained base LM on random-walk reasoning paths generated from existing chain-of-thought (CoT) training data. The results show consistent improvements over vanilla supervised fine-tuning on MWPs, supporting the hypothesis that LMs utilize and benefit from an aggregation of random-walk reasoning paths.
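One way such data construction might look, assuming consecutive CoT steps are linked into a shared reasoning graph from which new step chains are sampled (the step strings and graph construction below are an invented illustration, not the paper's exact procedure):

```python
import random
from collections import defaultdict

# Hypothetical CoT solutions, each a list of intermediate reasoning
# steps; the step strings are made up for this example.
cot_solutions = [
    ["x = 3 + 4", "x = 7", "y = 7 * 2", "y = 14"],
    ["x = 7", "y = 7 * 2", "y = 14", "z = 14 - 5"],
]

# Reasoning graph: a directed edge between consecutive steps, so
# steps shared across solutions become junctions between paths.
graph = defaultdict(list)
for steps in cot_solutions:
    for a, b in zip(steps, steps[1:]):
        graph[a].append(b)

def sample_reasoning_path(start, max_len, rng):
    """Random walk over the reasoning graph, yielding a chain of
    steps (possibly unseen as a whole) for continued pre-training."""
    path = [start]
    for _ in range(max_len):
        if not graph[path[-1]]:
            break
        path.append(rng.choice(graph[path[-1]]))
    return path

rng = random.Random(0)
path = sample_reasoning_path("x = 3 + 4", max_len=5, rng=rng)
print(" -> ".join(path))
```

Because the two solutions share intermediate steps, the walk can stitch together a chain that appears in neither solution verbatim, which is the kind of path-aggregation data the continued training is meant to exploit.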
Implications and Future Work
The findings have implications for both academic research and practical applications. Understanding how LMs harness pre-training data to acquire reasoning abilities could inform the design of systems capable of more complex problem-solving. The authors suggest further exploration of data augmentation techniques built around random-walk reasoning paths as a way to further improve LM performance on reasoning tasks.
Conclusion
This paper provides insightful analyses and robust empirical evidence supporting the hypothesis that LMs can aggregate indirect reasoning paths to enhance their reasoning capabilities. By dissecting the reasoning ability of LMs from the perspective of reasoning paths aggregation, the authors offer a compelling framework that not only sheds light on how these models learn but also opens pathways for future research aimed at optimizing LM pre-training and fine-tuning processes for advanced reasoning tasks.