Graph-of-Thought: Enhancing Reasoning in LLMs with Non-Linear Structures
The paper "Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in LLMs" critically examines the limitations of Chain-of-Thought (CoT) reasoning and introduces a novel framework called Graph-of-Thought (GoT). While CoT has been instrumental in eliciting intermediate reasoning steps from large language models (LLMs), it inherently assumes a linear progression of thoughts, which may not adequately capture the complex, non-linear nature of human cognition. This paper proposes modeling thoughts as a graph to achieve a more nuanced representation, enhancing the reasoning abilities of LLMs.
Methodology and Implementation
The proposed GoT framework represents thought units as nodes and their interconnections as edges, forming a graph that reflects the intricate web of human thinking. The architecture is a two-stage framework: rationale generation followed by answer generation. In the first stage, a GoT encoder constructs a thought graph from the text input and, when available, multimodal data; this graph guides the generation of an explanatory rationale. In the second stage, the model conditions answer generation on the rationale produced in the first stage.
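To make the node/edge representation concrete, the sketch below builds a toy thought graph in plain Python. It is an illustrative stand-in, not the paper's construction method: here an edge simply links two thought units that share a content word, whereas the actual GoT encoder derives its edges from the input text in a more principled way. The stopword list and the `build_thought_graph` helper are assumptions for this example.

```python
from itertools import combinations

# Small stopword list for this toy example (an assumption, not from the paper).
STOPWORDS = frozenset({"the", "a", "an", "of", "to", "is", "are", "in"})

def build_thought_graph(thought_units):
    """Return (nodes, edges): nodes are thought units; an edge (i, j) links
    two units that share at least one content word -- a crude proxy for the
    semantic links a GoT encoder would consume."""
    nodes = list(thought_units)
    tokens = [
        {w for w in unit.lower().split() if w not in STOPWORDS}
        for unit in nodes
    ]
    edges = [
        (i, j)
        for i, j in combinations(range(len(nodes)), 2)
        if tokens[i] & tokens[j]  # shared content word => connect the units
    ]
    return nodes, edges

units = [
    "John has 3 apples",
    "Mary gives John 2 more apples",
    "John now has 5 apples",
]
nodes, edges = build_thought_graph(units)
# Every pair of units shares a word ("John"/"apples"), so all three nodes
# are mutually connected rather than forming a linear chain.
```

The resulting adjacency is what distinguishes GoT from CoT: a chain would only link unit 1 to 2 and 2 to 3, while the graph also records the direct dependency between units 1 and 3.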
The GoT model utilizes a Graph Attention Network (GAT) to encode the constructed thought graph while employing transformers for text and vision encoding. The resulting multi-modal representations are then combined through a gated fusion mechanism to improve their alignment and integration.
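A gated fusion mechanism can be sketched as follows: a learned gate decides, per feature dimension, how much of each modality's representation to keep. This is a minimal pure-Python sketch under assumed shapes (two equal-length vectors, a `(d, 2d)` weight matrix), not the paper's exact implementation, which fuses text, vision, and graph features inside the transformer.

```python
import math

def gated_fusion(h_text, h_graph, W, b):
    """Fuse two length-d feature vectors via a learned gate:
        g     = sigmoid(W @ [h_text; h_graph] + b)   # elementwise gate in (0, 1)
        fused = g * h_text + (1 - g) * h_graph       # convex combination
    W is a (d x 2d) nested list, b a length-d list (illustrative shapes)."""
    d = len(h_text)
    concat = list(h_text) + list(h_graph)  # [h_text; h_graph], length 2d
    fused = []
    for i in range(d):
        z = sum(W[i][k] * concat[k] for k in range(2 * d)) + b[i]
        g = 1.0 / (1.0 + math.exp(-z))  # sigmoid gate for dimension i
        fused.append(g * h_text[i] + (1.0 - g) * h_graph[i])
    return fused

# With zero weights the gate is exactly 0.5, so fusion is a plain average:
avg = gated_fusion([1.0, 3.0], [3.0, 1.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])
# avg == [2.0, 2.0]
```

During training the weights `W` and `b` are learned, letting the model lean on the graph representation where it is informative and fall back to the text representation elsewhere.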
Results
The empirical evaluation demonstrates consistent performance improvements attributable to GoT. On the AQUA-RAT dataset, for instance, the GoT model achieved a significant ROUGE-L improvement in rationale generation and outperformed strong CoT baselines, raising accuracy from 30.09% to 32.09% with T5-base and from 33.73% to 34.48% with T5-large. Furthermore, on the multimodal ScienceQA benchmark, GoT exceeded the accuracy of the Multimodal-CoT baseline by 2.40 percentage points, showcasing the effectiveness of integrating graph-like reasoning processes.
Implications and Future Work
The paper's results indicate that the graph-based approach offers a more realistic modeling of reasoning processes, potentially paving the way for substantial advancements in both text-based and multimodal frameworks. By capturing the jumping, associative nature of human thoughts, GoT can contribute significantly to the evolution of cognitive modeling in LMs.
For future work, researchers might consider expanding GoT to broader applications, exploring enhanced fusion mechanisms, or refining graph construction methods for even more precise reasoning. Additionally, examining how GoT scales alongside larger LLMs is a natural next step for pushing the boundaries of AI reasoning.
This paper illuminates the promise of non-linear reasoning structures in LLMs, making significant strides toward computational modeling that more closely mirrors human cognitive processes. The reported performance gains underscore GoT's potential as a transformative tool in advancing artificial intelligence capabilities.