Graph-of-Thought: Enhancing Reasoning in LLMs with Non-Linear Structures
The paper "Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in LLMs" critically examines the limitations of Chain-of-Thought (CoT) reasoning and introduces a novel framework called Graph-of-Thought (GoT). While CoT has been instrumental in eliciting intermediate reasoning steps from large language models (LLMs), it inherently assumes a linear progression of thoughts, which may not adequately capture the complex, non-linear nature of human cognition. This paper proposes modeling thoughts as a graph to achieve a more nuanced representation, enhancing the reasoning abilities of LLMs.
Methodology and Implementation
The proposed GoT framework represents thought units as nodes and their interconnections as edges, forming a graph that reflects the intricate web of human thinking. The architecture is a two-stage framework: rationale generation followed by answer generation. In the first stage, a GoT encoder constructs a thought graph from the text input and, when available, multimodal data; this graph guides the generation of an explanatory rationale. In the second stage, the model conditions answer generation on the rationale produced in the first stage.
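To make the node/edge representation concrete, the sketch below builds a toy thought graph in plain Python. It is an illustrative stand-in, not the paper's construction method: here an edge simply links two thought units that share a content word, whereas the actual GoT encoder derives its edges from the input text in a more principled way. The stopword list and the `build_thought_graph` helper are assumptions for this example.

```python
from itertools import combinations

# Small stopword list for this toy example (an assumption, not from the paper).
STOPWORDS = frozenset({"the", "a", "an", "of", "to", "is", "are", "in"})

def build_thought_graph(thought_units):
    """Return (nodes, edges): nodes are thought units; an edge (i, j) links
    two units that share at least one content word -- a crude proxy for the
    semantic links a GoT encoder would consume."""
    nodes = list(thought_units)
    tokens = [
        {w for w in unit.lower().split() if w not in STOPWORDS}
        for unit in nodes
    ]
    edges = [
        (i, j)
        for i, j in combinations(range(len(nodes)), 2)
        if tokens[i] & tokens[j]  # shared content word => connect the units
    ]
    return nodes, edges

units = [
    "John has 3 apples",
    "Mary gives John 2 more apples",
    "John now has 5 apples",
]
nodes, edges = build_thought_graph(units)
# Every pair of units shares a word ("John"/"apples"), so all three nodes
# are mutually connected rather than forming a linear chain.
```

The resulting adjacency is what distinguishes GoT from CoT: a chain would only link unit 1 to 2 and 2 to 3, while the graph also records the direct dependency between units 1 and 3.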
The GoT model utilizes a Graph Attention Network (GAT) to encode the constructed thought graph while employing transformers for text and vision encoding. The resulting multi-modal representations are then combined through a gated fusion mechanism to improve their alignment and integration.
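A gated fusion mechanism can be sketched as follows: a learned gate decides, per feature dimension, how much of each modality's representation to keep. This is a minimal pure-Python sketch under assumed shapes (two equal-length vectors, a `(d, 2d)` weight matrix), not the paper's exact implementation, which fuses text, vision, and graph features inside the transformer.

```python
import math

def gated_fusion(h_text, h_graph, W, b):
    """Fuse two length-d feature vectors via a learned gate:
        g     = sigmoid(W @ [h_text; h_graph] + b)   # elementwise gate in (0, 1)
        fused = g * h_text + (1 - g) * h_graph       # convex combination
    W is a (d x 2d) nested list, b a length-d list (illustrative shapes)."""
    d = len(h_text)
    concat = list(h_text) + list(h_graph)  # [h_text; h_graph], length 2d
    fused = []
    for i in range(d):
        z = sum(W[i][k] * concat[k] for k in range(2 * d)) + b[i]
        g = 1.0 / (1.0 + math.exp(-z))  # sigmoid gate for dimension i
        fused.append(g * h_text[i] + (1.0 - g) * h_graph[i])
    return fused

# With zero weights the gate is exactly 0.5, so fusion is a plain average:
avg = gated_fusion([1.0, 3.0], [3.0, 1.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])
# avg == [2.0, 2.0]
```

During training the weights `W` and `b` are learned, letting the model lean on the graph representation where it is informative and fall back to the text representation elsewhere.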
Results
The empirical evaluation demonstrates consistent performance improvements attributable to GoT. On the AQUA-RAT dataset, for instance, the GoT model achieved a significant ROUGE-L improvement in rationale generation and outperformed strong CoT baselines, raising accuracy from 30.09% to 32.09% with T5-base and from 33.73% to 34.48% with T5-large. Furthermore, on the multimodal ScienceQA benchmark, GoT exceeded the accuracy of the Multimodal-CoT baseline by 2.40 percentage points, showcasing the effectiveness of integrating graph-like reasoning processes.
Implications and Future Work
The paper's results indicate that the graph-based approach offers a more realistic modeling of reasoning processes, potentially paving the way for substantial advancements in both text-based and multimodal frameworks. By capturing the jumping, associative nature of human thoughts, GoT can contribute significantly to the evolution of cognitive modeling in LMs.
For future work, researchers might consider expanding GoT to broader applications, exploring enhanced fusion mechanisms, or refining graph construction methods for even more precise reasoning. Additionally, examining how GoT scales alongside larger LLMs is a natural next step for pushing the boundaries of AI reasoning.
This paper illuminates the promise of non-linear reasoning structures in LLMs, making significant strides toward computational modeling that more closely mirrors human cognitive processes. The reported performance gains underscore GoT's potential as a transformative tool in advancing artificial intelligence capabilities.