Graph-of-Thoughts: A New Reasoning Paradigm
- Graph-of-Thoughts is a framework that represents intermediate LLM reasoning as nodes in a directed graph, allowing flexible branching, merging, and recursive refinement.
- Empirical benchmarks show significant gains in accuracy and cost efficiency across tasks such as algebraic problem solving, multimodal retrieval, and autonomous driving.
- GoT algorithms iteratively expand, aggregate, and prune thought nodes using adaptive search strategies, fostering robust, parallel, and reusable problem-solving.
Graph-of-Thoughts (GoT) represents a paradigm shift in LLM reasoning, extending earlier prompting schemes such as Chain-of-Thought (CoT) and Tree-of-Thoughts (ToT) by organizing the intermediate computations of an LLM as a flexible, explicit, directed graph. Rather than forcing solutions through linear or strictly hierarchical pathways, GoT enables the model to branch, merge, aggregate, and recursively refine its thoughts, mirroring the non-linear and recurrent nature of human and algorithmic reasoning. This graph-centric approach has been realized in a diverse set of domains, from algebraic problem solving and multi-hop retrieval to multimodal abstraction generation, autonomous driving, chart question answering, recommendation, reward engineering, and interactive assistants, consistently demonstrating gains in solution quality, cost efficiency, interpretability, and knowledge integration.
1. Formal Foundations and Topology
GoT formalizes reasoning as a directed graph $G = (V, E)$:
- Vertices $V$: Each node encodes a “thought”, an intermediate reasoning state, partial solution, extracted fact, or subgoal, expressed as a text payload or embedding (Besta et al., 2023, Besta et al., 2024).
- Edges $E$: Directed edges indicate dependencies, i.e., which thoughts inform the generation of others. Weights or labels may encode transformation type or confidence (Besta et al., 2023).
- Aggregation: Enables merging multiple upstream thoughts into new synthesis nodes, supporting dynamic programming–style reuse and data fusion.
- Scoring and Selection: Each node can carry a numerical score evaluating its promise or correctness.
GoT generalizes CoT (a path) and ToT (a tree) to arbitrary graphs, supporting both branching (out-degree $> 1$) and merging (in-degree $> 1$). Self-loops encode feedback or refinement (Besta et al., 2023, Besta et al., 2024, Yang et al., 2024).
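To make this data model concrete, the following minimal Python sketch encodes thoughts as scored nodes with explicit parent edges; the class and field names are illustrative assumptions, not the API of any published GoT implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Thought:
    """A node in the thought graph: payload, evaluator score, in-edges."""
    content: str                  # text payload of the intermediate state
    score: float = 0.0            # evaluator-assigned promise/correctness
    parents: list["Thought"] = field(default_factory=list)  # dependency edges

# Branching: one thought feeds several children (out-degree > 1).
root = Thought("Sort [5, 3, 8, 1]")
left = Thought("Sort first half [5, 3]", parents=[root])
right = Thought("Sort second half [8, 1]", parents=[root])

# Merging: an aggregation node fuses multiple upstream thoughts
# (in-degree > 1), the step that CoT (a path) and ToT (a tree) cannot express.
merged = Thought("Merge sorted halves into [1, 3, 5, 8]", parents=[left, right])
```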
2. Reasoning Algorithms and Construction
GoT frameworks realize computation through an iterative transformation and search process:
- Initialization: The graph starts from the root node representing the original input or query.
- Expansion: For each “active” node, one or more child thoughts are generated by the LLM using prompt templates representing operations such as “Generate,” “Merge,” “Refine,” or domain-specific operators (e.g., “locate,” “sum,” “compare”) (Besta et al., 2023, Zhang et al., 2024).
- Aggregation and Merging: When multiple subpaths reach comparable states, aggregation nodes synthesize their contents via a learned function or direct LLM instruction (Besta et al., 2024).
- Scoring and Pruning: Child nodes are evaluated and ranked. Beam search or heuristic-guided traversal may retain only high-scoring thoughts to cap computation (Besta et al., 2023).
- Termination: The process halts when a solution node is reached, either by depth, convergence, evaluator score, or explicit halting conditions.
Algorithmic strategies include breadth-first, depth-first, or best-first search across the evolving graph. Adaptive GoT (AGoT) and Dynamic GoT (DGoT) frameworks apply data-driven per-instance criteria for node expansion and stopping, optimizing efficiency and allocating computation where most needed (Pandey et al., 7 Feb 2025, Ning et al., 2024).
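The loop below is a hedged sketch of this expand/aggregate/score/prune cycle with beam-search pruning; the `llm` call, prompt templates, and scoring heuristic are placeholder assumptions rather than any specific framework's interface.

```python
import heapq

def llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    return f"thought derived from: {prompt[:40]}"

def score(thought: str) -> float:
    """Placeholder evaluator; real systems use LLM- or rule-based scoring."""
    return -len(thought)

def got_search(query: str, beam_width: int = 3, max_depth: int = 4) -> str:
    frontier = [query]                                  # active thought nodes
    for _ in range(max_depth):
        # Expansion: apply "Generate"/"Refine"-style operators to each node.
        children = [llm(f"Generate: {t}") for t in frontier]
        children += [llm(f"Refine: {t}") for t in frontier]
        # Aggregation: synthesize the current frontier into a merge node.
        if len(frontier) > 1:
            children.append(llm("Merge: " + " | ".join(frontier)))
        # Scoring and pruning: beam search keeps only top-scoring thoughts.
        frontier = heapq.nlargest(beam_width, children, key=score)
        # Termination here is a fixed depth; real systems also halt on
        # convergence or when an evaluator score crosses a threshold.
    return max(frontier, key=score)
```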
3. Domain-Specific Extensions and Architectures
GoT has been extended across multiple research domains, each with tailored graph structures and integration patterns:
- Hierarchical Graph of Thoughts (HGOT): Multilayered graphs for retrieval-augmented factuality, decomposing complex queries into subquestions, integrating citation-aware voting and evidence scoring (Fang et al., 2024).
- Compositional Reasoning in Multimodal Tasks: Operator-typed nodes (localization, numerical, logical) drive transformer-based reasoning over charts or visual inputs (Zhang et al., 2024).
- Multimodal Aggregation: Each reasoning step embeds a subgraph of weighted meta-prompts, with adaptive gating and aggregation (AGoT), interfacing with visual models for robust text-image alignment and VQA (Yang et al., 2024).
- Cooperative Autonomous Driving (V2V-GoT): LLM-structured graphs orchestrate perceptual fusion, occlusion-aware sensing, and planning-aware prediction among connected autonomous vehicles (Chiu et al., 22 Sep 2025).
- Knowledge Graph of Thoughts (KGoT): Persistent, tool-enhanced knowledge graphs integrate external reasoning steps, web retrieval, and real-time code execution for cost-effective assistant agents (Besta et al., 3 Apr 2025).
- Reward Evolution (RE-GoT): Bi-level graphs decompose RL tasks into text-attributed nodes and edges, with iterative LLM/VLM-driven reward synthesis and rollout refinement (Yao et al., 19 Sep 2025).
- Sequential Recommendation (GOT4Rec): Parallel GoT subgraphs model short-term, long-term, and collaborative behaviors, with LLM-driven generation and aggregation for improved user-oriented predictions (Long et al., 2024).
4. Comparative Analysis: GoT vs CoT and ToT
GoT subsumes CoT and ToT under a spectrum of expressive power:
- Latency and Volume: GoT achieves logarithmic latency ($\log_k N$ for $N$ thoughts and branching factor $k$) together with full volume $N$ (the number of solution-influencing nodes), combining fast convergence with maximal context reuse (Besta et al., 2023, Besta et al., 2024).
- Adaptive Computation: Dynamic and hierarchical GoT variants concentrate computation on complex subproblems and avoid redundant work on simple paths, scaling efficiently across problem difficulty (Pandey et al., 7 Feb 2025, Ning et al., 2024).
- Aggregation and Feedback: GoT supports cross-path merging, dynamic programming, and recurrent feedback, which are not possible in tree-based reasoning (Besta et al., 2024, Besta et al., 2023).
- Cost Trade-offs: Benchmarking shows consistent reductions in error and inference cost across sorting, intersection, keyword counting, multimodal retrieval, and recommendation (Besta et al., 2023, Long et al., 2024).
| Scheme | Latency | Volume | Aggregation | Adaptive Expansion |
|---|---|---|---|---|
| CoT | $N$ | $N$ | No | No |
| ToT | $\log_k N$ | $O(\log_k N)$ | No | No |
| GoT | $\log_k N$ | $N$ | Yes | Yes (DGoT/AGoT) |
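As a worked illustration of these trade-offs, assuming the latency and volume definitions from Besta et al. (latency: hops from root to the final thought; volume: number of preceding thoughts that can influence it) with $N$ total thoughts and branching factor $k$:

```python
import math

N, k = 64, 4  # illustrative totals, not benchmark values

cot = {"latency": N, "volume": N}               # a single chain
tot = {"latency": math.log(N, k),               # root-to-leaf path only
       "volume": math.log(N, k)}                # a leaf sees just its ancestors
got = {"latency": math.log(N, k), "volume": N}  # merge edges restore full volume

for name, m in [("CoT", cot), ("ToT", tot), ("GoT", got)]:
    print(f"{name}: latency={m['latency']:.0f}, volume={m['volume']:.0f}")
```

Only GoT attains both the logarithmic latency of a tree and the full volume of a chain, because merge edges keep every thought reachable from the final node.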
5. Empirical Benchmarks and Quantitative Results
GoT frameworks demonstrate substantial improvements across diverse benchmarks:
- Sorting & Logical Reasoning: Error reductions of up to 62% and cost savings of up to 31% versus ToT; accuracy boosts of 89.7%, 86%, and 56% over direct prompting for complex games and algebraic problems (Besta et al., 2023, Lei et al., 2023).
- Retrieval-Augmented QA: HGOT outperforms Retrieve-then-Read, Self-Ask, and DSP baselines on FEVER (+7%), Open-SQuAD (+1.6%), and HotPotQA (+3.4%), with ablation studies confirming the importance of hierarchical planning and citation-aware weighting (Fang et al., 2024).
- Multimodal Reasoning: Stagewise GoT yields multi-point gains in VQA and image-text retrieval tasks, outperforming chain-based soft prompting baselines, with improved domain generalization (Yang et al., 2024).
- Cooperative Autonomous Driving: GoT reduces the 3 s waypoint average trajectory error by 3.4 m and the collision rate by 1.5 percentage points, outperforming geometric and flat-LLM fusion (Chiu et al., 22 Sep 2025).
- Recommendation: GOT4Rec improves HR@K and NDCG@K by 15–75% versus sequential and chain-based models (Long et al., 2024).
- RL Reward Engineering: RE-GoT boosts RoboGen and ManiSkill2 task success rates by 32.25% and 93.73%, respectively, exceeding expert and prior LLM-based reward designs (Yao et al., 19 Sep 2025).
- Cost-Effectiveness: DGoT achieves the best ROUGE-1 (0.358) for scientific abstract generation at only 43–56% of the cost of fixed-graph approaches (Ning et al., 2024). KGoT reduces AI-assistant cost per task by up to 36× vs. GPT-4 (Besta et al., 3 Apr 2025).
6. Extensibility, Open Challenges, and Future Directions
GoT is architecturally modular and extensible:
- Auto-Learned Graphs: Future GoT frameworks may synthesize optimal graph blueprints via meta-prompting or reinforcement learning (Besta et al., 2024).
- Integration with GNNs: Graph neural network layers facilitate more efficient evaluation and ranking over thought graphs (Yao et al., 2023).
- Persistent Graph Memory: Knowledge Graph instantiations extend GoT out of the LLM context, enabling scalable, multi-agent, hybrid retrieval (Besta et al., 3 Apr 2025).
- Dynamic Cost Control: DGoT and AGoT approaches dynamically bound graph expansion, trading compute for quality per instance (Ning et al., 2024, Pandey et al., 7 Feb 2025); see the sketch at the end of this section.
- Multi-Agent Coordination: Meta-level coordination among multiple LLM instances updating a shared graph opens new research in decentralized reasoning (Besta et al., 3 Apr 2025).
- Theory and Scaling: Analysis of graph topology, latency, volume, and cost scaling informs the principled design of future GoT systems (Besta et al., 2024).
Challenges include prompt-length constraints, optimal scheduling, adaptive graph derivation, distributed-memory support, robust tool integration, and subgraph prediction in persistent knowledge graphs.
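To illustrate the dynamic cost control idea referenced above, the sketch below stops expanding once the marginal score gain falls under a threshold, in the spirit of DGoT/AGoT; the threshold, round cap, and function names are illustrative assumptions, not values or APIs from the cited papers.

```python
def expand_adaptively(generate, evaluate, seed: str,
                      min_gain: float = 0.05, max_rounds: int = 5) -> str:
    """Expand a thought only while each round's score improvement
    exceeds min_gain, bounding per-instance compute."""
    best, best_score = seed, evaluate(seed)
    for _ in range(max_rounds):
        candidate = generate(best)
        cand_score = evaluate(candidate)
        if cand_score - best_score < min_gain:
            break                     # marginal gain too small: stop paying
        best, best_score = candidate, cand_score
    return best

# Usage with trivial stand-in generate/evaluate functions:
result = expand_adaptively(lambda t: t + " refined", lambda t: len(t) / 100, "draft")
```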
7. Synthesis and Impact
Graph-of-Thoughts advances the expressivity, interpretability, and robustness of structured LLM reasoning. By treating intermediate computations as nodes and their dependencies as edges within a graph, GoT frameworks support parallel, modular, and recurrent reasoning patterns, enable aggregation and reuse of sub-results, systematically reduce cost and error rates in diverse applications, and provide blueprints for combining LLMs with external tools, multimodal models, and advanced retrieval systems. The empirical and theoretical results position GoT as a foundational structure in contemporary prompting research, with continuing generalizations in hierarchical, persistent, adaptive, and multi-agent forms across natural language processing, vision, autonomous systems, knowledge engineering, and reinforcement learning (Besta et al., 2023, Besta et al., 2024, Fang et al., 2024, Yang et al., 2024, Zhang et al., 2024, Long et al., 2024, Pandey et al., 7 Feb 2025, Besta et al., 3 Apr 2025, Chiu et al., 22 Sep 2025, Yao et al., 19 Sep 2025).