Graph of Thoughts (GoT) Framework
- Graph of Thoughts (GoT) is a prompting and reasoning framework for LLMs that represents intermediate thoughts as vertices in a directed graph with explicit dependencies.
- It supports operations such as generation, aggregation, refinement, and distillation to enhance multi-step reasoning efficiency and solution quality.
- Empirical results show that GoT outperforms Chain-of-Thought and Tree-of-Thoughts in both accuracy and cost effectiveness on complex tasks.
Graph of Thoughts (GoT) is a prompting and reasoning framework for LLMs that generalizes and subsumes earlier paradigms such as Chain-of-Thought (CoT) and Tree-of-Thoughts (ToT). In the GoT approach, the intermediate products of LLM reasoning—so-called “thoughts”—are instantiated as vertices in a directed graph whose edges represent explicit dependencies between thoughts. By allowing arbitrary graph structures—including but not limited to paths (chains), trees, and general directed acyclic graphs—GoT enables synergistic combination of intermediate results, feedback and refinement cycles, and pruning or distillation of the reasoning space. GoT has been empirically demonstrated to improve both the quality and efficiency of LLM solutions on complex multi-step tasks, supports extensibility with new reasoning transformations, and provides a mathematical model for structured cognitive architectures in the context of LLM prompt engineering and multi-agent reasoning (Besta et al., 2023, Besta et al., 25 Jan 2024, Pandey et al., 7 Feb 2025).
1. Formal Graph-Theoretic Framework
A Graph of Thoughts is formally defined as a tuple $G = (V, E, c)$ where:
- $V$ is the set of nodes, each node representing an LLM-generated “thought” (such as an intermediate step, partial solution, subproblem, or semantic unit).
- $E \subseteq V \times V$ is the set of directed edges. An edge $(u, v) \in E$ indicates that thought $v$ was explicitly generated from (or depends on) thought $u$.
- $c : V \to \mathcal{C}$ is an optional class or role annotation function that maps each thought to a semantic class (e.g., plan, solution, in-context example).
The dependency structure can be encoded via the adjacency matrix $A \in \{0, 1\}^{|V| \times |V|}$ (or a real-valued weighted matrix for cost/confidence annotations). In practice, GoT permits both branching (a node produces multiple children) and aggregation (a node synthesizes multiple parents), enabling reasoning topologies more general than standard chains or trees.
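To make the matrix encoding concrete, the following minimal Python example (names and values are ours, purely illustrative) shows a four-thought graph in which one node aggregates two parents, a column pattern no tree-shaped adjacency matrix can exhibit:

```python
# Encoding dependencies as an adjacency matrix A, where A[u][v] = 1 iff
# thought v depends on thought u. Node 3 aggregates nodes 1 and 2, so its
# column has two nonzero entries (multiple parents), which a tree cannot express.
V = ["problem", "plan A", "plan B", "merged plan"]
A = [
    [0, 1, 1, 0],  # problem  -> plan A, plan B   (branching)
    [0, 0, 0, 1],  # plan A   -> merged plan
    [0, 0, 0, 1],  # plan B   -> merged plan      (aggregation)
    [0, 0, 0, 0],  # merged plan has no children
]
parents_of_merged = [V[u] for u in range(len(V)) if A[u][3] == 1]
print(parents_of_merged)  # ['plan A', 'plan B']
```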
Each update to the graph is effected by a transformation $\mathcal{T}$:

$$\mathcal{T}(G, p_\theta) = G' = (V', E')$$

with

$$V' = (V \cup V^{+}) \setminus V^{-}, \qquad E' = (E \cup E^{+}) \setminus E^{-}$$

where $V^{+}, E^{+}$ are the sets of new nodes and edges (from generation, aggregation, or refinement), $V^{-}, E^{-}$ are pruned subgraphs (from distillation), and $p_\theta$ denotes the LLM used to produce new thoughts (Besta et al., 2023).
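This transformation algebra can be exercised directly. The sketch below is a minimal Python rendering of $G = (V, E)$ with one `transform` step per update (the optional annotation $c$ is omitted); the class and function names are assumptions made for illustration, not code from the GoT papers:

```python
# Minimal sketch of the formal model: G = (V, E) with an update step
# T(G) = ((V ∪ V+) \ V−, (E ∪ E+) \ E−). Names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class ThoughtGraph:
    V: set = field(default_factory=set)   # thought identifiers
    E: set = field(default_factory=set)   # directed edges (u, v)

def transform(G, V_plus=(), E_plus=(), V_minus=(), E_minus=()):
    """Apply one GoT transformation: add generated/aggregated thoughts,
    remove distilled ones, and keep the edge set consistent."""
    G.V = (G.V | set(V_plus)) - set(V_minus)
    G.E = (G.E | set(E_plus)) - set(E_minus)
    G.E = {(u, v) for (u, v) in G.E if u in G.V and v in G.V}
    return G

G = ThoughtGraph(V={0}, E=set())                           # v_0: the input problem
G = transform(G, V_plus={1, 2}, E_plus={(0, 1), (0, 2)})   # generation (branching)
G = transform(G, V_plus={3}, E_plus={(1, 3), (2, 3)})      # aggregation
G = transform(G, V_minus={2})                              # distillation (pruning)
print(G.V, G.E)  # {0, 1, 3} and only edges among surviving nodes
```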
2. Core GoT Operations and Algorithmic Loop
GoT supports several primitive graph operations:
- Generation: From node $v$, generate $k$ child thoughts $w_1, \dots, w_k$; add edges $(v, w_i)$ for each new child.
- Aggregation: Produce a new thought $w$ by combining the content of predecessors $v_1, \dots, v_m$; add edges $(v_i, w)$.
- Refinement (Feedback Loop): Enhance or revise a thought $v$ using its own output as input, yielding a self-loop $(v, v)$ with updated content.
- Distillation: Prune low-scoring thoughts or entire subgraphs according to scoring and ranking functions. (All four primitives are sketched in code after this list.)
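The following Python sketch shows one plausible shape for these four primitives over a plain dict-based graph, where `llm` stands for any prompt-to-text callable; all names are illustrative assumptions rather than the API of a specific GoT implementation:

```python
# Hedged sketch of the four GoT primitives. Expected graph shape:
#   graph = {"content": {0: "problem statement"}, "edges": set()}

def new_node(graph, content):
    nid = max(graph["content"], default=-1) + 1
    graph["content"][nid] = content
    return nid

def generate(graph, v, llm, k=3):
    """Branching: add k children of v, with edges (v, w_i)."""
    for _ in range(k):
        w = new_node(graph, llm(f"Extend this thought:\n{graph['content'][v]}"))
        graph["edges"].add((v, w))

def aggregate(graph, parents, llm):
    """Synthesis: one new thought combining several predecessors."""
    joined = "\n---\n".join(graph["content"][p] for p in parents)
    w = new_node(graph, llm(f"Merge these partial solutions:\n{joined}"))
    graph["edges"].update((p, w) for p in parents)
    return w

def refine(graph, v, llm):
    """Feedback loop: revise v in place, recorded as a self-loop (v, v)."""
    graph["content"][v] = llm(f"Improve this thought:\n{graph['content'][v]}")
    graph["edges"].add((v, v))

def distill(graph, scores, keep=5):
    """Pruning: retain only the top-scoring thoughts and incident edges."""
    kept = set(sorted(scores, key=scores.get, reverse=True)[:keep])
    graph["content"] = {v: c for v, c in graph["content"].items() if v in kept}
    graph["edges"] = {e for e in graph["edges"] if e[0] in kept and e[1] in kept}
```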
A generic execution loop for GoT-based prompting operates as follows (Besta et al., 2023, Besta et al., 25 Jan 2024):
```
Initialize G = ({v_0}, ∅)                # v_0 encodes the problem/input
while not TerminationCriterion(G):
    S = SelectVertices(G)                # e.g., top-h by score
    for v in S:
        for T in {Generate, Aggregate, Refine}:
            G ← T(G, v, p_θ)             # apply transformation T at vertex v
    for w in NewVertices(G):
        score[w] = E(w, G, p_θ)          # E: evaluation function
    G ← Prune(G, score)                  # distillation
return BestSolution(G)
```
GoT can be realized with various search algorithms (BFS, DFS, A*, best-first, probabilistic walks), with graph traversal interleaved with LLM calls for node expansion and scoring (Besta et al., 25 Jan 2024).
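As one concrete instance, a best-first traversal can be realized with a priority queue over scored thoughts, as in the hedged sketch below; the `expand` and `score` callables, which would wrap LLM calls, are assumptions of this sketch:

```python
# Illustrative best-first realization of the GoT loop: expand the current
# highest-scoring frontier thought until an expansion budget is spent.
import heapq

def best_first_got(root, expand, score, budget=20):
    best, best_score = root, score(root)
    frontier = [(-best_score, 0, root)]   # negate scores: heapq is a min-heap
    tiebreak = 1                          # avoids comparing thoughts on score ties
    for _ in range(budget):
        if not frontier:
            break
        _, _, v = heapq.heappop(frontier)
        for child in expand(v):           # LLM call: generate successors of v
            s = score(child)              # LLM call: evaluate the new thought
            heapq.heappush(frontier, (-s, tiebreak, child))
            tiebreak += 1
            if s > best_score:
                best, best_score = child, s
    return best
```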
3. Comparison with Chain-of-Thought and Tree-of-Thoughts
GoT strictly generalizes CoT and ToT:
| Scheme | Structure | Branching | Aggregation | Latency | Volume |
|---|---|---|---|---|---|
| CoT | Path (chain) | No | No | $N$ | $N$ |
| ToT | $k$-ary tree | Yes | No | $\log_k N$ | $O(\log_k N)$ |
| GoT | General DAG | Yes | Yes | $\log_k N$ | $N$ |

Here $N$ is the number of thoughts and $k$ the branching factor (Besta et al., 2023).
- CoT: A single chain $v_0 \to v_1 \to \cdots \to v_N$; no aggregation or backtracking.
- ToT: Tree (typically $k$-ary); allows branching but not combining of information.
- GoT: Arbitrary DAG (with possible self-loops for refinement); both branching and aggregation permitted.
GoT achieves the best tradeoff between latency (number of reasoning steps from input to final answer, minimized by leveraging parallelism and aggregation) and volume (number of unique predecessor thoughts influencing the final output, maximized via merging) (Besta et al., 2023, Besta et al., 25 Jan 2024).
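Both metrics are easy to compute on a dependency graph. In the illustrative snippet below (function names are ours; the metrics follow the definitions above), merging two parallel two-step chains through one aggregation node yields latency 2 and volume 4, whereas chaining the same four thoughts sequentially would give latency 4 at the same volume:

```python
# Latency: longest dependency chain into a node. Volume: number of distinct
# ancestor thoughts influencing it.
from functools import lru_cache

def latency_and_volume(parents, node):
    """parents: dict mapping each node to the list of its predecessors."""
    @lru_cache(maxsize=None)
    def latency(v):
        preds = parents.get(v, [])
        return 0 if not preds else 1 + max(latency(p) for p in preds)

    def ancestors(v, seen):
        for p in parents.get(v, []):
            if p not in seen:
                seen.add(p)
                ancestors(p, seen)
        return seen

    return latency(node), len(ancestors(node, set()))

# Two parallel 2-step chains merged by one aggregation node.
dag = {"a1": [], "a2": ["a1"], "b1": [], "b2": ["b1"], "out": ["a2", "b2"]}
print(latency_and_volume(dag, "out"))  # (2, 4)
```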
4. Performance, Efficiency, and Adaptive Extensions
Empirical evaluations have shown substantial improvements of GoT over CoT and ToT in a variety of tasks:
- Sorting: For 128-element sequences, GoT improved sorting quality by roughly 62% over ToT while reducing cost by more than 31% ($C_G \approx \$2.88$ vs. $C_T \approx \$4.20$) (Besta et al., 2023).
- Multi-hop reasoning and QA: GoT outperforms strong CoT/ToT/cascading baselines by 10–46 p.p. in accuracy on HotpotQA, GPQA, Mini-crossword, Game of 24, and more (Pandey et al., 7 Feb 2025).
- Scientific abstract generation: The dynamic DGoT variant reduces inference cost by 43.7–56.4% relative to static multi-round GoT, using early-stopping and thresholding to prune unnecessary expansions while preserving or improving output quality (Ning et al., 26 Mar 2024).
Adaptive Graph of Thoughts (AGoT) further generalizes the approach by recursively decomposing only those subproblems that are judged sufficiently complex, yielding dynamic DAGs tailored per-instance at test time without training cost (Pandey et al., 7 Feb 2025).
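A minimal sketch of the threshold-based early-stopping idea, assuming a per-round score threshold; the stopping rule and names below are illustrative, not DGoT's exact formulation:

```python
# One round of candidate generation stops early once a candidate's score
# clears the threshold, avoiding further LLM calls for that round.
def dynamic_round(generate_candidate, score, threshold, max_tries=4):
    best, best_s = None, float("-inf")
    for _ in range(max_tries):
        c = generate_candidate()   # LLM call: propose one more thought
        s = score(c)               # LLM call: evaluate it
        if s > best_s:
            best, best_s = c, s
        if s >= threshold:         # good enough: skip remaining expansions
            break
    return best, best_s
```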
5. Representation, Serialization, and Extensions
GoT structures must be serialized for LLM consumption, either as explicit textual representations (node/edge lists, adjacency matrices) or as structured embeddings (e.g., JSON-style graphs). In practical systems, only the current frontier and relevant aggregation nodes are materialized within the LLM context to manage context window constraints (Besta et al., 25 Jan 2024).
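For instance, a frontier-only JSON serialization might look like the following sketch, where the schema is an assumption rather than a standard GoT format:

```python
# Serialize only the frontier thoughts and their direct parents into a
# JSON-style string for insertion into an LLM prompt.
import json

def serialize_frontier(graph, frontier):
    keep = set(frontier) | {u for (u, v) in graph["edges"] if v in frontier}
    payload = {
        "nodes": [{"id": v, "text": graph["content"][v]} for v in sorted(keep)],
        "edges": sorted([u, v] for (u, v) in graph["edges"]
                        if u in keep and v in keep),
    }
    return json.dumps(payload, indent=2)

g = {"content": {0: "problem", 1: "plan A", 2: "plan B", 3: "merged plan"},
     "edges": {(0, 1), (0, 2), (1, 3), (2, 3)}}
print(serialize_frontier(g, frontier=[3]))  # nodes 1, 2, 3 and their edges
```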
GoT architectures have been extended and specialized beyond generic natural language reasoning:
- Multimodal Reasoning: GoT-CQA models chart question answering by representing the solution procedure as a DAG of operator nodes (localization, numerical, logical); computations are executed as neural blocks connected according to GoT topology, enhancing interpretability and compositionality (Zhang et al., 4 Sep 2024).
- Recommendation: GOT4Rec decomposes sequential recommendation into parallel GoT subgraphs for short-term, long-term, and collaborative-user tendencies, with explicit aggregation improving both accuracy and long-tail coverage beyond CoT/ToT baselines (Long et al., 22 Nov 2024).
- Reinforcement Learning: RE-GoT employs graph-of-thoughts task decomposition in reward function design, enabling LLM/VLM systems to iteratively construct and refine reward graphs for complex, multi-stage RL tasks, yielding higher task success rates and scalability (Yao et al., 19 Sep 2025).
- Hierarchical and Knowledge Graph Variants: Hierarchical GoT (HGOT) and Knowledge Graph of Thoughts (KGoT) introduce layer-structured and semantically-labeled graphs, enabling advanced retrieval augmentation, factuality mitigation, and externalized, tool-augmented reasoning (Fang et al., 14 Feb 2024, Besta et al., 3 Apr 2025).
6. Limitations and Ongoing Research
Despite its expressivity, GoT incurs increased computational and context size overheads due to multi-step expansion and aggregation, especially as graph size grows. Research on cost control includes threshold-based dynamic pruning (DGoT), selective adaptive expansion (AGoT), and hybrid LLM-GNN scheduling (Ning et al., 26 Mar 2024, Pandey et al., 7 Feb 2025, Besta et al., 25 Jan 2024).
Key open directions include:
- Learning or inferring optimal reasoning topologies automatically via meta-prompts or LLM-based controllers.
- Extending to hypergraphs or graphs with richer node/edge semantics for higher-order aggregations.
- Hardware and distributed optimizations for scalable multi-agent or parallel GoT execution.
- Integration with external knowledge bases, retrieval modules, or explicit symbolic reasoning components.
- Robustness methods to mitigate hallucination or circular reasoning within highly interconnected graphs.
Taken together, Graph of Thoughts constitutes a principled, extensible, and increasingly practical foundation for structured, multi-hop, and multimodal LLM reasoning, enabling both automation and interpretability for complex decision-making tasks (Besta et al., 2023, Besta et al., 25 Jan 2024, Pandey et al., 7 Feb 2025).