
Graph of Thoughts (GoT) Framework

Updated 21 November 2025
  • Graph of Thoughts (GoT) is a prompting and reasoning framework for LLMs that represents intermediate thoughts as vertices in a directed graph with explicit dependencies.
  • It supports operations such as generation, aggregation, refinement, and distillation to enhance multi-step reasoning efficiency and solution quality.
  • Empirical results show that GoT outperforms Chain-of-Thought and Tree-of-Thoughts in both accuracy and cost effectiveness on complex tasks.

Graph of Thoughts (GoT) is a prompting and reasoning framework for LLMs that generalizes and subsumes earlier paradigms such as Chain-of-Thought (CoT) and Tree-of-Thoughts (ToT). In the GoT approach, the intermediate products of LLM reasoning—so-called “thoughts”—are instantiated as vertices in a directed graph whose edges represent explicit dependencies between thoughts. By allowing arbitrary graph structures—including but not limited to paths (chains), trees, and general directed acyclic graphs—GoT enables synergistic combination of intermediate results, feedback and refinement cycles, and pruning or distillation of the reasoning space. GoT has been empirically demonstrated to improve both the quality and efficiency of LLM solutions on complex multi-step tasks, supports extensibility with new reasoning transformations, and provides a mathematical model for structured cognitive architectures in the context of LLM prompt engineering and multi-agent reasoning (Besta et al., 2023, Besta et al., 25 Jan 2024, Pandey et al., 7 Feb 2025).

1. Formal Graph-Theoretic Framework

A Graph of Thoughts is formally defined as a tuple $G = (V, E, c)$ where:

  • $V = \{v_1, \ldots, v_n\}$ is the set of nodes, each node $v_i$ representing an LLM-generated “thought” (such as an intermediate step, partial solution, subproblem, or semantic unit).
  • $E \subseteq V \times V$ is the set of directed edges. An edge $(u \to v)$ indicates that thought $v$ was explicitly generated from (or depends on) thought $u$.
  • $c: V \to C$ is an optional class or role annotation function that maps each thought to a semantic class (e.g., plan, solution, in-context example).

The dependency structure can be encoded via the adjacency matrix $A \in \{0,1\}^{n \times n}$ (or a real-valued/weighted matrix for cost or confidence annotations). In practice, GoT permits both branching (a node produces multiple children) and aggregation (a node synthesizes multiple parents), enabling reasoning topologies more general than standard chains or trees.
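As a minimal sketch of this encoding (assuming numpy; the thought contents are illustrative placeholders), a small branch-and-aggregate graph looks like this:

import numpy as np

# Vertices: LLM-generated thoughts (contents are placeholders).
V = ["problem", "partial solution A", "partial solution B", "merged solution"]

# Directed edges (u, v): thought v was generated from / depends on u.
# v0 branches into v1 and v2; v3 aggregates v1 and v2.
E = [(0, 1), (0, 2), (1, 3), (2, 3)]

# Binary adjacency matrix A in {0,1}^(n x n).
n = len(V)
A = np.zeros((n, n), dtype=int)
for u, v in E:
    A[u, v] = 1

# Out-degree > 1 at v0 encodes branching; in-degree > 1 at v3 encodes aggregation.
assert A[0, :].sum() == 2 and A[:, 3].sum() == 2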

Each update to the graph is effected by a transformation $T$:

$$G' = T(G, p_\theta) = (V', E')$$

with

$$V' = (V \cup V^+) \setminus V^-, \qquad E' = (E \cup E^+) \setminus E^-,$$

where $(V^+, E^+)$ are the sets of new nodes and edges (from generation, aggregation, or refinement), and $(V^-, E^-)$ are pruned subgraphs (from distillation) (Besta et al., 2023).
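As a concrete illustration, a single transformation step reduces to exactly this set update; the particular nodes chosen here (refine $v_1$ into $v_3$, distill away $v_2$) are hypothetical:

# Current graph: v0 -> v1, v0 -> v2.
V = {0, 1, 2}
E = {(0, 1), (0, 2)}

# One transformation: refinement adds v3 from v1; distillation prunes
# the low-scoring v2 together with its incoming edge.
V_plus, E_plus = {3}, {(1, 3)}
V_minus, E_minus = {2}, {(0, 2)}

# G' = T(G, p_theta):  V' = (V ∪ V+) \ V-,  E' = (E ∪ E+) \ E-.
V_new = (V | V_plus) - V_minus
E_new = (E | E_plus) - E_minus

assert V_new == {0, 1, 3} and E_new == {(0, 1), (1, 3)}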

2. Core GoT Operations and Algorithmic Loop

GoT supports several primitive graph operations:

  • Generation: From node $v$, generate $k$ child thoughts; add edges $(v \to v^+_i)$ for each new child.
  • Aggregation: Produce a new thought by combining the content of $k$ predecessors; add edges $(v_i \to v^+)$.
  • Refinement (Feedback Loop): Enhance or revise a thought $v$ using its own output as input, yielding a self-loop $(v \to v)$ with updated content.
  • Distillation: Prune low-scoring thoughts or entire subgraphs according to scoring and ranking functions.
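A minimal Python sketch of these four primitives over a dictionary-based graph follows; llm_call is a hypothetical stand-in for a single LLM query, and the graph holds thought texts plus an edge set:

# G = {"content": [thought strings], "edges": set of (u, v) index pairs}.

def generate(G, v, k, llm_call):
    # Generation: k child thoughts of v, with edges (v -> v_i).
    for _ in range(k):
        w = len(G["content"])
        G["content"].append(llm_call("expand: " + G["content"][v]))
        G["edges"].add((v, w))

def aggregate(G, parents, llm_call):
    # Aggregation: one new thought synthesizing several predecessors.
    w = len(G["content"])
    merged = " | ".join(G["content"][p] for p in parents)
    G["content"].append(llm_call("merge: " + merged))
    G["edges"] |= {(p, w) for p in parents}
    return w

def refine(G, v, llm_call):
    # Refinement: revise v using its own content, recorded as a self-loop.
    G["content"][v] = llm_call("improve: " + G["content"][v])
    G["edges"].add((v, v))

def distill(G, scores, threshold):
    # Distillation: drop edges incident to thoughts scoring below threshold.
    dead = {v for v, s in scores.items() if s < threshold}
    G["edges"] = {e for e in G["edges"] if e[0] not in dead and e[1] not in dead}
    return dead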

A generic execution loop for GoT-based prompting operates as follows (Besta et al., 2023, Besta et al., 25 Jan 2024):

Initialize G = ({v_0}, ∅)                # v_0 encodes the problem/input
while not TerminationCriterion(G):
    S = SelectVertices(G)                # e.g., top-h by score
    for v in S:
        for T in {Generate, Aggregate, Refine}:
            G ← T(G, p_θ)
    for v in NewVertices(G):
        score[v] = E(v, G, p_θ)          # E: evaluation function
    G ← Prune(G, score)                  # distillation
return BestSolution(G)

GoT can be realized with various search algorithms (BFS, DFS, A*, best-first, probabilistic walks), with graph traversal interleaved with LLM calls for node expansion and scoring (Besta et al., 25 Jan 2024).
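As one example, a best-first realization can be sketched with a priority queue; expand and evaluate here are hypothetical wrappers around LLM calls for node expansion and scoring:

import heapq
from itertools import count

def best_first_got(root, expand, evaluate, budget=20):
    # Best-first traversal: always expand the highest-scoring frontier
    # thought, interleaving LLM expansion (expand) and scoring (evaluate).
    tie = count()                                  # tiebreaker for the heap
    frontier = [(-evaluate(root), next(tie), root)]
    best, best_score = root, -frontier[0][0]
    while frontier and budget > 0:
        _, _, v = heapq.heappop(frontier)          # highest score first
        for child in expand(v):                    # one LLM call
            budget -= 1
            s = evaluate(child)                    # one LLM call
            if s > best_score:
                best, best_score = child, s
            heapq.heappush(frontier, (-s, next(tie), child))
    return best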

3. Comparison with Chain-of-Thought and Tree-of-Thoughts

GoT strictly generalizes CoT and ToT:

Scheme   Structure     Branching   Aggregation   Latency          Volume
CoT      Path          No          No            $N$              $N$
ToT      Tree          Yes         No            $O(\log_k N)$    $O(\log_k N)$
GoT      General DAG   Yes         Yes           $O(\log N)$      $N$
  • CoT: A single chain $v_0 \to v_1 \to \cdots \to v_N$; no aggregation or backtracking.
  • ToT: Tree (typically $k$-ary); allows branching but not combining of information.
  • GoT: Arbitrary DAG (with possible loops for refinement); both branching and aggregation permitted.

GoT achieves the best tradeoff between latency (number of reasoning steps from input to final answer, minimized by leveraging parallelism and aggregation) and volume (number of unique predecessor thoughts influencing the final output, maximized via merging) (Besta et al., 2023, Besta et al., 25 Jan 2024).
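Under this definition, volume is simply the number of ancestors of the output vertex, which makes the effect of aggregation easy to check; a small sketch:

def volume(edges, target):
    # Volume of `target`: count of distinct thoughts with a directed
    # path to it, found by walking parent links backwards.
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    seen, stack = set(), [target]
    while stack:
        for u in parents.get(stack.pop(), ()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen)

# Chain v0 -> v1 -> v2 -> v3: every earlier thought reaches the output.
assert volume({(0, 1), (1, 2), (2, 3)}, 3) == 3
# Tree branch v0 -> v1 -> v3 (sibling v2 unused): volume drops to 2.
assert volume({(0, 1), (0, 2), (1, 3)}, 3) == 2
# Aggregation (v1, v2) -> v3 restores full volume without a longer chain.
assert volume({(0, 1), (0, 2), (1, 3), (2, 3)}, 3) == 3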

4. Performance, Efficiency, and Adaptive Extensions

Empirical evaluations have shown substantial improvements of GoT over CoT and ToT in a variety of tasks:

  • Sorting: For length $P = 128$, GoT achieved error $E_G \approx 9$ vs. $E_T \approx 25$ for ToT, at reduced cost ($C_G \approx \$2.88$ vs. $C_T \approx \$4.20$) (Besta et al., 2023).
  • Multi-hop reasoning and QA: GoT outperforms strong CoT/ToT/cascading baselines by 10–46 p.p. in accuracy on HotpotQA, GPQA, Mini-crossword, Game of 24, and more (Pandey et al., 7 Feb 2025).
  • Scientific abstract generation: The dynamic DGoT variant reduces inference cost by 43.7–56.4% relative to static multi-round GoT, using early-stopping and thresholding to prune unnecessary expansions while preserving or improving output quality (Ning et al., 26 Mar 2024).

Adaptive Graph of Thoughts (AGoT) further generalizes the approach by recursively decomposing only those subproblems that are judged sufficiently complex, yielding dynamic DAGs tailored per-instance at test time without training cost (Pandey et al., 7 Feb 2025).
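A schematic sketch of this adaptive recursion, assuming hypothetical LLM-backed helpers is_complex, decompose, and solve (where solve accepts an optional context of sub-answers):

def agot_solve(task, is_complex, decompose, solve, depth=0, max_depth=3):
    # Adaptive expansion: decompose only subproblems judged complex;
    # answer simple ones directly. All three helpers wrap LLM calls.
    if depth >= max_depth or not is_complex(task):
        return solve(task)
    subtasks = decompose(task)                    # grow the DAG one layer
    partials = [agot_solve(t, is_complex, decompose, solve,
                           depth + 1, max_depth) for t in subtasks]
    return solve(task, context=partials)          # aggregate sub-answers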

5. Representation, Serialization, and Extensions

GoT structures must be serialized for LLM consumption, either as explicit textual representations (node/edge lists, adjacency matrices) or as structured embeddings (e.g., JSON-style graphs). In practical systems, only the current frontier and relevant aggregation nodes are materialized within the LLM context to manage context window constraints (Besta et al., 25 Jan 2024).
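For instance, a frontier-only JSON serialization might look like the following; the field names are illustrative rather than a fixed schema:

import json

def serialize_frontier(contents, edges, frontier):
    # Materialize only frontier thoughts and their direct parents
    # (e.g., aggregation sources) to respect context-window limits.
    keep = set(frontier) | {u for u, v in edges if v in frontier}
    payload = {
        "nodes": [{"id": v, "thought": contents[v]} for v in sorted(keep)],
        "edges": [[u, v] for u, v in sorted(edges)
                  if u in keep and v in keep],
    }
    return json.dumps(payload, indent=2)

contents = {0: "problem", 1: "plan A", 2: "plan B", 3: "merged draft"}
print(serialize_frontier(contents, {(0, 1), (0, 2), (1, 3), (2, 3)}, [3]))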

GoT architectures have been extended and specialized beyond generic natural language reasoning:

  • Multimodal Reasoning: GoT-CQA models chart question answering by representing the solution procedure as a DAG of operator nodes (localization, numerical, logical); computations are executed as neural blocks connected according to GoT topology, enhancing interpretability and compositionality (Zhang et al., 4 Sep 2024).
  • Recommendation: GOT4Rec decomposes sequential recommendation into parallel GoT subgraphs for short-term, long-term, and collaborative-user tendencies, with explicit aggregation improving both accuracy and long-tail coverage beyond CoT/ToT baselines (Long et al., 22 Nov 2024).
  • Reinforcement Learning: RE-GoT employs graph-of-thoughts task decomposition in reward function design, enabling LLM/VLM systems to iteratively construct and refine reward graphs for complex, multi-stage RL tasks, yielding higher task success rates and scalability (Yao et al., 19 Sep 2025).
  • Hierarchical and Knowledge Graph Variants: Hierarchical GoT (HGOT) and Knowledge Graph of Thoughts (KGoT) introduce layer-structured and semantically-labeled graphs, enabling advanced retrieval augmentation, factuality mitigation, and externalized, tool-augmented reasoning (Fang et al., 14 Feb 2024, Besta et al., 3 Apr 2025).

6. Limitations and Ongoing Research

Despite its expressivity, GoT incurs increased computational and context size overheads due to multi-step expansion and aggregation, especially as graph size grows. Research on cost control includes threshold-based dynamic pruning (DGoT), selective adaptive expansion (AGoT), and hybrid LLM-GNN scheduling (Ning et al., 26 Mar 2024, Pandey et al., 7 Feb 2025, Besta et al., 25 Jan 2024).

Key open directions include:

  • Learning or inferring optimal reasoning topologies automatically via meta-prompts or LLM-based controllers.
  • Extending to hypergraphs or graphs with richer node/edge semantics for higher-order aggregations.
  • Hardware and distributed optimizations for scalable multi-agent or parallel GoT execution.
  • Integration with external knowledge bases, retrieval modules, or explicit symbolic reasoning components.
  • Robustness methods to mitigate hallucination or circular reasoning within highly interconnected graphs.

Taken together, Graph of Thoughts constitutes a principled, extensible, and increasingly practical foundation for structured, multi-hop, and multimodal LLM reasoning, enabling both automation and interpretability for complex decision-making tasks (Besta et al., 2023, Besta et al., 25 Jan 2024, Pandey et al., 7 Feb 2025).
