
Graph of Thoughts (GoT) Framework

Updated 21 November 2025
  • Graph of Thoughts (GoT) is a prompting and reasoning framework for LLMs that represents intermediate thoughts as vertices in a directed graph with explicit dependencies.
  • It supports operations such as generation, aggregation, refinement, and distillation to enhance multi-step reasoning efficiency and solution quality.
  • Empirical results show that GoT outperforms Chain-of-Thought and Tree-of-Thoughts in both accuracy and cost effectiveness on complex tasks.

Graph of Thoughts (GoT) is a prompting and reasoning framework for LLMs that generalizes and subsumes earlier paradigms such as Chain-of-Thought (CoT) and Tree-of-Thoughts (ToT). In the GoT approach, the intermediate products of LLM reasoning—so-called “thoughts”—are instantiated as vertices in a directed graph whose edges represent explicit dependencies between thoughts. By allowing arbitrary graph structures—including but not limited to paths (chains), trees, and general directed acyclic graphs—GoT enables synergistic combination of intermediate results, feedback and refinement cycles, and pruning or distillation of the reasoning space. GoT has been empirically demonstrated to improve both the quality and efficiency of LLM solutions on complex multi-step tasks, supports extensibility with new reasoning transformations, and provides a mathematical model for structured cognitive architectures in the context of LLM prompt engineering and multi-agent reasoning (Besta et al., 2023, Besta et al., 25 Jan 2024, Pandey et al., 7 Feb 2025).

1. Formal Graph-Theoretic Framework

A Graph of Thoughts is formally defined as a tuple $G = (V, E, c)$ where:

  • $V = \{v_1, \ldots, v_n\}$ is the set of nodes, each node $v_i$ representing an LLM-generated “thought” (such as an intermediate step, partial solution, subproblem, or semantic unit).
  • $E \subseteq V \times V$ is the set of directed edges. An edge $(u \to v)$ indicates that thought $v$ was explicitly generated from (or depends on) thought $u$.
  • $c: V \to C$ is an optional class or role annotation function that maps each thought to a semantic class (e.g., plan, solution, in-context example).

The dependency structure can be encoded via the adjacency matrix $A \in \{0,1\}^{n \times n}$ (or a real-valued/weighted matrix for cost or confidence annotations). In practice, GoT permits both branching (a node produces multiple children) and aggregation (a node synthesizes multiple parents), enabling reasoning topologies more general than standard chains or trees.
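As a minimal sketch of this encoding (assuming numpy; the thought contents are illustrative placeholders), a small branch-and-aggregate graph looks like this:

import numpy as np

# Vertices: LLM-generated thoughts (contents are placeholders).
V = ["problem", "partial solution A", "partial solution B", "merged solution"]

# Directed edges (u, v): thought v was generated from / depends on u.
# v0 branches into v1 and v2; v3 aggregates v1 and v2.
E = [(0, 1), (0, 2), (1, 3), (2, 3)]

# Binary adjacency matrix A in {0,1}^(n x n).
n = len(V)
A = np.zeros((n, n), dtype=int)
for u, v in E:
    A[u, v] = 1

# Out-degree > 1 at v0 encodes branching; in-degree > 1 at v3 encodes aggregation.
assert A[0, :].sum() == 2 and A[:, 3].sum() == 2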

Each update to the graph is effected by a transformation $T$:

$$G' = T(G, p_\theta) = (V', E')$$

with

$$V' = (V \cup V^+) \setminus V^-, \qquad E' = (E \cup E^+) \setminus E^-,$$

where $(V^+, E^+)$ are the sets of new nodes and edges (from generation, aggregation, or refinement), and $(V^-, E^-)$ are pruned subgraphs (from distillation) (Besta et al., 2023).
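As a concrete illustration, a single transformation step reduces to exactly this set update; the particular nodes chosen here (refine $v_1$ into $v_3$, distill away $v_2$) are hypothetical:

# Current graph: v0 -> v1, v0 -> v2.
V = {0, 1, 2}
E = {(0, 1), (0, 2)}

# One transformation: refinement adds v3 from v1; distillation prunes
# the low-scoring v2 together with its incoming edge.
V_plus, E_plus = {3}, {(1, 3)}
V_minus, E_minus = {2}, {(0, 2)}

# G' = T(G, p_theta):  V' = (V ∪ V+) \ V-,  E' = (E ∪ E+) \ E-.
V_new = (V | V_plus) - V_minus
E_new = (E | E_plus) - E_minus

assert V_new == {0, 1, 3} and E_new == {(0, 1), (1, 3)}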

2. Core GoT Operations and Algorithmic Loop

GoT supports several primitive graph operations:

  • Generation: From node $v$, generate $k$ child thoughts; add edges $(v \to v^+_i)$ for each new child.
  • Aggregation: Produce a new thought by combining the content of $k$ predecessors; add edges $(v_i \to v^+)$.
  • Refinement (Feedback Loop): Enhance or revise a thought $v$ using its own output as input, yielding a self-loop $(v \to v)$ with updated content.
  • Distillation: Prune low-scoring thoughts or entire subgraphs according to scoring and ranking functions.
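A minimal Python sketch of these four primitives over a dictionary-based graph follows; llm_call is a hypothetical stand-in for a single LLM query, and the graph holds thought texts plus an edge set:

# G = {"content": [thought strings], "edges": set of (u, v) index pairs}.

def generate(G, v, k, llm_call):
    # Generation: k child thoughts of v, with edges (v -> v_i).
    for _ in range(k):
        w = len(G["content"])
        G["content"].append(llm_call("expand: " + G["content"][v]))
        G["edges"].add((v, w))

def aggregate(G, parents, llm_call):
    # Aggregation: one new thought synthesizing several predecessors.
    w = len(G["content"])
    merged = " | ".join(G["content"][p] for p in parents)
    G["content"].append(llm_call("merge: " + merged))
    G["edges"] |= {(p, w) for p in parents}
    return w

def refine(G, v, llm_call):
    # Refinement: revise v using its own content, recorded as a self-loop.
    G["content"][v] = llm_call("improve: " + G["content"][v])
    G["edges"].add((v, v))

def distill(G, scores, threshold):
    # Distillation: drop edges incident to thoughts scoring below threshold.
    dead = {v for v, s in scores.items() if s < threshold}
    G["edges"] = {e for e in G["edges"] if e[0] not in dead and e[1] not in dead}
    return dead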

A generic execution loop for GoT-based prompting operates as follows (Besta et al., 2023, Besta et al., 25 Jan 2024):

Initialize G = ({v_0}, ∅)                # v_0 encodes the problem/input
while not TerminationCriterion(G):
    S = SelectVertices(G)                # e.g., top-h by score
    for v in S:
        for T in {Generate, Aggregate, Refine}:
            G ← T(G, p_θ)
    for v in NewVertices(G):
        score[v] = E(v, G, p_θ)          # E: evaluation function
    G ← Prune(G, score)                  # distillation
return BestSolution(G)

GoT can be realized with various search algorithms (BFS, DFS, A*, best-first, probabilistic walks), with graph traversal interleaved with LLM calls for node expansion and scoring (Besta et al., 25 Jan 2024).
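As one example, a best-first realization can be sketched with a priority queue; expand and evaluate here are hypothetical wrappers around LLM calls for node expansion and scoring:

import heapq
from itertools import count

def best_first_got(root, expand, evaluate, budget=20):
    # Best-first traversal: always expand the highest-scoring frontier
    # thought, interleaving LLM expansion (expand) and scoring (evaluate).
    tie = count()                                  # tiebreaker for the heap
    frontier = [(-evaluate(root), next(tie), root)]
    best, best_score = root, -frontier[0][0]
    while frontier and budget > 0:
        _, _, v = heapq.heappop(frontier)          # highest score first
        for child in expand(v):                    # one LLM call
            budget -= 1
            s = evaluate(child)                    # one LLM call
            if s > best_score:
                best, best_score = child, s
            heapq.heappush(frontier, (-s, next(tie), child))
    return best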

3. Comparison with Chain-of-Thought and Tree-of-Thoughts

GoT strictly generalizes CoT and ToT:

Scheme   Structure     Branching   Aggregation   Latency          Volume
CoT      Path          No          No            $N$              $N$
ToT      Tree          Yes         No            $O(\log_k N)$    $O(\log_k N)$
GoT      General DAG   Yes         Yes           $O(\log N)$      $N$
  • CoT: A single chain $v_0 \to v_1 \to \cdots \to v_N$; no aggregation or backtracking.
  • ToT: Tree (typically $k$-ary); allows branching but not combining of information.
  • GoT: Arbitrary DAG (with possible loops for refinement); both branching and aggregation permitted.

GoT achieves the best tradeoff between latency (number of reasoning steps from input to final answer, minimized by leveraging parallelism and aggregation) and volume (number of unique predecessor thoughts influencing the final output, maximized via merging) (Besta et al., 2023, Besta et al., 25 Jan 2024).
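Under this definition, volume is simply the number of ancestors of the output vertex, which makes the effect of aggregation easy to check; a small sketch:

def volume(edges, target):
    # Volume of `target`: count of distinct thoughts with a directed
    # path to it, found by walking parent links backwards.
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    seen, stack = set(), [target]
    while stack:
        for u in parents.get(stack.pop(), ()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen)

# Chain v0 -> v1 -> v2 -> v3: every earlier thought reaches the output.
assert volume({(0, 1), (1, 2), (2, 3)}, 3) == 3
# Tree branch v0 -> v1 -> v3 (sibling v2 unused): volume drops to 2.
assert volume({(0, 1), (0, 2), (1, 3)}, 3) == 2
# Aggregation (v1, v2) -> v3 restores full volume without a longer chain.
assert volume({(0, 1), (0, 2), (1, 3), (2, 3)}, 3) == 3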

4. Performance, Efficiency, and Adaptive Extensions

Empirical evaluations have shown substantial improvements of GoT over CoT and ToT in a variety of tasks:

  • Sorting: For length $P = 128$, GoT achieved error $E_G \approx 9$ vs. $E_T \approx 25$ for ToT, at reduced cost ($C_G \approx \$2.88$ vs. $C_T \approx \$4.20$) (Besta et al., 2023).
  • Multi-hop reasoning and QA: GoT outperforms strong CoT/ToT/cascading baselines by 10–46 p.p. in accuracy on HotpotQA, GPQA, Mini-crossword, Game of 24, and more (Pandey et al., 7 Feb 2025).
  • Scientific abstract generation: The dynamic DGoT variant reduces inference cost by 43.7–56.4% relative to static multi-round GoT, using early-stopping and thresholding to prune unnecessary expansions while preserving or improving output quality (Ning et al., 26 Mar 2024).

Adaptive Graph of Thoughts (AGoT) further generalizes the approach by recursively decomposing only those subproblems that are judged sufficiently complex, yielding dynamic DAGs tailored per-instance at test time without training cost (Pandey et al., 7 Feb 2025).
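A schematic sketch of this adaptive recursion, assuming hypothetical LLM-backed helpers is_complex, decompose, and solve (where solve accepts an optional context of sub-answers):

def agot_solve(task, is_complex, decompose, solve, depth=0, max_depth=3):
    # Adaptive expansion: decompose only subproblems judged complex;
    # answer simple ones directly. All three helpers wrap LLM calls.
    if depth >= max_depth or not is_complex(task):
        return solve(task)
    subtasks = decompose(task)                    # grow the DAG one layer
    partials = [agot_solve(t, is_complex, decompose, solve,
                           depth + 1, max_depth) for t in subtasks]
    return solve(task, context=partials)          # aggregate sub-answers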

5. Representation, Serialization, and Extensions

GoT structures must be serialized for LLM consumption, either as explicit textual representations (node/edge lists, adjacency matrices) or as structured embeddings (e.g., JSON-style graphs). In practical systems, only the current frontier and relevant aggregation nodes are materialized within the LLM context to manage context window constraints (Besta et al., 25 Jan 2024).
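For instance, a frontier-only JSON serialization might look like the following; the field names are illustrative rather than a fixed schema:

import json

def serialize_frontier(contents, edges, frontier):
    # Materialize only frontier thoughts and their direct parents
    # (e.g., aggregation sources) to respect context-window limits.
    keep = set(frontier) | {u for u, v in edges if v in frontier}
    payload = {
        "nodes": [{"id": v, "thought": contents[v]} for v in sorted(keep)],
        "edges": [[u, v] for u, v in sorted(edges)
                  if u in keep and v in keep],
    }
    return json.dumps(payload, indent=2)

contents = {0: "problem", 1: "plan A", 2: "plan B", 3: "merged draft"}
print(serialize_frontier(contents, {(0, 1), (0, 2), (1, 3), (2, 3)}, [3]))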

GoT architectures have been extended and specialized beyond generic natural language reasoning:

  • Multimodal Reasoning: GoT-CQA models chart question answering by representing the solution procedure as a DAG of operator nodes (localization, numerical, logical); computations are executed as neural blocks connected according to GoT topology, enhancing interpretability and compositionality (Zhang et al., 4 Sep 2024).
  • Recommendation: GOT4Rec decomposes sequential recommendation into parallel GoT subgraphs for short-term, long-term, and collaborative-user tendencies, with explicit aggregation improving both accuracy and long-tail coverage beyond CoT/ToT baselines (Long et al., 22 Nov 2024).
  • Reinforcement Learning: RE-GoT employs graph-of-thoughts task decomposition in reward function design, enabling LLM/VLM systems to iteratively construct and refine reward graphs for complex, multi-stage RL tasks, yielding higher task success rates and scalability (Yao et al., 19 Sep 2025).
  • Hierarchical and Knowledge Graph Variants: Hierarchical GoT (HGOT) and Knowledge Graph of Thoughts (KGoT) introduce layer-structured and semantically-labeled graphs, enabling advanced retrieval augmentation, factuality mitigation, and externalized, tool-augmented reasoning (Fang et al., 14 Feb 2024, Besta et al., 3 Apr 2025).

6. Limitations and Ongoing Research

Despite its expressivity, GoT incurs increased computational and context size overheads due to multi-step expansion and aggregation, especially as graph size grows. Research on cost control includes threshold-based dynamic pruning (DGoT), selective adaptive expansion (AGoT), and hybrid LLM-GNN scheduling (Ning et al., 26 Mar 2024, Pandey et al., 7 Feb 2025, Besta et al., 25 Jan 2024).

Key open directions include:

  • Learning or inferring optimal reasoning topologies automatically via meta-prompts or LLM-based controllers.
  • Extending to hypergraphs or graphs with richer node/edge semantics for higher-order aggregations.
  • Hardware and distributed optimizations for scalable multi-agent or parallel GoT execution.
  • Integration with external knowledge bases, retrieval modules, or explicit symbolic reasoning components.
  • Robustness methods to mitigate hallucination or circular reasoning within highly interconnected graphs.

Taken together, Graph of Thoughts constitutes a principled, extensible, and increasingly practical foundation for structured, multi-hop, and multimodal LLM reasoning, enabling both automation and interpretability for complex decision-making tasks (Besta et al., 2023, Besta et al., 25 Jan 2024, Pandey et al., 7 Feb 2025).
