
Charts-of-Thought Prompting

Updated 1 December 2025
  • Charts-of-Thought prompting is a technique that represents reasoning as a directed acyclic graph (DAG), enabling dynamic programming-style merging and parallel subproblem analysis.
  • It integrates merging strategies and diverse modalities to achieve significant improvements in retrieval, VQA, and chart interpretation tasks.
  • The framework supports explicit node and edge encoding, dynamic backtracking, and tool integration for interpretable and efficient multi-step reasoning.

Charts-of-Thought prompting generalizes traditional linear Chain-of-Thought strategies by modeling the reasoning process as a structured chart, specifically a directed acyclic graph (DAG) of intermediate states, rather than a mere sequence. This approach more faithfully imitates human analytical processes, which often entail parallel, multi-aspect sub-reasoning and dynamic aggregation of information. It spans both symbolic and multimodal domains, underpinning new frameworks for prompt engineering in LLMs and vision-language models (VLMs).

1. Theoretical Foundations and Motivation

Classic Chain-of-Thought (CoT) prompting encodes reasoning as a single path of intermediate steps, which limits expressivity and parallelism. In contrast, human cognition leverages simultaneous processing of multiple partial insights, integrating them before arriving at a holistic conclusion. Charts-of-Thought (CoTʹ) prompting structurally encodes this paradigm, representing reasoning as a chart—a DAG where subproblems (nodes) are generated, branched, merged, and selectively updated to account for overlapping subproblems and shared structure (Besta et al., 25 Jan 2024, Besta et al., 2023).

This extension subsumes CoT (path graph), Tree-of-Thought (ToT, out-branching tree), and Graph-of-Thought (GoT, general directed graph), but specializes the topology to a DAG to ensure acyclicity and enable dynamic programming-style aggregations and reuse of partial solutions (Besta et al., 25 Jan 2024).

The motivation for charts-of-thought prompting is particularly evident in complex multi-modal understanding, where multiple aspects—color, shape, functional context—are simultaneously relevant. In such cases, a linear chain under-utilizes latent model capabilities, whereas a chart-based topology enables richer, multi-aspect reasoning (Yang et al., 6 Apr 2024).

2. Formal Structure and Execution Pipeline

In charts-of-thought prompting, the model maintains and dynamically updates an explicit DAG of reasoning states throughout an interaction. Formally, the chart is defined as G = (V, E), with:

  • V: nodes representing "thoughts" (partial solutions, intermediate steps, or subproblem states).
  • E: directed edges indicating dependencies, where (u, v) signals that v relies on u.
  • Merging (in-degree > 1) is essential, as it enables multiple reasoning paths to fuse their representations, exploiting shared substructure.

Operationally, the execution pipeline comprises the following stages (a code sketch follows the list):

  1. Node Expansion (Branching): Expanding frontier nodes in the DAG, producing children (subproblems) in parallel.
  2. Aggregation (Merging): When distinct paths converge to an identical subproblem, their embeddings are combined via an aggregation function α, such as attention, mean pooling, or concatenation plus a learned transformation.
  3. Scheduling/Traversal: Selection of the next node(s) to expand may follow breadth-first search (BFS), depth-first (DFS), or more sophisticated learned strategies.
  4. Backtracking & Pruning: Low-quality or redundant branches can be abandoned, minimizing cost.
  5. Tool Integration: Pre/post-processing hooks (e.g., retrieval modules or execution engines) augment LLM capabilities at each stage (Besta et al., 25 Jan 2024, Besta et al., 2023).

In multimodal soft-prompting (e.g., Aggregation-Graph-of-Thought, AGoT), each prompt step builds a weighted subgraph G_t, aggregates R meta-prompt subnodes to a central embedding, incorporates visual context, and fuses the resulting prompt into the ongoing flow using learned gating (Yang et al., 6 Apr 2024).

3. Representational Methods and Variant Implementations

Chart-of-thought prompting supports multiple representational styles suited to the task and downstream models:

  • Token-level directives: Informal chart representation is injected into the LLM context using plain text, e.g., enumerating node labels and parent/child relationships.
  • Structured (e.g., JSON/Markdown) graphs: Nodes and edges are serialized in explicit structured formats for clarity and external manipulation (see the sketch after this list).
  • Learned embeddings: Each node v_i is associated with an embedding h_i, with an aggregation function h_merge = α({h_p : (p → c) ∈ E}) realizing fused representations at merge points (Besta et al., 25 Jan 2024, Yang et al., 6 Apr 2024).
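
As an illustration of the structured style, the following sketch serializes a small, invented reasoning chart to JSON and wraps it in a token-level directive; the node labels and wording are assumptions for this example:

```python
import json

# A tiny reasoning chart for a two-panel comparison question,
# with explicit nodes and edges (all labels are illustrative).
chart = {
    "nodes": {
        "q":  "Restate the overall question",
        "s1": "Extract the trend from panel A",
        "s2": "Extract the trend from panel B",
        "m":  "Merge: contrast the two trends",
        "a":  "State the final answer",
    },
    "edges": [["q", "s1"], ["q", "s2"], ["s1", "m"], ["s2", "m"], ["m", "a"]],
}

# Injected into the LLM context as a structured directive:
prompt = (
    "Reason over this chart of thoughts, resolving each node in topological "
    "order and fusing both parents at merge nodes:\n"
    + json.dumps(chart, indent=2)
)
print(prompt)
```
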

In multi-modal contexts, the AGoT mechanism embeds and aggregates visual and textual representations in parallel, with per-step gating (α), soft attention over subprompts (via WeightNet), and context adaptation (MetaNet), all within a differentiable pipeline with frozen CLIP encoders and tuned prompt parameters (Yang et al., 6 Apr 2024).
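
A rough PyTorch sketch of one such step follows; the module names echo WeightNet and MetaNet in spirit only, and the dimensions and wiring are illustrative assumptions rather than the published architecture:

```python
import torch
import torch.nn as nn

class AGoTStep(nn.Module):
    """One chart step: R subprompt embeddings are softly weighted, aggregated
    into a central node, combined with visual context, then gated into the
    running prompt flow (an illustrative reading of the AGoT description)."""
    def __init__(self, d: int):
        super().__init__()
        self.weight_net = nn.Linear(d, 1)   # soft attention over the R subnodes
        self.meta_net = nn.Linear(d, d)     # adapts the visual context vector
        self.gate = nn.Linear(2 * d, 1)     # learned gate for fusing into the flow

    def forward(self, subnodes: torch.Tensor, visual: torch.Tensor,
                flow: torch.Tensor) -> torch.Tensor:
        # subnodes: (R, d) meta-prompt embeddings; visual, flow: (d,)
        w = torch.softmax(self.weight_net(subnodes), dim=0)      # (R, 1) weights
        center = (w * subnodes).sum(dim=0)                       # aggregate to center
        center = center + self.meta_net(visual)                  # add visual context
        g = torch.sigmoid(self.gate(torch.cat([center, flow])))  # gate in (0, 1)
        return g * center + (1 - g) * flow                       # fused prompt state

step = AGoTStep(d=512)
fused = step(torch.randn(4, 512), torch.randn(512), torch.randn(512))
```
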

The GoT/Chart framework also accommodates arbitrary transformations, such as "Generate", "Aggregate", and "Refine", and explicitly maintains graph state in memory for interpretability and graph traversal (Besta et al., 2023).
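
A minimal sketch of this operation vocabulary, using a hypothetical sorting plan in the spirit of the GoT sorting example (the prompts and schedule here are invented for illustration):

```python
from enum import Enum

class Op(Enum):
    GENERATE = "generate"    # branch: create k new thoughts from a node
    AGGREGATE = "aggregate"  # merge: fuse several thoughts into one
    REFINE = "refine"        # improve a thought in place

# A graph-of-operations plan for sorting a long list:
plan = [
    (Op.GENERATE, {"k": 4, "prompt": "Split the list into 4 parts and sort each."}),
    (Op.AGGREGATE, {"prompt": "Merge the sorted sublists pairwise."}),
    (Op.REFINE, {"prompt": "Fix any out-of-order or missing elements."}),
]
for op, args in plan:
    print(op.value, args)
```
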

4. Empirical Results and Comparative Evaluation

Charts-of-Thought-inspired methods systematically outperform both strict chain and tree-based approaches across a range of language and vision-language tasks:

  • Multi-modal retrieval (AGoT, ViT-B/16 backbone): R@1 of 88.7 (Flickr30k, 2% data), +1.7 abs. gain vs. CoT-PT and +5.7 vs. frozen CLIP. MSCOCO: R@1 of 58.7 (+0.8/+5.4) (Yang et al., 6 Apr 2024).
  • VQA (0.75% of data): AGoT achieves 31.74% accuracy, outperforming CoT-PT by +0.88 percentage points (Yang et al., 6 Apr 2024).
  • Cross-label generalization (average over 11 datasets): AGoT achieves H = 77.68%, with incremental gains over CoT-PT (+0.58) and larger improvements over static prompt frameworks (Yang et al., 6 Apr 2024).
  • Visualization literacy (VLAT): Claude-3.7-sonnet attains VLAT = 49.44 under charts-of-thought, +74.1% vs. human baseline (28.82). Improvements for GPT-4.5 and Gemini-2.0 also significant (+21.8%/+9.4%) (Das et al., 6 Aug 2025).
  • Quality-cost trade-offs: In symbolic tasks (e.g., sorting), GoT achieves +62% quality and –31% cost relative to ToT by enabling merging and re-use of intermediate results (Besta et al., 2023).

Token cost for a chart topology scales with the number of nodes, but merging subproblems reduces redundancy and yields 20–50% token savings versus unmerged trees (Besta et al., 25 Jan 2024).
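
As a back-of-envelope illustration (not a figure from the cited papers), compare an unmerged binary tree of thoughts against a DAG lattice that keeps one node per distinct subproblem, as arises in grid-like problems such as subset-sum:

```python
# Illustrative only: node counts stand in for token cost.
d = 4
tree_nodes = 2 ** (d + 1) - 1        # 31 thoughts: shared work is re-derived
dag_nodes = (d + 1) * (d + 2) // 2   # 15 thoughts: one node per subproblem
savings = 1 - dag_nodes / tree_nodes
print(f"{tree_nodes=} {dag_nodes=} savings={savings:.0%}")  # ~52% fewer nodes
```
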

5. Application Domains

Charts-of-Thought prompting is applicable across a spectrum of scenarios:

  • Vision-Language Pretraining: Multi-modal tasks involving text-image retrieval, visual question answering, classification, and domain generalization are enhanced by multi-aspect, graph-based soft-prompting (AGoT) (Yang et al., 6 Apr 2024).
  • Chart and Data Visualization Reasoning: LLMs guided with structured chart-of-thought prompts excel at tasks requiring extraction, verification, and step-wise analysis of chart data. For example, PromptChart defines explicit CCR-style prompts for factoid chart QA, long-form chart QA, and summarization, achieving state-of-the-art results across benchmarks (Do et al., 2023); a template sketch follows this list.
  • General Language Reasoning: Arithmetic, logical, planning, and creative composition tasks benefit from chart-of-thought prompting, leveraging dynamic programming-style reuse in reasoning (merge nodes for subset-sum, plan graphs, or narrative arcs) (Besta et al., 25 Jan 2024).
  • Accessibility: Structured charts-of-thought prompts can enable LLMs to generate highly accurate alt-text, structured data tables, and interpretive summaries, supporting users with low visualization literacy or visual impairments (Das et al., 6 Aug 2025).
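
The following template sketch, in the spirit of the chart-QA prompting just described, pairs a tabular rendering of the underlying data with explicit merge instructions; the field names, wording, and data are invented for this example:

```python
# Hedged illustration of a chart-of-thought prompt for factoid chart QA.
table = "year,revenue\n2021,4.2\n2022,5.1\n2023,6.8"
question = "Between which consecutive years did revenue grow fastest?"

prompt = f"""Chart data (CSV):
{table}

Reason as a chart of thoughts:
1. Node A: compute year-over-year revenue deltas.
2. Node B: compute year-over-year growth rates.
3. Merge A and B: identify the largest change, citing both views.
4. Answer the question: {question}"""
print(prompt)
```
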

6. Design Guidelines, Strengths, and Limitations

Designing effective chart-of-thought prompts requires:

  • Explicit chart representation: Node/edge listing or structured table/JSON in context.
  • Aggregation operators: Selecting α functions (attention, mean-pooling, learned merge) tailored to the modality and downstream model (Besta et al., 25 Jan 2024, Yang et al., 6 Apr 2024).
  • Dynamic demonstration: Prompt selection covering challenging subtypes; inclusion of visual features in tabular form for chart tasks (Do et al., 2023).
  • Scheduling: Selecting next nodes for expansion and optimal backtracking/merging.
  • Consistency and factuality assessment: Empirical evaluation via harmonized metrics (H, QAFactEval, VLAT) and ablations for each design choice (Das et al., 6 Aug 2025, Do et al., 2023).

Key strengths include significant accuracy gains, enhanced sample efficiency through merging and reuse of partial solutions, interpretability of intermediate states, and robust generalization in scarce-data regimes. Limitations remain, including the token overhead of explicit chart encoding, error propagation through flawed subgraphs, and the ongoing need for automatic derivation of charts from unstructured prompts (Besta et al., 25 Jan 2024, Das et al., 6 Aug 2025). Occasional failures include color misinterpretation in visual prompts and arithmetic slips in multi-step computations.

7. Extensions and Future Directions

Open research avenues and prospective extensions of charts-of-thought prompting include:

  • Chart induction: Automatic derivation of chart topology from raw input, possibly via meta-learning or weak supervision (Besta et al., 25 Jan 2024).
  • Implicit encoding: Compressing chart representations to minimize token footprint while preserving graph structure (Besta et al., 25 Jan 2024).
  • Modality generalization: Adapting chart-of-thought templates for audio, video, or hybrid data (hyperedges for cross-modal aggregation) (Yang et al., 6 Apr 2024).
  • Dynamic chart adaptation: Learning to grow, prune, or restructure charts on a per-instance basis, enabling greater flexibility (Yang et al., 6 Apr 2024).
  • Integration with external knowledge and retrieval: Tool augmentation within chart execution pipelines (Besta et al., 25 Jan 2024).
  • Hardware acceleration for chart traversal and aggregation: Exploiting PIM (processing-in-memory) approaches for efficient DAG updates (Besta et al., 25 Jan 2024).

Charts-of-thought prompting currently anchors structure-enhanced, interpretable, and generalizable reasoning in both symbolic and multimodal LLMs, with demonstrated empirical superiority over strictly linear or tree-based paradigms (Yang et al., 6 Apr 2024, Besta et al., 2023, Das et al., 6 Aug 2025, Do et al., 2023, Besta et al., 25 Jan 2024).
