ChartReasoner: Code-Driven Chart QA

Updated 10 June 2026
  • ChartReasoner is a code-driven framework that converts chart images into lossless, executable symbolic representations for precise multimodal question answering.
  • It employs a two-stage pipeline by first translating charts to code and then using chain-of-thought reasoning with LLMs for stepwise logical and arithmetic operations.
  • The framework demonstrates competitive accuracy and reduced hallucinations on benchmarks, outperforming traditional image-to-text methods via interpretable reasoning.

ChartReasoner denotes a class of code-driven frameworks and model architectures for multimodal chart question answering (CQA). These systems combine structured program representations with chain-of-thought (CoT) reasoning to interpret charts precisely and to produce interpretable output across a range of benchmarks. Their core innovation is the explicit transformation of chart images into a lossless, executable code representation (such as ECharts code or an equivalent symbolic form), which then serves as the substrate for stepwise, programmatic reasoning executed or guided by LLMs. The approach is designed to preserve structural, semantic, and data fidelity across the visual → reasoning modality bridge, which improves interpretability, reduces hallucination, and yields performance competitive with state-of-the-art open-source and proprietary multimodal LLMs (Jia et al., 11 Jun 2025).

1. Code-Driven Modality Bridging

A defining characteristic of ChartReasoner is its two-stage reasoning pipeline (Jia et al., 11 Jun 2025):

  1. Transport Model (Chart2Code): The input chart image is transformed into precise, executable ECharts code $c$, encapsulating axes, data arrays, geometries, color schemes, legends, and grid configurations. This code is structurally lossless, in contrast to prior approaches that rely on image-to-text conversions and suffer semantic/structural information loss.
  2. Reasoning Model (LLM-Driven CoT): The code $c$ and a natural-language question $q$ are jointly input to a multimodal LLM, which produces an interpretable chain-of-thought $r$, followed by a final answer $a$. The LLM operates over the explicit symbolic code, supporting stepwise logical, arithmetic, and aggregation operations analogous to human analytical workflows.

This pipeline enables atomic operations such as data retrieval, aggregation, and conditional logic to be performed directly on the chart’s underlying symbolic state, rather than via brittle pattern-matching or OCR-derived pseudo-tables.
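As a minimal sketch of what operating on the chart's symbolic state can look like, the Python snippet below assumes the Chart2Code output is an ECharts option that happens to be valid JSON (real ECharts code may be JavaScript that is not strict JSON). The chart contents, variable names, and the three operations are illustrative assumptions, not taken from the paper.

```python
import json

# Hypothetical Chart2Code output: an ECharts option serialized as JSON.
# Assumption: the generated code is JSON-compatible; the values are made up.
chart_code = """
{
  "title": {"text": "Monthly sales"},
  "xAxis": {"type": "category", "data": ["January", "February", "March"]},
  "yAxis": {"type": "value"},
  "series": [{"name": "Sales", "type": "bar", "data": [120, 95, 140]}]
}
"""

option = json.loads(chart_code)
categories = option["xAxis"]["data"]
values = option["series"][0]["data"]

# Data retrieval: look up the value for one category label.
february = values[categories.index("February")]

# Aggregation: sum every bar in the series.
total = sum(values)

# Conditional logic: keep only categories whose value exceeds a threshold.
above_100 = [c for c, v in zip(categories, values) if v > 100]

print(february, total, above_100)
```

Because every number comes straight from the parsed option, each step can be checked against the code rather than against pixels or OCR output.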

2. Data Synthesis and Symbolic Distillation

To support lossless chart-to-code translation and long-chain symbolic reasoning, ChartReasoner utilizes large-scale, high-quality datasets generated via:

  • Synthetic Template Library: Dozens of chart subtypes spanning major categories (bar, line, pie, scatter, box, area, mixed), each rendered from LLM-generated ECharts code variants.
  • Symbolic Distillation: For each chart, the transport model is used to obtain the code $c$; a high-capacity LLM (e.g., DeepSeek-R1) is prompted with $(c, q)$ to generate a multi-step CoT rationale $r$ and predicted answer $\tilde a$. Only triplets $(c, q, r)$ where $\tilde a$ exactly matches the ground-truth answer $a$ are retained, ensuring the consistency and verifiability of the reasoning trace.
  • Code Validation and Filtering: Automated HSV-space filtering for image quality (brightness, saturation), removal of sparse-content/noise, and manual review for rendering validity. Code-validation strictly enforces that only exactly reconcilable examples enter the CoT fine-tuning corpus (Jia et al., 11 Jun 2025).
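The answer-matching filter in the distillation step can be sketched as follows. This is an illustration under stated assumptions: the `Candidate` record, its field names, and the choice to strip surrounding whitespace before comparing are hypothetical and are not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    """One distillation sample (hypothetical structure)."""
    code: str        # chart code c
    question: str    # question q
    rationale: str   # generated chain-of-thought r
    predicted: str   # answer produced by the teacher LLM
    gold: str        # ground-truth answer


def keep_verified(candidates: List[Candidate]) -> List[Tuple[str, str, str]]:
    """Keep (c, q, r) only when the predicted answer equals the gold answer.

    Assumption: surrounding whitespace is ignored; otherwise the match is exact.
    """
    return [
        (cand.code, cand.question, cand.rationale)
        for cand in candidates
        if cand.predicted.strip() == cand.gold.strip()
    ]


samples = [
    Candidate("option = {...}", "What is the February value?", "Read data[1].", "95", "95"),
    Candidate("option = {...}", "What is the total?", "Sum the series.", "350", "355"),
]
verified = keep_verified(samples)
print(len(verified))
```

Only the first sample survives, since the second rationale leads to an answer that disagrees with the ground truth.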

The result is a dataset (ChartThink) of diverse, multi-step annotated examples, numbering in the thousands, covering both simple and complex chart types and reasoning patterns.
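The HSV-space image-quality check listed among the filtering steps might look like the sketch below. The thresholds, function names, and the use of mean brightness and mean saturation as the decision statistics are assumptions made for illustration; the source does not specify them.

```python
import colorsys
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def mean_saturation_and_brightness(pixels: Iterable[RGB]) -> Tuple[float, float]:
    """Average HSV saturation and value over 8-bit RGB pixels."""
    s_sum = v_sum = 0.0
    count = 0
    for r, g, b in pixels:
        _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        s_sum += s
        v_sum += v
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    return s_sum / count, v_sum / count


def passes_quality_filter(
    pixels: Iterable[RGB],
    min_brightness: float = 0.3,   # hypothetical threshold
    max_saturation: float = 0.8,   # hypothetical threshold
) -> bool:
    """Reject renders that are too dark or almost entirely saturated color."""
    saturation, brightness = mean_saturation_and_brightness(pixels)
    return brightness >= min_brightness and saturation <= max_saturation


mostly_white_chart = [(255, 255, 255)] * 9 + [(255, 0, 0)]
black_render = [(0, 0, 0)] * 10
print(passes_quality_filter(mostly_white_chart), passes_quality_filter(black_render))
```

In practice the pixel list would come from an image library; plain tuples are used here to keep the sketch dependency-free.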

3. Model Architecture and Training

3.1 Transport Model (Chart2Code)

  • Backbone: Vision-language Transformer (Qwen2.5-VL-7B) with a frozen vision encoder and a trainable language decoder.
  • Training: Supervised sequence-to-sequence learning with cross-entropy loss on code tokens, using 110,000 curated image-code pairs.
  • Input/Output: chart image → $c$ (ECharts code), trained over 4 epochs with AdamW (Jia et al., 11 Jun 2025).
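For reference, token-level cross-entropy for sequence-to-sequence training is the negative log-likelihood of each target code token given the image and the preceding tokens. The sketch below averages over tokens, which is a common convention; whether the paper averages or sums is not stated, and the function name and inputs are hypothetical.

```python
import math
from typing import Sequence


def code_token_cross_entropy(target_token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood over target code tokens.

    target_token_probs[t] is the probability the model assigned to the
    correct token at position t, given the image and earlier tokens.
    """
    if not target_token_probs:
        raise ValueError("need at least one target token")
    if any(p <= 0.0 or p > 1.0 for p in target_token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)


# A model that is fairly confident on three code tokens and unsure on one.
loss = code_token_cross_entropy([0.9, 0.8, 0.95, 0.3])
print(round(loss, 4))
```

A perfectly confident model (probability 1 on every target token) gives a loss of zero, and the loss grows as probability mass moves away from the correct tokens.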

3.2 Reasoning Model

  • Backbone: The same Qwen2.5-VL-7B, with frozen encoder and trainable decoder.
  • Input: Template-crafted prompt embedding both the code $c$ and the question $q$ for CoT generation.
  • Phases: supervised fine-tuning (SFT), followed by a reinforcement-learning stage with GRPO.
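A template-crafted prompt that embeds the chart code and the question could be assembled along the following lines. The template wording, the `Answer:` convention, and both helper functions are hypothetical; the source does not give the actual template.

```python
from typing import Optional

# Hypothetical template; the real prompt used by ChartReasoner is not given.
PROMPT_TEMPLATE = (
    "You are given the ECharts code of a chart.\n"
    "Code:\n{code}\n\n"
    "Question: {question}\n"
    "Reason step by step, then give the final answer on a line "
    "starting with 'Answer:'."
)


def build_prompt(code: str, question: str) -> str:
    """Embed the chart code and the question into the template."""
    return PROMPT_TEMPLATE.format(code=code.strip(), question=question.strip())


def parse_answer(model_output: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if absent."""
    for line in reversed(model_output.splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return None


prompt = build_prompt('{"series": [{"data": [120, 95, 140]}]}', "What is the total?")
answer = parse_answer("Step 1: add 120 + 95 + 140 = 355.\nAnswer: 355")
print(answer)
```

Separating the rationale from a clearly marked final answer makes it straightforward to compare the answer against ground truth while keeping the reasoning trace for inspection.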

4. Evaluation and Benchmark Results

ChartReasoner has been evaluated across major chart reasoning benchmarks:

| Dataset | Domain | ChartReasoner (SFT) | ChartReasoner (GRPO) | GPT-4o (proprietary) |
|---|---|---|---|---|
| ChartQA | In-domain | 86.76 | 86.93 | 85.70 |
| EvoChart-QA | OOD | 47.04 | 48.10 | 49.80 |
| ChartBench | OOD | 55.10 | 55.20 | 59.45 |
| ChartQAPro | OOD, complex | 37.94 | 39.97 | 37.67 |

Compared to open-source baselines (Qwen2.5-VL, InternVL2, Phi-3-Vision), ChartReasoner achieves 1–2 percentage points higher accuracy across primary metrics and approaches GPT-4o on difficult out-of-domain (OOD) and hypothetical-query scenarios; in the table above, it is ahead of GPT-4o on ChartQA and ChartQAPro and behind on EvoChart-QA and ChartBench. The GRPO RL stage offers minor but consistent improvements in factuality and in suppressing verbose CoT outputs, with accuracy gains of 0.10 to 2.03 points over SFT. Accuracy is highest on bar and pie charts and tracks the pass rate of Chart2Code in chart parsing (Jia et al., 11 Jun 2025).
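The differences between the reported scores can be recomputed directly from the table. The snippet below is a small arithmetic check; the dictionary keys and the helper name are hypothetical, and the numbers are copied from the table above.

```python
from typing import Dict

# Scores copied from the results table.
results: Dict[str, Dict[str, float]] = {
    "ChartQA":     {"sft": 86.76, "grpo": 86.93, "gpt4o": 85.70},
    "EvoChart-QA": {"sft": 47.04, "grpo": 48.10, "gpt4o": 49.80},
    "ChartBench":  {"sft": 55.10, "grpo": 55.20, "gpt4o": 59.45},
    "ChartQAPro":  {"sft": 37.94, "grpo": 39.97, "gpt4o": 37.67},
}


def score_deltas(table: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Per-benchmark differences, in percentage points, rounded to 2 decimals."""
    return {
        name: {
            "grpo_minus_sft": round(row["grpo"] - row["sft"], 2),
            "grpo_minus_gpt4o": round(row["grpo"] - row["gpt4o"], 2),
        }
        for name, row in table.items()
    }


deltas = score_deltas(results)
for name, d in deltas.items():
    print(f"{name:12s} GRPO-SFT {d['grpo_minus_sft']:+.2f}  GRPO-GPT-4o {d['grpo_minus_gpt4o']:+.2f}")
```

The GRPO column is above the SFT column on every benchmark, while the sign of the gap to GPT-4o differs by dataset.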

5. Interpretability, Limitations, and Error Analysis

The code-driven modality bridging enables traceable, interpretable reasoning: generated CoT steps refer explicitly to elements and data arrays in the extracted chart code. Qualitative analysis demonstrates:

  • Correct localization of queried categories (e.g., "February" in bar charts) via code references.
  • Precise aggregation using only code-derived data, rather than hallucinations or heuristic OCR-based values.
  • Interpretable explanations for both direct retrieval (“find max label”) and compositional operations (conditional aggregation over pie sectors) (Jia et al., 11 Jun 2025).
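To illustrate how reasoning steps can point at specific elements of the chart code, the sketch below runs a "find max label" lookup and a conditional aggregation over pie sectors while recording the code path each step touched. The pie data, the path notation, and the function names are assumptions for illustration only.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical pie chart option; the values are made up.
option: Dict[str, Any] = {
    "series": [{
        "type": "pie",
        "data": [
            {"name": "A", "value": 40},
            {"name": "B", "value": 25},
            {"name": "C", "value": 20},
            {"name": "D", "value": 15},
        ],
    }]
}


def max_label(opt: Dict[str, Any]) -> str:
    """Direct retrieval: name of the largest sector."""
    return max(opt["series"][0]["data"], key=lambda s: s["value"])["name"]


def sum_sectors_above(opt: Dict[str, Any], threshold: float) -> Tuple[float, List[str]]:
    """Conditional aggregation with a trace that cites code paths."""
    total = 0
    trace: List[str] = []
    for i, sector in enumerate(opt["series"][0]["data"]):
        path = f"series[0].data[{i}]"
        if sector["value"] > threshold:
            total += sector["value"]
            trace.append(f"{path}: {sector['name']}={sector['value']} > {threshold}, running total {total}")
        else:
            trace.append(f"{path}: {sector['name']}={sector['value']} <= {threshold}, skipped")
    return total, trace


total, trace = sum_sectors_above(option, 18)
print(max_label(option), total)
print("\n".join(trace))
```

Each trace line names the exact array entry it read, which is the property that makes the final number auditable.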

Limitations include:

  • Chart diversity: performance drops for real-world infographic and dashboard-style charts, especially with complex scatter/line plots involving overlapping points or non-standard geometry.
  • Scale: experiments currently use 7B-parameter backbones; scaling may further improve performance.
  • Chart2Code: parsing occasionally fails on highly cluttered, noisy, or stylized plots, limiting downstream reasoning.
  • RL reward models: current group reward normalization penalizes excessive length but does not include human preference or subjective clarity.
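Group-relative reward normalization, as commonly described for GRPO, standardizes each sampled response's reward against the mean and standard deviation of its group. The sketch below pairs that normalization with a hypothetical length-penalized reward; the penalty shape, the token budget, and the use of population standard deviation are assumptions, not details from the source.

```python
import statistics
from typing import List


def length_penalized_reward(
    correct: bool,
    num_tokens: int,
    max_tokens: int = 512,   # hypothetical token budget
    penalty: float = 0.5,    # hypothetical penalty weight
) -> float:
    """1.0 for a correct answer, minus a capped penalty for exceeding the budget."""
    base = 1.0 if correct else 0.0
    overflow = max(0, num_tokens - max_tokens) / max_tokens
    return base - penalty * min(1.0, overflow)


def group_normalized_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of responses to the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population standard deviation
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to one question: (is_correct, token_count).
group = [(True, 200), (True, 1024), (False, 150), (False, 900)]
rewards = [length_penalized_reward(c, n) for c, n in group]
advantages = group_normalized_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Under this sketch a correct but overlong response earns a smaller advantage than a correct concise one, which is one way a length penalty can discourage verbose CoT; nothing in it models human preference or clarity.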

6. Distinction from Other ChartQA Architectures

ChartReasoner differs sharply from prior chart-LLMs, which rely on image-to-text conversion or OCR-derived pseudo-tables. Its core innovation is lossless, executable code conversion as a single symbolic bridge between visual and linguistic modalities, a feature that enables superior interpretability, compositionality, and reduced hallucination.

7. Broader Context and Extensions

ChartReasoner is positioned at the intersection of code-driven data synthesis pipelines (Xu et al., 4 Nov 2025), symbolic reasoning, and RL-regularized LLM training. Future directions include:

  • Extension to more complex, real-world chart domains including infographics and multi-panel layouts.
  • Integration of geometric or learned visual parsing modules to handle densely packed or stylized elements.
  • Development of human-in-the-loop or curriculum RL strategies to further tune output quality, reasoning clarity, and factuality.
  • Application of the code-to-CoT paradigm to other visually structured modalities beyond charts, such as scientific diagrams and graphical abstracts (Jia et al., 11 Jun 2025).

Overall, ChartReasoner represents a unifying framework for interpretable, high-fidelity chart reasoning anchored in symbolic, executable representations and LLM-generated chain-of-thought solutions.
