ChartReasoner: Code-Driven Chart QA

Updated 10 June 2026

ChartReasoner is a code-driven framework that converts chart images into lossless, executable symbolic representations for precise multimodal question answering.
It employs a two-stage pipeline: charts are first translated to code, and an LLM then applies chain-of-thought reasoning over that code to carry out stepwise logical and arithmetic operations.
The framework demonstrates competitive accuracy and reduced hallucinations on benchmarks, outperforming traditional image-to-text methods via interpretable reasoning.

ChartReasoner denotes a class of code-driven frameworks and model architectures for multimodal chart question answering (CQA). These systems combine structured program representations with chain-of-thought (CoT) reasoning to obtain both precise and interpretable chart understanding across a range of benchmarks. The core innovation is the explicit transformation of chart images into a lossless, executable code representation (such as ECharts code or an equivalent symbolic form), which then serves as the substrate for stepwise, programmatic reasoning executed or guided by LLMs. The approach is designed to preserve structural, semantic, and data fidelity when bridging from the visual input to the reasoning stage, which supports interpretability, reduces hallucination, and yields performance competitive with state-of-the-art open-source and proprietary multimodal LLMs (Jia et al., 11 Jun 2025).

1. Code-Driven Modality Bridging

A defining characteristic of ChartReasoner is its two-stage reasoning pipeline (Jia et al., 11 Jun 2025):

Transport Model (Chart2Code): The input chart image is transformed into precise, executable ECharts code $c$, encapsulating axes, data arrays, geometries, color schemes, legends, and grid configurations. This code is structurally lossless, in contrast to prior approaches that rely on image-to-text conversions and suffer semantic/structural information loss.
Reasoning Model (LLM-Driven CoT): The code $c$ and a natural-language question $q$ are jointly input to a multimodal LLM, which produces an interpretable chain-of-thought $r$, followed by a final answer $a$. The LLM operates over the explicit symbolic code, supporting stepwise logical, arithmetic, and aggregation operations analogous to human analytical workflows.

This pipeline enables atomic operations such as data retrieval, aggregation, and conditional logic to be performed directly on the chart’s underlying symbolic state, rather than via brittle pattern-matching or OCR-derived pseudo-tables.
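
As a minimal sketch of what operating on the symbolic state can look like, the snippet below treats a hypothetical ECharts-style option as a Python dictionary and performs retrieval, aggregation, and a conditional filter on it. The chart contents and the helper names `retrieve`, `aggregate`, and `filter_where` are illustrative assumptions, not part of ChartReasoner.

```python
# Hypothetical ECharts-style option, written as a Python dict for illustration.
option = {
    "xAxis": {"type": "category", "data": ["Jan", "Feb", "Mar", "Apr"]},
    "yAxis": {"type": "value"},
    "series": [{"name": "Sales", "type": "bar", "data": [120, 200, 150, 80]}],
}


def _series(option, series_name):
    """Find a series by name inside the option dict."""
    return next(s for s in option["series"] if s["name"] == series_name)


def retrieve(option, series_name, category):
    """Data retrieval: value of one category in one series."""
    index = option["xAxis"]["data"].index(category)
    return _series(option, series_name)["data"][index]


def aggregate(option, series_name, fn=sum):
    """Aggregation: apply fn (sum, max, min, ...) to a whole series."""
    return fn(_series(option, series_name)["data"])


def filter_where(option, series_name, predicate):
    """Conditional logic: categories whose value satisfies the predicate."""
    categories = option["xAxis"]["data"]
    values = _series(option, series_name)["data"]
    return [c for c, v in zip(categories, values) if predicate(v)]


feb = retrieve(option, "Sales", "Feb")
total = aggregate(option, "Sales")
above_100 = filter_where(option, "Sales", lambda v: v > 100)
```

Because every value is read from the option structure itself, each intermediate result can be traced back to a specific field rather than to an OCR estimate.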

2. Data Synthesis and Symbolic Distillation

To support lossless chart-to-code translation and long-chain symbolic reasoning, ChartReasoner utilizes large-scale, high-quality datasets generated via:

Synthetic Template Library: Dozens of chart subtypes spanning major categories (bar, line, pie, scatter, box, area, mixed), each rendered from LLM-generated ECharts code variants.
Symbolic Distillation: For each chart, the transport model is used to obtain the code $c$; a high-capacity LLM (e.g., DeepSeek-R1) is prompted with $(c,q)$ to generate a multi-step CoT rationale $r$ and predicted answer $\tilde a$. Only triplets $(c,q,r)$ where $\tilde a$ exactly matches the ground-truth $a$ are retained, ensuring the consistency and verifiability of the reasoning trace.
Code Validation and Filtering: Automated HSV-space filtering for image quality (brightness, saturation), removal of sparse-content/noise, and manual review for rendering validity. Code-validation strictly enforces that only exactly reconcilable examples enter the CoT fine-tuning corpus (Jia et al., 11 Jun 2025).
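
An HSV-space quality check of this kind might look roughly like the sketch below. It is written under stated assumptions: pixels are supplied as a list of RGB tuples in [0, 255], the function names are hypothetical, and the thresholds `min_brightness` and `min_saturation` are illustrative values, since the source does not state the thresholds actually used.

```python
import colorsys


def hsv_stats(pixels):
    """Return (mean saturation, mean brightness) for RGB pixels in [0, 255]."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    total_s = total_v = 0.0
    for r, g, b in pixels:
        _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        total_s += s
        total_v += v
    return total_s / len(pixels), total_v / len(pixels)


def passes_quality_filter(pixels, min_brightness=0.2, min_saturation=0.02):
    """Reject renders that are too dark or almost entirely colorless.

    Both thresholds are assumed values for illustration only.
    """
    mean_s, mean_v = hsv_stats(pixels)
    return mean_v >= min_brightness and mean_s >= min_saturation


# Toy inputs: a near-black render, a blank white canvas, and a white
# canvas with some red marks standing in for chart content.
dark = [(5, 5, 5)] * 10
blank = [(255, 255, 255)] * 10
chart_like = [(255, 255, 255)] * 8 + [(255, 0, 0)] * 2
```

In this sketch the near-black and blank inputs are rejected while the chart-like input passes; a real filter would operate on decoded image arrays and would likely be combined with the sparse-content and manual checks described above.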

The result is a dataset (ChartThink) with thousands of diverse, multi-step annotated examples covering both simple and complex chart types and reasoning patterns.
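
The retention rule used when distilling rationales amounts to an answer-match filter. The sketch below assumes each candidate is a record with fields `code`, `question`, `rationale`, `predicted`, and `gold`, and applies a simple case-and-whitespace normalization before comparing; the record layout and the normalization are assumptions, since the source only states that the predicted answer must exactly match the ground truth.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    code: str        # chart code c from the transport model
    question: str    # question q
    rationale: str   # CoT rationale r from the teacher LLM
    predicted: str   # teacher's predicted answer
    gold: str        # ground-truth answer


def normalize(answer: str) -> str:
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(answer.strip().lower().split())


def distill(candidates):
    """Keep (c, q, r) triplets whose predicted answer matches the gold answer."""
    return [
        (c.code, c.question, c.rationale)
        for c in candidates
        if normalize(c.predicted) == normalize(c.gold)
    ]


candidates = [
    Candidate("option_a", "Which month is highest?", "Feb has 200 ...", "Feb", "feb"),
    Candidate("option_b", "What is the total?", "120 + 200 = 320 ...", "320", "550"),
]
kept = distill(candidates)
```

Only the first candidate survives here; the second is dropped because its rationale led to a wrong answer, which is the property that keeps the retained traces verifiable.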

3. Model Architecture and Training

3.1 Transport Model (Chart2Code)

Backbone: Vision-language Transformer (Qwen2.5-VL-7B) with a frozen vision encoder and a trainable language decoder.
Training: Supervised sequence-to-sequence learning with cross-entropy loss on code tokens, using 110,000 curated image-code pairs.
Input/Output: image $\rightarrow$ $c$ (ECharts code), trained over 4 epochs with AdamW (Jia et al., 11 Jun 2025).
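
Written out, a token-level cross-entropy objective over code tokens takes the standard autoregressive form below. The symbols $x$ for the input image, $c_t$ for the $t$-th code token, $T$ for the code length, and $\theta$ for the trainable decoder parameters are notation introduced here for illustration, not taken from the source.

$$\mathcal{L}_{\text{Chart2Code}}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(c_t \mid c_{<t},\, x\right)$$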

3.2 Reasoning Model

Backbone: The same Qwen2.5-VL-7B, with frozen encoder and trainable decoder.
Input: Template-crafted prompt embedding both $c$ and $q$ for CoT generation.
Phases:
- Supervised Fine-Tuning (SFT): Trained on ChartThink for 4 epochs, with standard cross-entropy over the $(r, a)$ token sequence.
- Reinforcement Learning (GRPO): Post-SFT, group relative policy optimization (GRPO) is used to suppress hallucinated or over-long CoT chains and to refine factuality and conciseness, via group-normalized rewards combining answer accuracy, output format correctness, and a chain-length penalty.
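
A group-normalized reward of this shape can be sketched as follows. The reward weights (`w_acc`, `w_fmt`, `w_len`), the token budget `max_tokens`, and the linear overflow penalty are hypothetical choices made for illustration; the source names the three reward components but not how they are weighted. The normalization step, subtracting the group mean and dividing by the group standard deviation, is the usual GRPO advantage.

```python
from statistics import mean, pstdev


def reward(correct, well_formatted, num_tokens,
           max_tokens=512, w_acc=1.0, w_fmt=0.5, w_len=0.5):
    """Combine accuracy, format, and length terms (weights are assumed)."""
    # Penalize only the tokens beyond the budget, as a fraction of the budget.
    overflow = max(0, num_tokens - max_tokens) / max_tokens
    return w_acc * float(correct) + w_fmt * float(well_formatted) - w_len * overflow


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of four sampled responses to the same (code, question) prompt.
rewards = [
    reward(True, True, 300),     # correct, well formatted, within budget
    reward(True, True, 1024),    # correct but far over the length budget
    reward(False, True, 200),    # wrong answer, valid format
    reward(False, False, 150),   # wrong answer, malformed output
]
advantages = group_advantages(rewards)
```

Under these assumed weights, the over-long but correct response receives a lower advantage than the concise correct one, which is the mechanism by which verbose chains are discouraged.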

4. Evaluation and Benchmark Results

ChartReasoner has been evaluated across major chart reasoning benchmarks:

Dataset	In/Out of Domain	ChartReasoner (SFT)	ChartReasoner (GRPO)	GPT-4o (Proprietary)
ChartQA	In-domain	86.76	86.93	85.70
EvoChart-QA	OOD	47.04	48.10	49.80
ChartBench	OOD	55.10	55.20	59.45
ChartQAPro	OOD, complex	37.94	39.97	37.67

Compared to open-source baselines (Qwen2.5-VL, InternVL2, Phi-3-Vision), ChartReasoner achieves 1–2 percentage points higher accuracy across primary metrics. Relative to the proprietary GPT-4o, it scores higher on ChartQA and ChartQAPro and remains behind on EvoChart-QA and ChartBench, approaching GPT-4o on difficult out-of-domain (OOD) and hypothetical query scenarios. The GRPO RL stage offers minor but consistent improvements in factuality and suppression of verbose CoT outputs. Accuracy is highest on bar and pie charts and tracks the pass rate of Chart2Code in chart parsing (Jia et al., 11 Jun 2025).
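
The size of the GRPO gain and the gap to GPT-4o can be read directly off the table; the snippet below only does that arithmetic on the reported scores. The dictionary layout and key names are illustrative.

```python
# Scores copied from the benchmark table above.
results = {
    "ChartQA": {"sft": 86.76, "grpo": 86.93, "gpt4o": 85.70},
    "EvoChart-QA": {"sft": 47.04, "grpo": 48.10, "gpt4o": 49.80},
    "ChartBench": {"sft": 55.10, "grpo": 55.20, "gpt4o": 59.45},
    "ChartQAPro": {"sft": 37.94, "grpo": 39.97, "gpt4o": 37.67},
}


def deltas(results):
    """Per-benchmark differences in percentage points, rounded to 2 decimals."""
    return {
        name: {
            "grpo_minus_sft": round(r["grpo"] - r["sft"], 2),
            "grpo_minus_gpt4o": round(r["grpo"] - r["gpt4o"], 2),
        }
        for name, r in results.items()
    }


summary = deltas(results)
for name, d in summary.items():
    print(f"{name}: GRPO-SFT {d['grpo_minus_sft']:+.2f}, GRPO-GPT-4o {d['grpo_minus_gpt4o']:+.2f}")
```

On these numbers the GRPO stage adds between 0.10 and 2.03 points over SFT, and the GRPO model is ahead of GPT-4o by 1.23 points on ChartQA and 2.30 on ChartQAPro while trailing by 1.70 on EvoChart-QA and 4.25 on ChartBench.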

5. Interpretability, Limitations, and Error Analysis

The code-driven modality bridging enables traceable, interpretable reasoning: generated CoT steps refer explicitly to elements and data arrays in the extracted chart code. Qualitative analysis demonstrates:

Correct localization of queried categories (e.g., "February" in bar-charts) via code references.
Precise aggregation using only code-derived data, rather than hallucinations or heuristic OCR-based values.
Interpretable explanations for both direct retrieval (“find max label”) and compositional operations (conditional aggregation over pie sectors) (Jia et al., 11 Jun 2025).
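
To illustrate what a code-referencing reasoning trace could look like, the sketch below performs a conditional aggregation over pie sectors and records, for each step, the path into the chart code that the step relies on. The pie data, the threshold question, and the function name `conditional_sum_with_trace` are hypothetical; the trace format is an assumption and not the model's actual output format.

```python
# Hypothetical pie chart in an ECharts-style option, as a Python dict.
pie_option = {
    "series": [{
        "type": "pie",
        "data": [
            {"name": "Rent", "value": 40},
            {"name": "Food", "value": 25},
            {"name": "Transport", "value": 15},
            {"name": "Other", "value": 20},
        ],
    }]
}


def conditional_sum_with_trace(option, threshold):
    """Sum sectors whose value exceeds threshold, logging a code reference per step."""
    trace = []
    total = 0
    for i, sector in enumerate(option["series"][0]["data"]):
        ref = f"series[0].data[{i}]"  # path into the chart code
        name, value = sector["name"], sector["value"]
        if value > threshold:
            total += value
            trace.append(f"{ref}: {name} = {value} > {threshold}, include")
        else:
            trace.append(f"{ref}: {name} = {value} <= {threshold}, skip")
    trace.append(f"sum of included sectors = {total}")
    return total, trace


# "What is the combined share of sectors larger than 20?"
answer, steps = conditional_sum_with_trace(pie_option, 20)
```

Each entry in `steps` names the exact data element it used, so a reader can check the answer against the chart code line by line.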

Limitations include:

Chart diversity: performance drops for real-world infographic and dashboard-style charts, especially with complex scatter/line plots involving overlapping points or non-standard geometry.
Scale: experiments currently use 7B-param backbones; scaling may further improve performance.
Chart2Code: parsing occasionally fails on highly cluttered, noisy, or stylized plots, limiting downstream reasoning.
RL reward models: current group reward normalization penalizes excessive length but does not include human preference or subjective clarity.

6. Distinction from Other ChartQA Architectures

ChartReasoner differs from three other families of chart QA approaches:

End-to-End Vision-LLMs: Approaches like UniChart or ChartVLM (Masry et al., 2023, Xia et al., 2024) interleave image-text encoding and decoder-based QA but lack explicit code-based intermediate representations.
Sketch/Pointer/Bounding Box Feedback: Methods such as ChartSketcher (Huang et al., 25 May 2025) and ChartPoint (Xu et al., 29 Nov 2025) ground CoT outputs on visual cues (sketches, pointers, bounding boxes), aiming at grounding rather than symbolic interpretation.
Programmatic/Tool-Aided Agent Pipelines: Systems such as ChartAgent (Kaur et al., 6 Oct 2025, Wang et al., 16 Dec 2025) and VProChart (Huang et al., 2024) rely on external tool invocation, segmentation, and pythonic program solutions but operate primarily on image space and do not leverage full code-based abstraction.

ChartReasoner's distinguishing feature is the lossless, executable code conversion, which acts as a single symbolic bridge between the visual and linguistic modalities and underlies its interpretability, compositionality, and reduced hallucination.

7. Broader Context and Extensions

ChartReasoner is positioned at the intersection of code-driven data synthesis pipelines (Xu et al., 4 Nov 2025), symbolic reasoning, and RL-regularized LLM training. Future directions include:

Extension to more complex, real-world chart domains including infographics and multi-panel layouts.
Integration of geometric or learned visual parsing modules to handle densely packed or stylized elements.
Development of human-in-the-loop or curriculum RL strategies to further tune output quality, reasoning clarity, and factuality.
Application of the code-to-CoT paradigm to other visually structured modalities beyond charts, such as scientific diagrams and graphical abstracts (Jia et al., 11 Jun 2025).

Overall, ChartReasoner represents a unifying framework for interpretable, high-fidelity chart reasoning anchored in symbolic, executable representations and LLM-generated chain-of-thought solutions.