ChartCoder: Chart-to-Code MLLM

Updated 10 June 2026

ChartCoder is a multimodal system that translates chart images into executable plotting code, aiming for lossless, high-fidelity visual and semantic restoration.
It employs a code-specialized LLM backbone with a stepwise Snippet-of-Thought strategy to accurately recover chart structures and details.
Trained on the large-scale Chart2Code-160k dataset, it significantly improves executability and restoration of fine-grained chart attributes over previous methods.

ChartCoder is a dedicated multimodal LLM (MLLM) system for chart-to-code generation, targeting the lossless translation of chart images into executable plotting scripts. This approach enables precise, editable, and reproducible visualization workflows, particularly beneficial in academic, scientific, and technical contexts where the fidelity of extracted chart semantics is paramount. ChartCoder distinguishes itself from prior chart-understanding systems by advancing code executability, detail restoration, and type generalization, underpinned by a code-centric LLM backbone, large-scale diversified data, and structured stepwise reasoning.

1. Conceptual Motivation and Problem Scope

ChartCoder is designed for the chart-to-code generation task: given a bitmap chart image (e.g., bar, line, pie, violin, complex composite), the goal is to produce full plotting scripts (e.g., Python/Matplotlib) that, when executed, reconstruct the source chart with high visual and semantic fidelity. This formulation addresses the intrinsic limitations of chart captioning and OCR-based structural extraction, providing a lossless, code-based representation that supports downstream editing, reproducibility, and integration with publication pipelines (Zhao et al., 11 Jan 2025).

The problem is defined as finding a mapping

$\mathcal{M} : I \mapsto C$

where $I$ is the chart image, and $C$ is the corresponding executable plotting script capturing all chart elements (data, layout, style, annotations).

Challenges include:

Accurate structural recovery across diverse chart types (bars, pies, lines, composites)
Restoration of fine-grained visual details (tick labels, hatches, legend positioning)
High code executability, i.e., generating scripts that run without error
Scalability to real-world data diversity (styles, complexities, languages)
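The executability challenge above can be made concrete with a small check. The sketch below is a minimal illustration, not the evaluation harness used by any benchmark: the helper names `is_executable` and `executability_rate` are hypothetical, and running the script in-process with `exec` is a simplification (a real harness would more likely use an isolated subprocess with a timeout).

```python
import contextlib
import io


def is_executable(script: str) -> bool:
    """Return True if the script compiles and runs without raising."""
    namespace = {"__name__": "__main__"}
    try:
        code = compile(script, "<generated>", "exec")
        # Silence stdout so scripts that print do not clutter the log.
        with contextlib.redirect_stdout(io.StringIO()):
            exec(code, namespace)
    except Exception:
        # Syntax errors and runtime errors both count as failures.
        return False
    return True


def executability_rate(scripts: list[str]) -> float:
    """Percentage of generated scripts that run cleanly."""
    if not scripts:
        return 0.0
    passed = sum(is_executable(s) for s in scripts)
    return 100.0 * passed / len(scripts)
```

Under these assumptions, a batch in which half of the generated scripts raise a syntax or runtime error scores 50.0.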

2. Model Architecture and Core Techniques

ChartCoder's architecture adheres to a prototypical three-stage MLLM pipeline, but crucially replaces the general-purpose LLM backbone with a code-specialized LLM:

Vision Encoder: Uses SigLIP-384, a ViT-based model, extracting visual features from images. The “Any Resolution” strategy ensures each input chart is partitioned into $384 \times 384$ non-overlapping patches, permitting arbitrary input aspect ratios while preserving spatial detail.
Vision-Language Connector: A two-layer MLP projects the vision features $V$ into the code LLM embedding space:

$E_V = f_{\rm conn}(V) \in \mathbb{R}^{n \times d}$

Code LLM Backbone: DeepSeek-Coder 6.7B (Transformer), pretrained on code corpora. The token sequence $\bigl[E_V; E_{\rm text}\bigr]$ undergoes causal self-attention, facilitating joint vision-code alignment and generation. This backbone substantially increases code executability rates compared to general LLMs, due to its domain training on code syntax and style (Zhao et al., 11 Jan 2025).
Snippet-of-Thought (SoT): Generation proceeds in a decomposed fashion, mirroring Program-of-Thought reasoning. For each chart, ChartCoder generates code in four hierarchical steps: (1) layout/type, (2) data/colors, (3) details (hatches, legends, tick formatting), and (4) final code assembly. This decomposition forces fine-grained attention to code structure and visual mapping.
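The tiling step of the vision encoder can be sketched as follows. This is a simplified illustration that assumes the image is padded up to the next multiple of the tile size; the actual "Any Resolution" procedure may instead resize the image to a grid chosen from a predefined set. The function name `tile_grid` is hypothetical.

```python
import math


def tile_grid(width: int, height: int, tile: int = 384):
    """Return (cols, rows, boxes) for non-overlapping tile x tile crops.

    Assumption: the image is padded to the next multiple of `tile`,
    so boxes in the last row or column may extend past the original image.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    # Boxes are (left, top, right, bottom) in row-major order.
    boxes = [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]
    return cols, rows, boxes
```

For example, a 1000 x 500 chart yields a 3 x 2 grid of six tiles under this padding assumption, each of which would be encoded separately before the features are passed to the connector.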

The following table summarizes the main components:

Module	Approach (ChartCoder)	Technical Role
Vision Encoder	SigLIP-384; “Any Res” patching	Patch-based visual featurization
VL Connector	2-layer MLP $f_{\rm conn}$	Feature projection to LLM space
LLM Backbone	DeepSeek-Coder 6.7B (code-focused)	Causal text & vision self-attn
Generation Strategy	Snippet-of-Thought (SoT), 4-step code decomposition	Stepwise code reconstruction
Training Corpus	Chart2Code-160k (27 chart types)	Supervision & generalization
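The connector row of the table, $E_V = f_{\rm conn}(V)$ followed by the concatenation $[E_V; E_{\rm text}]$, can be illustrated with a toy pure-Python two-layer MLP. Several details here are assumptions: the GELU activation (the source only specifies a two-layer MLP), the tiny dimensions, the random weights, and all helper names.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU; the choice of activation is an assumption."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(rows, weight, bias):
    """Apply y = x @ weight + bias to every row; weight is d_in x d_out."""
    return [
        [sum(x * weight[i][j] for i, x in enumerate(row)) + bias[j]
         for j in range(len(bias))]
        for row in rows
    ]


def init(d_in, d_out, rng):
    """Small random weights and zero bias for one linear layer."""
    weight = [[rng.gauss(0.0, 0.02) for _ in range(d_out)] for _ in range(d_in)]
    return weight, [0.0] * d_out


def connector(v, params):
    """Two-layer MLP projecting vision features into the LLM embedding space."""
    (w1, b1), (w2, b2) = params
    hidden = [[gelu(h) for h in row] for row in linear(v, w1, b1)]
    return linear(hidden, w2, b2)


rng = random.Random(0)
n, d_vision, d_llm = 4, 6, 8  # toy sizes, not the real model dimensions
params = (init(d_vision, d_llm, rng), init(d_llm, d_llm, rng))
V = [[rng.gauss(0.0, 1.0) for _ in range(d_vision)] for _ in range(n)]
E_V = connector(V, params)                  # n rows, each of length d_llm
E_text = [[0.0] * d_llm for _ in range(3)]  # placeholder text embeddings
sequence = E_V + E_text                     # [E_V; E_text] along the token axis
```

The resulting `sequence` has one row per visual token followed by one row per text token, which is the input the causal code LLM attends over.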

3. Data Foundations: Chart2Code-160k and Diversification

ChartCoder is trained on Chart2Code-160k, the first large-scale, diverse chart-to-code dataset specifically constructed for this domain (Zhao et al., 11 Jan 2025). The pipeline includes:

Prompting GPT-4V to synthesize numeric data arrays and generate code for 27 manually defined chart types (bars, lines, pies, 3D surfaces, candlesticks, etc.).
Matplotlib/Seaborn-based code generation, enforcing intrinsic data/visual mappings and style diversity.
Filtering executed code for validity, correctness, and variety; 160,000 chart–code pairs are retained, with a Shannon diversity index $H \approx 3.1$ bits across types.
50,000 samples are further decomposed into SoT steps for stepwise supervision.
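The diversity figure in the filtering step is a Shannon entropy in bits over the chart-type distribution. A minimal sketch of that computation is shown below; the function name `shannon_bits` and the toy label lists are illustrative, not taken from the dataset.

```python
import math
from collections import Counter


def shannon_bits(labels) -> float:
    """Shannon entropy H = -sum(p * log2(p)) of the label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Four equally frequent chart types give exactly 2 bits.
toy_labels = ["bar"] * 2 + ["line"] * 2 + ["pie"] * 2 + ["box"] * 2
uniform_4 = shannon_bits(toy_labels)

# 27 equally frequent types give the maximum possible value, log2(27).
uniform_27 = shannon_bits(range(27))
```

Because the maximum for 27 types is $\log_2 27 \approx 4.75$ bits, a value of $H \approx 3.1$ bits indicates that the types are not uniformly represented.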

This dataset is essential for enabling generalization across chart types, supporting both syntactic and visual variation. Empirically, fine-tuning smaller MLLMs on Chart2Code-160k yields substantial improvements (30–40 points) in detailed reconstruction metrics (Zhao et al., 11 Jan 2025).

4. Training Objectives, SoT Strategy, and Model Optimization

ChartCoder training follows a two-stage protocol:

Chart-to-Text Alignment: Only $f_{\rm conn}$ is trained/fine-tuned on captioning and chart structure corpora. Loss is standard token-level cross-entropy.
Chart-to-Code Instruction Tuning: All modules are unfrozen. The model is jointly instruction-tuned on Chart2Code-160k, the SoT subset, and supplementary chart QA tasks. The optimizer is AdamW with staged learning rates: one rate for the SigLIP encoder and a separate rate for the code LLM and connector.
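The two-stage freezing schedule can be written down as a small configuration sketch. The stage and module names below are hypothetical labels, and the learning rates are left as arguments because this is an illustration of the schedule, not of the actual hyperparameters.

```python
MODULES = ("vision_encoder", "connector", "code_llm")

# Which modules receive gradient updates in each stage.
STAGES = {
    "chart_to_text_alignment": {"connector"},
    "chart_to_code_instruction_tuning": set(MODULES),
}


def trainable_flags(stage: str) -> dict[str, bool]:
    """Map each module name to whether it is trained in the given stage."""
    trainable = STAGES[stage]
    return {name: name in trainable for name in MODULES}


def param_groups(stage: str, lr_vision: float, lr_rest: float) -> list[dict]:
    """Build optimizer-style groups: one rate for the vision encoder,
    another for the connector and code LLM."""
    flags = trainable_flags(stage)
    groups = []
    for name in MODULES:
        if flags[name]:
            lr = lr_vision if name == "vision_encoder" else lr_rest
            groups.append({"module": name, "lr": lr})
    return groups
```

In a real training script, each group would carry the module's parameters and be passed to an AdamW optimizer; here the groups only record the module name and its rate.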

SoT stepwise generation has measurable effects: removing SoT reduces executability by 2.2 percentage points, the low-level score by 7.3 points, and the high-level score by 8.6 points. Step decomposition drives the model to produce more complete, style-accurate, and executable scripts, especially for nuanced visual marks (tick format, hatches, legends) (Zhao et al., 11 Jan 2025).
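To make the four SoT steps concrete, the sketch below assembles a toy Matplotlib script from per-step snippets. The snippet contents, the step headers, and the `assemble` helper are illustrative assumptions; the actual format of SoT training samples is not specified here.

```python
# Steps 1-3 as (title, code snippet) pairs; the bar chart is a made-up example.
SOT_STEPS = [
    ("Step 1: layout and chart type",
     "import matplotlib.pyplot as plt\n"
     "fig, ax = plt.subplots(figsize=(6, 4))\n"),
    ("Step 2: data and colors",
     "labels = ['A', 'B', 'C']\n"
     "values = [3, 7, 5]\n"
     "bars = ax.bar(labels, values, color='tab:blue', label='2024')\n"),
    ("Step 3: details (hatches, legend, axis labels)",
     "for bar in bars:\n"
     "    bar.set_hatch('//')\n"
     "ax.set_ylabel('Count')\n"
     "ax.legend(loc='upper right')\n"),
]


def assemble(steps) -> str:
    """Step 4: concatenate the snippets into one final script."""
    parts = [f"# {title}\n{snippet}" for title, snippet in steps]
    parts.append(
        "# Step 4: final assembly\n"
        "plt.tight_layout()\n"
        "plt.savefig('chart.png')\n"
    )
    return "\n".join(parts)


final_script = assemble(SOT_STEPS)
```

Running `final_script` requires Matplotlib; the point of the sketch is only that each step contributes a self-contained snippet and the last step produces the complete, executable script.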

5. Benchmarks, Results, and Ablations

ChartCoder achieves state-of-the-art or near-best results among open-source MLLMs on several public chart-to-code benchmarks (ChartMimic, Plot2Code, ChartX). Representative metrics include:

Metric	ChartCoder (7B)	Strongest Open MLLM	GPT-4o (Reference)
Executability Rate	91.4 %	83.2 % (InternVL2)	93.2 %
Low-Level Score	77.4	~70	79.0
High-Level Score	74.0	63.4	83.5
Plot2Code PassRate	87.9 %	85.6 %	—
ChartX GPT-score	2.09	1.89 (TinyChart)	—
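The gaps implied by the first three rows of the table can be computed directly. The sketch below only restates the table's numbers; the approximate "~70" entry is left out of the comparison because it is not an exact value, and the dictionary keys are hypothetical labels.

```python
# Values copied from the table above; None marks the approximate "~70" entry.
RESULTS = {
    "Executability Rate": {"chartcoder": 91.4, "open": 83.2, "gpt4o": 93.2},
    "Low-Level Score": {"chartcoder": 77.4, "open": None, "gpt4o": 79.0},
    "High-Level Score": {"chartcoder": 74.0, "open": 63.4, "gpt4o": 83.5},
}


def gaps(results):
    """ChartCoder minus each reference, rounded to one decimal place."""
    out = {}
    for metric, row in results.items():
        ours = row["chartcoder"]
        out[metric] = {
            key: round(ours - row[key], 1)
            for key in ("open", "gpt4o")
            if row[key] is not None
        }
    return out


summary = gaps(RESULTS)
```

On these numbers, ChartCoder leads the strongest open MLLM by 8.2 points in executability and 10.6 points in high-level score, and trails GPT-4o by 1.8, 1.6, and 9.5 points on the three metrics respectively.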

Further ablations highlight critical contributions:

Code LLM backbone vs. vanilla LLM: +10.8 ExecRate, +16.0 Low-Level, +10.6 High-Level.
SoT removal: –2.2 ExecRate, –7.3 Low-Level, –8.6 High-Level.
Data scale: fine-tuning Qwen2-VL-7B on Chart2Code-160k provides +16.7, +40.5, and +33.2 gains over unadapted baselines (ExecRate, Low-Level, High-Level, respectively).

Qualitative analysis shows superior pixel-level recovery: tick labels, hatch patterns, plot layout, legend location, and color order are more faithfully reproduced than in baseline models (Zhao et al., 11 Jan 2025).

6. Method Extensions, Multi-Language Generalization, and Efficiency

Recent models extend the ChartCoder paradigm by:

Modularization via Mixture-of-Experts (MoE): Chart2Code-MoLA integrates complexity-aware expert routing, with domain-specialized submodules and sparse load balancing, combined with LoRA for parameter-efficient fine-tuning. The reported gains are 7–17 percentage points in chart-type accuracy, an 18% GPU memory reduction, and 20% faster convergence on Chart2Code-160k (Wang et al., 28 Nov 2025).
Universal Chart-to-Code Generation: CharLuMA (Chart2NCode) jointly supervises Python, R, and LaTeX plotting scripts, introducing a language-conditioned low-rank subspace adapter to share chart semantics while specializing generation style per language. This approach yields up to 98% execution accuracy on Python, with marked cross-lingual gains, indicating that multi-view alignment improves both universality and fidelity (Zhang et al., 27 Apr 2026).
Self-Evaluation and Lightweight Agents: Models like OneChart inject auxiliary numerical tokens and parallel decoders, allowing for low-parameter (0.2B) chart parsing with self-consistency confidence and performance gains in chart QA applications (Chen et al., 2024).
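The expert-routing idea in the first extension can be illustrated with generic sparse top-k gating. This is a textbook sketch, not the Chart2Code-MoLA router: the complexity-aware signal and the load-balancing term are omitted, and the function name `top_k_route` is hypothetical.

```python
import math


def top_k_route(scores, k=2):
    """Softmax over expert scores, keep the k largest, renormalize.

    Returns a dict mapping the chosen expert index to its mixing weight.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of experts")
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}


weights = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
```

Only the selected experts are evaluated for a given input, which is where the sparsity, and hence the compute saving, of an MoE layer comes from.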

A plausible implication is that integrated modularity, parameter-efficient adaptation, and multi-language alignment are converging themes propelling the next wave of chart-to-code systems.

7. Limitations, Deployment, and Future Perspectives

ChartCoder’s limitations and potential frontiers are explicit (Zhao et al., 11 Jan 2025):

Model Capacity: At 7B parameters, ChartCoder lags behind proprietary MLLMs (e.g., GPT-4V) on certain high-fidelity tasks; scaling to 20–70B models is anticipated to further close the gap.
Chart Complexity: Current benchmarks focus on single-axis, static Matplotlib outputs. Coverage of interactive libraries (e.g., Plotly, D3), multi-view figures, and semantically rich composite figures is limited.
Input OCR/Text Detail: Axis-label transcription and fine-tuned visual text extraction remain partially unsolved; further OCR-specific pretraining is suggested.
Dynamic Chart Generation: Early work in code-based animation (e.g., OpusAnimation) indicates that extending chart-to-code to dynamic/animated outputs is feasible via staged reward learning and specialized datasets, but this is not a solved problem in ChartCoder itself (Li et al., 2 Oct 2025).
Publication-Ready Integration: While ChartCoder reconstructs chart code, end-to-end integration with academic publishing workflows (figures conforming to venue styles, deployment loops, interactive editing) is addressed downstream by harnesses such as chart-plot (Tang et al., 8 Jun 2026).

This suggests that fully automated chart publishing requires further orchestration, encompassing AI-driven style adaptation, layout-aware rendering, and structured user intervention, beyond core code generation.

In summary, ChartCoder operationalizes high-fidelity, executable chart reconstruction from images via a code-centric MLLM, scalable data curation, and rigorous stepwise generation. The architecture and methodology have become a baseline for chart-to-code research, with extensions in modularity, efficiency, multi-language applicability, and dynamic output remaining active research frontiers (Zhao et al., 11 Jan 2025, Wang et al., 28 Nov 2025, Zhang et al., 27 Apr 2026, Chen et al., 2024, Li et al., 2 Oct 2025, Tang et al., 8 Jun 2026).