Charts-of-Thought: Enhancing LLM Visualization Literacy Through Structured Data Extraction (2508.04842v1)

Published 6 Aug 2025 in cs.HC

Abstract: This paper evaluates the visualization literacy of modern LLMs and introduces a novel prompting technique called Charts-of-Thought. We tested three state-of-the-art LLMs (Claude-3.7-sonnet, GPT-4.5 preview, and Gemini-2.0-pro) on the Visualization Literacy Assessment Test (VLAT) using standard prompts and our structured approach. The Charts-of-Thought method guides LLMs through a systematic data extraction, verification, and analysis process before answering visualization questions. Our results show Claude-3.7-sonnet achieved a score of 50.17 using this method, far exceeding the human baseline of 28.82. This approach improved performance across all models, with score increases of 21.8% for GPT-4.5, 9.4% for Gemini-2.0, and 13.5% for Claude-3.7 compared to standard prompting. The performance gains were consistent across original and modified VLAT charts, with Claude correctly answering 100% of questions for several chart types that previously challenged LLMs. Our study reveals that modern multimodal LLMs can surpass human performance on visualization literacy tasks when given the proper analytical framework. These findings establish a new benchmark for LLM visualization literacy and demonstrate the importance of structured prompting strategies for complex visual interpretation tasks. Beyond improving LLM visualization literacy, Charts-of-Thought could also enhance the accessibility of visualizations, potentially benefiting individuals with visual impairments or lower visualization literacy.

Summary

  • The paper introduces Charts-of-Thought, a novel structured prompting method that significantly boosts LLM visualization literacy scores.
  • Its experimental analysis using a modified VLAT shows marked performance improvements across various state-of-the-art models and visualization challenges.
  • Results indicate that structured analytical guidance can elevate LLMs to outperform human baselines in complex visual data interpretation tasks.

Enhancing LLM Visualization Literacy

The paper "Charts-of-Thought: Enhancing LLM Visualization Literacy Through Structured Data Extraction" explores the visualization literacy capabilities of LLMs and introduces a novel prompting method called Charts-of-Thought. This work involved testing multiple state-of-the-art models using the Visualization Literacy Assessment Test (VLAT) and developing techniques to guide these models through structured analytic processes. The results present significant performance improvements, establishing a new benchmark for visualization tasks.

Introduction

Visualization literacy, the ability to read and interpret visual representations of data, is increasingly critical as data visualizations become more widespread. Existing assessments such as the VLAT have revealed limitations in LLM capabilities relative to human performance. The paper hypothesizes that these deficiencies stem not from intrinsic model limitations but from the lack of structured analytical guidance. Inspired by human cognitive strategies and by the Chain-of-Thought prompting technique, the researchers propose the Charts-of-Thought approach, which systematically guides LLMs through data extraction, verification, and answer-formulation phases, mirroring human visual interpretation processes.
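To make the phase structure concrete, here is a minimal sketch of what a Charts-of-Thought-style prompt template might look like in Python. The wording, the step labels, and the `build_prompt` helper are illustrative assumptions based on the paper's description, not the authors' exact prompt:

```python
# Illustrative Charts-of-Thought-style template: the model is asked to
# extract the chart's data, verify it, analyze it, and only then answer.
COT_TEMPLATE = """You are analyzing the attached chart.

Step 1 - Extract: List every axis label, legend entry, data series, and
visible data point as a structured table.

Step 2 - Verify: Re-examine the chart and confirm each extracted value,
correcting any entry that does not match the image.

Step 3 - Analyze: Using only the verified table, reason through the
question below.

Step 4 - Answer: State the final answer choice.

Question: {question}
Options: {options}
"""

def build_prompt(question: str, options: list[str]) -> str:
    """Fill the template for one VLAT-style multiple-choice item."""
    return COT_TEMPLATE.format(question=question, options=", ".join(options))
```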

Methodology

LLMs Evaluated

Three advanced multimodal LLMs were selected due to their capabilities and availability:

  • Claude-3.7-sonnet: Latest from Anthropic, excelling in multimodal tasks.
  • GPT-4.5-preview: OpenAI's leading model, notable for its reasoning capabilities.
  • Gemini-2.0-pro: Google's latest product, praised for its versatility.

Experimental Design

The models were evaluated on both the original VLAT and a modified version whose charts contained new data points, preventing memorization bias and preserving the validity of the assessment. Two prompting conditions were compared: Generic Prompting and Charts-of-Thought Prompting, the latter a multi-step process that guides the LLM through data analysis akin to human reasoning. Under Charts-of-Thought, models systematically extract and sort the chart's data, verify it, and answer the question from the resulting structured data table.

Prompting Approaches

Two approaches were compared:

  • Generic Prompt: Minimal instructions comparable to those used in prior studies.
  • Charts-of-Thought Prompt: Elaborate step-by-step guidance for systematic data extraction and analysis, intended to address the limitations observed in earlier evaluations (see the sketch below).
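
As a rough illustration of how the two conditions could be run against one of the evaluated models, the following sketch uses the Anthropic Python SDK and reuses the hypothetical `build_prompt` helper from the earlier template. The generic prompt wording, the model identifier, and the sample item are assumptions for illustration, not the paper's actual evaluation harness:

```python
# Hedged sketch: send one chart image plus a prompt to a Claude model and
# compare the generic and Charts-of-Thought conditions on a single item.
import base64

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

GENERIC_PROMPT = (  # minimal instructions, mimicking the generic condition
    "Look at the attached chart and answer this question.\n"
    "Question: {question}\nOptions: {options}"
)

def ask(chart_path: str, prompt: str) -> str:
    """Send a chart image and a text prompt; return the model's reply."""
    with open(chart_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    reply = client.messages.create(
        model="claude-3-7-sonnet-20250219",  # assumed model identifier
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": image_b64}},
                {"type": "text", "text": prompt},
            ],
        }],
    )
    return reply.content[0].text

# Hypothetical VLAT-style item and chart file.
question = "What was the approximate value in 2015?"
options = ["100", "150", "200"]
generic_answer = ask("line_chart.png", GENERIC_PROMPT.format(
    question=question, options=", ".join(options)))
structured_answer = ask("line_chart.png", build_prompt(question, options))
```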

Experiments and Results

Modified VLAT

Overall Performance

The paper demonstrates clear improvements across all models with Charts-of-Thought prompting. Claude-3.7-sonnet achieved markedly higher scores, surpassing the human baseline, while the other models also showed significant gains over standard prompting.

Performance by Question Difficulty

Figure 1: Modified VLAT results by question difficulty, showing improvements across Easy, Moderate, and Hard questions for all three LLMs.

The structured prompting benefited harder questions substantially, with Claude-3.7-sonnet achieving near-perfect scores on previously challenging tasks.

Performance by Task Type

The analysis shows consistent improvements on complex analytical tasks such as value retrieval and trend identification under structured prompting, countering earlier conclusions that these were inherent weaknesses of LLMs.

Original VLAT Results

Testing on the original VLAT showed that Charts-of-Thought prompting produced substantial performance improvements over previous generations of LLMs, including a significant leap relative to human baselines.

Figure 2: Original VLAT results by chart type, showing performance differences between prompting strategies across 12 visualization types.

Conclusion

The investigation underscores the potential of structured prompting to push LLM capabilities well beyond their previously observed limits. This advancement not only sets a new standard for LLM visualization literacy but also points toward integration into automated visualization pipelines and human-machine interactive systems. However, challenges such as highly complex visualizations and color interpretation remain, suggesting avenues for further refinement and specialized development.

The findings advocate integrating structured prompts like Charts-of-Thought into diverse applications to improve the accessibility and comprehensibility of visual data, with significant implications for research, education, and professional data analysis. Future work could adapt the approach to specific visualization styles and extend it to other complex visual-interpretation tasks.
