
EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding (2409.01577v2)

Published 3 Sep 2024 in cs.CV

Abstract: Chart understanding enables automated data analysis for humans, which requires models to achieve highly accurate visual comprehension. While existing Visual Language Models (VLMs) have shown progress in chart understanding, the lack of high-quality training data and comprehensive evaluation benchmarks hinders VLM chart comprehension. In this paper, we introduce EvoChart, a novel self-training method for generating synthetic chart data to enhance VLMs' capabilities in real-world chart comprehension. We also propose EvoChart-QA, a novel benchmark for measuring models' chart comprehension abilities in real-world scenarios. Specifically, EvoChart is a unique self-training data synthesis approach that simultaneously produces a high-quality training corpus and a high-performance chart understanding model. EvoChart-QA consists of 650 distinct real-world charts collected from 140 different websites and 1,250 expert-curated questions that focus on chart understanding. Experimental results on various open-source and proprietary VLMs tested on EvoChart-QA demonstrate that even the best proprietary model, GPT-4o, achieves only 49.8% accuracy. Moreover, the EvoChart method significantly boosts the performance of open-source VLMs on real-world chart understanding tasks, achieving 54.2% accuracy on EvoChart-QA.
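As a rough illustration of how benchmark accuracy figures like the 49.8% quoted above are typically computed, here is a minimal scoring sketch. The answer records below are hypothetical placeholders, not EvoChart-QA data, and the exact-match normalization is an assumption rather than the paper's documented scoring protocol.

```python
# Minimal sketch of benchmark-style accuracy scoring for chart QA.
# The sample predictions/references are hypothetical placeholders;
# EvoChart-QA's real data comprises 650 charts and 1,250 questions.

def normalize(answer: str) -> str:
    """Lowercase and strip whitespace so trivially different strings match."""
    return answer.strip().lower()

def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match their reference answers."""
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return correct / len(references)

if __name__ == "__main__":
    preds = ["42", " Blue ", "2019"]
    refs = ["42", "blue", "2020"]
    print(f"accuracy = {accuracy(preds, refs):.1%}")  # 2 of 3 correct
```

Real leaderboards often add per-question-type breakdowns and fuzzier numeric matching (e.g. relative-error tolerance), but the headline number reduces to this correct-over-total ratio.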

