VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning (2409.01667v2)

Published 3 Sep 2024 in cs.CV

Abstract: Charts are widely used for data visualization across various fields, including education, research, and business. Chart Question Answering (CQA) is an emerging task focused on the automatic interpretation and reasoning of data presented in charts. However, chart images are inherently difficult to interpret, and chart-related questions often involve complex logical and numerical reasoning, which hinders the performance of existing models. This paper introduces VProChart, a novel framework designed to address these challenges in CQA by integrating a lightweight Visual Perception Alignment Agent (VPAgent) and a Programmatic Solution Reasoning approach. VPAgent aligns and models chart elements based on principles of human visual perception, enhancing the understanding of chart context. The Programmatic Solution Reasoning approach leverages LLMs to transform natural language reasoning questions into structured solution programs, facilitating precise numerical and logical reasoning. Extensive experiments on benchmark datasets such as ChartQA and PlotQA demonstrate that VProChart significantly outperforms existing methods, highlighting its capability in understanding and reasoning with charts.
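The abstract does not specify what a "structured solution program" looks like. As a rough illustration of the general idea only, and not the paper's actual program format, the sketch below assumes a program is a list of steps, where a hypothetical `ASK` step delegates a perception sub-question to a chart reader and arithmetic steps combine earlier results through `#k` references. The operation names, the `run_program` function, and the `lookup` callable are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple, Union

# One step = (operation name, arguments). Arguments are either numeric
# literals or "#k" references to the result of step k.
# NOTE: this step format is hypothetical, not taken from the paper.
Step = Tuple[str, List[Union[str, float]]]


def run_program(steps: List[Step], lookup: Callable[[str], float]) -> float:
    """Execute steps in order and return the result of the last step.

    `lookup` stands in for a chart-perception model: it answers a simple
    sub-question (e.g. "sales in 2020") with a number read off the chart.
    """
    ops: Dict[str, Callable[[List[float]], float]] = {
        "ADD": lambda xs: xs[0] + xs[1],
        "SUB": lambda xs: xs[0] - xs[1],
        "DIV": lambda xs: xs[0] / xs[1],
        "MAX": max,
        "AVG": lambda xs: sum(xs) / len(xs),
    }
    results: List[float] = []
    for op, args in steps:
        if op == "ASK":
            # Perception step: hand the sub-question to the chart reader.
            value = lookup(str(args[0]))
        else:
            # Reasoning step: resolve "#k" references, then compute exactly.
            resolved = [
                results[int(a[1:])] if isinstance(a, str) and a.startswith("#")
                else float(a)
                for a in args
            ]
            value = ops[op](resolved)
        results.append(value)
    return results[-1]


# "What is the difference between sales in 2020 and 2019?"
program: List[Step] = [
    ("ASK", ["sales in 2020"]),
    ("ASK", ["sales in 2019"]),
    ("SUB", ["#0", "#1"]),
]

# A toy chart reader backed by a dictionary of made-up values.
chart = {"sales in 2020": 120.0, "sales in 2019": 95.0}
print(run_program(program, chart.__getitem__))  # 25.0
```

In a design like this, the language model only has to produce the step list, while the arithmetic is carried out deterministically by the interpreter. That separation is one plausible reading of how programmatic reasoning could help with precise numerical answers.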
