
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Published 19 Feb 2024 in cs.CV (arXiv:2402.12185v6)

Abstract: Recently, many versatile multi-modal LLMs (MLLMs) have emerged. However, their capacity to query information depicted in visual charts and to reason over the queried content remains under-explored. In this paper, to comprehensively and rigorously benchmark the chart-domain abilities of off-the-shelf MLLMs, we construct ChartX, a multi-modal evaluation set covering 18 chart types, 7 chart tasks, 22 disciplinary topics, and high-quality chart data. In addition, we develop ChartVLM, which offers a new perspective on handling multi-modal tasks that depend strongly on interpretable patterns, such as reasoning tasks over charts or geometric images. We evaluate the chart-related abilities of mainstream MLLMs and of our ChartVLM on the proposed ChartX evaluation set. Extensive experiments demonstrate that ChartVLM surpasses both versatile and chart-specialized large models, achieving results comparable to GPT-4V. We believe this study can pave the way toward a more comprehensive chart evaluation set and more interpretable multi-modal models. Both ChartX and ChartVLM are available at: https://github.com/Alpha-Innovator/ChartVLM
