
Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness (2407.11229v2)

Published 15 Jul 2024 in cs.CL, cs.AI, cs.CV, cs.HC, and cs.LG

Abstract: Chart question answering (CQA) is a crucial area of Visual Language Understanding. However, the robustness and consistency of current Visual Language Models (VLMs) in this field remain under-explored. This paper evaluates state-of-the-art VLMs on comprehensive datasets, developed specifically for this study, encompassing diverse question categories and chart formats. We investigate two key aspects: 1) the models' ability to handle varying levels of chart and question complexity, and 2) their robustness across different visual representations of the same underlying data. Our analysis reveals significant performance variations based on question and chart types, highlighting both strengths and weaknesses of current models. Additionally, we identify areas for improvement and propose future research directions to build more robust and reliable CQA systems. This study sheds light on the limitations of current models and paves the way for future advancements in the field.

Authors (6)
  1. Srija Mukhopadhyay
  2. Adnan Qidwai
  3. Aparna Garimella
  4. Pritika Ramu
  5. Vivek Gupta
  6. Dan Roth