From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models (arXiv:2403.12027v4)
Abstract: Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding informed decision-making. Automatic chart understanding has advanced significantly with the recent rise of large foundation models. Foundation models such as large language models (LLMs) have revolutionized various natural language processing tasks and are increasingly being applied to chart understanding. This survey provides a comprehensive overview of recent developments, challenges, and future directions in chart understanding within the context of these foundation models. We first review the fundamental building blocks for studying chart understanding tasks. We then examine the tasks themselves, their evaluation metrics, and the sources of both chart and textual inputs. Next, we survey modeling strategies, encompassing both classification-based and generation-based approaches, along with tool-augmentation techniques that enhance chart understanding performance. Furthermore, we report the state-of-the-art performance on each task and discuss how it can be improved. Finally, we address challenges and future directions, highlighting topics such as domain-specific charts, the lack of effort in developing evaluation metrics, and agent-oriented settings. This survey serves as a comprehensive resource for researchers and practitioners in natural language processing, computer vision, and data analysis, offering valuable insights and directions for future research on chart understanding with large foundation models. The studies discussed in this paper, along with emerging new research, are continually updated at: https://github.com/khuangaf/Awesome-Chart-Understanding.
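To make the tool-augmentation strategy mentioned above concrete, the sketch below illustrates a plot-to-table pipeline in the style of DePlot (Liu et al., 2023): a vision-language model first "de-renders" the chart image into a linearized data table, and a text-only LLM then reasons over that table to answer a question. This is a minimal sketch, assuming the publicly released `google/deplot` checkpoint on Hugging Face; the example image URL and the prompt-building helper are illustrative stand-ins, not the survey's reference implementation, and the final LLM call is deliberately left abstract since any chat API would do.

```python
# Minimal sketch of a DePlot-style, tool-augmented chart QA pipeline.
# Assumes: `transformers`, `Pillow`, `requests`, and the public
# google/deplot checkpoint (hypothetical setup for illustration).
from PIL import Image
import requests
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

processor = Pix2StructProcessor.from_pretrained("google/deplot")
model = Pix2StructForConditionalGeneration.from_pretrained("google/deplot")


def chart_to_table(image: Image.Image) -> str:
    """Step 1: de-render the chart image into a linearized data table."""
    inputs = processor(
        images=image,
        text="Generate underlying data table of the figure below:",
        return_tensors="pt",
    )
    outputs = model.generate(**inputs, max_new_tokens=512)
    return processor.decode(outputs[0], skip_special_tokens=True)


def build_qa_prompt(table: str, question: str) -> str:
    """Step 2: hand the extracted table to any text-only LLM for reasoning.
    (The actual LLM call is omitted; substitute your API of choice.)"""
    return (
        "Read the following table and answer the question.\n\n"
        f"{table}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Example chart image from the ChartQA dataset (URL is illustrative).
    url = "https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/5090.png"
    image = Image.open(requests.get(url, stream=True).raw)
    table = chart_to_table(image)
    print(build_qa_prompt(table, "Which category has the highest value?"))
```

The appeal of this decoupling is that numerical and logical reasoning happens over text rather than pixels, letting a strong off-the-shelf LLM handle the arithmetic while the vision model only needs to recover the chart's underlying data.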
Authors: Kung-Hsiang Huang, Hou Pong Chan, Yi R. Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, Heng Ji