Understanding Factual Errors in AI-Generated Chart Captions
Introduction to Chart Captioning Models
Chart captioning models have become increasingly proficient at generating natural language descriptions of visual content such as charts. This capability matters to data analysts, business analysts, journalists, and others who depend on clear, accurate chart interpretations for reporting and decision-making. Despite the critical need for factual consistency, research has yet to thoroughly examine the factuality of AI-generated chart captions, which is essential for their reliability in practice.
Evaluating Factual Errors
To tackle this issue, a new dataset named CHOCOLATE focuses on identifying and categorizing factual errors in chart captions. A substantial annotation effort produced a broad error taxonomy, ranging from incorrect numeric values and mislabeled axes to entirely out-of-context information. Analysis of this dataset revealed an alarming rate of factual errors across state-of-the-art captioning systems, including task-specific models and large vision-language models (LVLMs), the latter encompassing both proprietary (such as GPT-4V) and open-source solutions.
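To illustrate how such an error taxonomy could be applied programmatically, the sketch below tallies per-category error rates over annotated caption sentences. The category names and sample data are hypothetical, not the official CHOCOLATE annotation scheme:

```python
from collections import Counter

# Hypothetical error categories, loosely inspired by the kinds of
# errors described above; not the actual CHOCOLATE taxonomy.
ERROR_TYPES = {"value", "label", "trend", "out_of_context", "none"}

def error_rates(annotations):
    """Return the fraction of sentences per error type.

    `annotations` is a list of error-type strings, one per caption
    sentence; "none" marks a factually consistent sentence.
    """
    counts = Counter(annotations)
    total = len(annotations)
    return {t: counts.get(t, 0) / total for t in ERROR_TYPES}

# Toy annotated sample (fabricated for illustration).
sample = ["none", "value", "none", "label", "value", "none"]
rates = error_rates(sample)
print(f"factually consistent: {rates['none']:.0%}")  # → factually consistent: 50%
```

Aggregating such rates per model is what allows a dataset like CHOCOLATE to compare task-specific captioners against LVLMs on a common footing.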
Progressing Towards Factual Correctness
These pervasive inaccuracies motivate the Chart Caption Factual Error Correction task: producing a corrected caption that is factually consistent with the chart while making minimal edits to the original. A novel model called C2TFEC was proposed to improve factual accuracy through a two-step process. First, it translates the visual content of a chart into a structured table. Then, leveraging the strong reasoning capabilities of LLMs (such as GPT-4), it reviews the caption and amends any inaccuracies based on the table data. The efficacy of C2TFEC is measured through both automatic evaluation and human assessment, in which it outperforms leading LVLMs.
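The two-stage approach described above can be sketched as a minimal pipeline. Here `extract_table` and `llm_correct` are hypothetical stand-ins for a chart-to-table extractor and an LLM call (e.g., to GPT-4), stubbed with deterministic logic so the sketch is self-contained and runnable; it is not the actual C2TFEC implementation:

```python
import re

def extract_table(chart):
    """Stage 1 (stub): convert a chart into a structured table.
    A real system would run a chart-to-table model; here the
    'chart' is already a dict of label -> value for illustration.
    """
    return dict(chart)

def llm_correct(caption, table):
    """Stage 2 (stub): amend numeric inaccuracies against the table.
    A real system would prompt an LLM with the table and caption;
    this stub only replaces the number that follows a known label,
    keeping edits to the original caption minimal.
    """
    corrected = caption
    for label, value in table.items():
        # Rewrite "<label> ... <number>" so the number matches the table.
        pattern = rf"({re.escape(label)}\D*)\d+(\.\d+)?"
        corrected = re.sub(pattern, rf"\g<1>{value}", corrected)
    return corrected

def correct_caption(chart, caption):
    """Minimal two-stage factual error correction pipeline."""
    table = extract_table(chart)
    return llm_correct(caption, table)

chart = {"2020": 42, "2021": 57}
caption = "In 2020 sales were 40, and in 2021 they were 57."
print(correct_caption(chart, caption))
# → In 2020 sales were 42, and in 2021 they were 57.
```

Grounding the correction step in an explicit table, rather than in the chart image directly, is what lets the second stage verify each number against structured data while leaving already-correct text untouched.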
Conclusions and Future Directions
The paper concludes with a pivotal contribution to the accuracy and comprehensibility of AI-generated content. Producing reliable content is crucial to maintaining trust in automated systems, and this investigation marks a significant stride toward improving the veracity of AI-generated chart captions. Future work may extend these factual error correction techniques to other forms of visual information and refine detection and correction algorithms for even greater accuracy.