What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models (2310.06627v4)

Published 10 Oct 2023 in cs.CL, cs.CV, and cs.LG

Abstract: Counterfactual reasoning, a fundamental aspect of human cognition, involves contemplating alternatives to established facts or past events, significantly enhancing our abilities in planning and decision-making. In light of the advancements in current multi-modal LLMs, we explore their effectiveness in counterfactual reasoning. To facilitate this investigation, we introduce a novel dataset, C-VQA, specifically designed to test the counterfactual reasoning capabilities of modern multi-modal LLMs. This dataset is constructed by infusing original questions with counterfactual presuppositions, spanning various types such as numerical and boolean queries. It encompasses a mix of real and synthetic data, representing a wide range of difficulty levels. Our thorough evaluations of contemporary vision-language models using this dataset have revealed substantial performance drops, with some models showing up to a 40% decrease, highlighting a significant gap between current models and human-like vision reasoning capabilities. We hope our dataset will serve as a vital benchmark for evaluating the counterfactual reasoning capabilities of models. Code and dataset are publicly available at https://bzhao.me/C-VQA/.

Authors (6)
  1. Letian Zhang (16 papers)
  2. Xiaotong Zhai (1 paper)
  3. Zhongkai Zhao (8 papers)
  4. Yongshuo Zong (11 papers)
  5. Xin Wen (64 papers)
  6. Bingchen Zhao (46 papers)