Exploring Visual Culture Awareness in GPT-4V: A Comprehensive Probing (2402.06015v2)

Published 8 Feb 2024 in cs.CL and cs.CV

Abstract: Pretrained large vision-language models have drawn considerable interest in recent years due to their remarkable performance. Despite considerable efforts to assess these models from diverse perspectives, the extent of visual cultural awareness in the state-of-the-art GPT-4V model remains unexplored. To address this gap, we extensively probed GPT-4V using the MaRVL benchmark dataset, investigating its capabilities and limitations in visual understanding with a focus on cultural aspects. Specifically, we introduced three vision-related tasks, i.e., caption classification, pairwise captioning, and culture tag selection, to enable systematic, fine-grained evaluation of visual cultural understanding. Experimental results indicate that GPT-4V excels at identifying cultural concepts but still performs worse in low-resource languages such as Tamil and Swahili. Notably, human evaluation shows that GPT-4V's image captions are more culturally relevant than the original MaRVL human annotations, suggesting a promising approach for constructing future visual cultural benchmarks.

Authors (6)
  1. Yong Cao (33 papers)
  2. Wenyan Li (8 papers)
  3. Jiaang Li (15 papers)
  4. Yifei Yuan (37 papers)
  5. Daniel Hershcovich (50 papers)
  6. Antonia Karamolegkou (12 papers)
Citations (5)