See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding (2406.11665v1)

Published 17 Jun 2024 in cs.CL, cs.AI, and cs.CV

Abstract: Vision-language models (VLMs) can respond to queries about images in many languages. However, beyond language, culture affects how we see things. For example, individuals from Western cultures focus more on the central figure in an image while individuals from Eastern cultures attend more to scene context. In this work, we present a novel investigation that demonstrates and localizes VLMs' Western bias in image understanding. We evaluate large VLMs across subjective and objective visual tasks with culturally diverse images and annotations. We find that VLMs perform better on the Western subset than the Eastern subset of each task. Controlled experimentation tracing the source of this bias highlights the importance of a diverse language mix in text-only pre-training for building equitable VLMs, even when inference is performed in English. Moreover, while prompting in the language of a target culture can lead to reductions in bias, it is not a substitute for building AI more representative of the world's languages.
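As a rough illustration of what "performs better on the Western subset than the Eastern subset" means quantitatively, the sketch below scores each cultural subset separately and reports the difference. It is a minimal sketch under stated assumptions, not the paper's evaluation code. The `Example` record, the `"western"`/`"eastern"` labels, the helper names and the use of plain accuracy are all hypothetical. The abstract does not specify the metrics, and its subjective tasks are scored against culturally diverse annotations rather than a single correct answer.

```python
from dataclasses import dataclass


@dataclass
class Example:
    # Hypothetical record: which cultural subset an item belongs to,
    # and whether the model's answer matched the reference annotation.
    culture: str  # assumed labels: "western" or "eastern"
    correct: bool


def subset_accuracy(examples: list[Example], culture: str) -> float:
    """Fraction of items in one cultural subset that the model got right."""
    subset = [e for e in examples if e.culture == culture]
    if not subset:
        raise ValueError(f"no examples for culture {culture!r}")
    return sum(e.correct for e in subset) / len(subset)


def western_gap(examples: list[Example]) -> float:
    """Western-subset score minus Eastern-subset score.

    A positive value means the model does better on the Western subset,
    which is the direction of bias the abstract reports.
    """
    return subset_accuracy(examples, "western") - subset_accuracy(examples, "eastern")


if __name__ == "__main__":
    # Toy data: 3 of 4 Western items correct, 1 of 2 Eastern items correct.
    data = [Example("western", True)] * 3 + [Example("western", False)]
    data += [Example("eastern", True), Example("eastern", False)]
    print(western_gap(data))  # 0.75 - 0.5 = 0.25
```

Reporting a signed difference rather than an absolute one keeps the direction of the bias visible. Per-task gaps computed this way could then be compared across models or prompting languages.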

Authors (5)
  1. Amith Ananthram (8 papers)
  2. Elias Stengel-Eskin (49 papers)
  3. Carl Vondrick (93 papers)
  4. Mohit Bansal (304 papers)
  5. Kathleen McKeown (85 papers)
Citations (3)