Measuring Agreeableness Bias in Multimodal Models (2408.09111v2)
Abstract: This paper examines a phenomenon in multimodal LLMs where pre-marked options in question images can significantly influence model responses. Our study employs a systematic methodology to investigate this effect: we present models with images of multiple-choice questions, which they initially answer correctly, then expose the same model to versions with pre-marked options. Our findings reveal a significant shift in the models' responses towards the pre-marked option, even when it contradicts their answers in the neutral settings. Comprehensive evaluations demonstrate that this agreeableness bias is a consistent and quantifiable behavior across various model architectures. These results show potential limitations in the reliability of these models when processing images with pre-marked options, raising important questions about their application in critical decision-making contexts where such visual cues might be present.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.