Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 148 tok/s
Gemini 2.5 Pro 47 tok/s Pro
GPT-5 Medium 26 tok/s Pro
GPT-5 High 28 tok/s Pro
GPT-4o 94 tok/s Pro
Kimi K2 214 tok/s Pro
GPT OSS 120B 429 tok/s Pro
Claude Sonnet 4.5 36 tok/s Pro
2000 character limit reached

Cure or Poison? Embedding Instructions Visually Alters Hallucination in Vision-Language Models (2508.01678v1)

Published 3 Aug 2025 in cs.CV and cs.AI

Abstract: Vision-LLMs (VLMs) often suffer from hallucination, partly due to challenges in aligning multimodal information. We propose Prompt-in-Image, a simple method that embeds textual instructions directly into images. This removes the need for separate text inputs and forces the model to process all content through the visual channel. We evaluate this method on three popular open-source VLMs: Qwen2.5-VL, LLaVA-1.5, and InstructBLIP. The results reveal sharp differences. Prompt-in-Image improves Qwen2.5-VL's performance, increasing POPE accuracy by 4.1 percent (from 80.2 percent to 84.3 percent) and also reducing hallucination rates on MS-COCO. In contrast, LLaVA-1.5 and InstructBLIP experience a severe performance drop, with accuracy falling from around 84 percent to near-random levels. Through detailed analysis, we found that CLIP-based encoders in LLaVA and InstructBLIP exhibit excessive attention bias toward embedded text regions, disrupting visual understanding. In contrast, Qwen's vision encoder handles text-embedded images robustly. Crucially, Prompt-in-Image reduces Qwen's modality gap, enhancing cross-modal alignment by unifying information processing through a single modality.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.