Papers

Topics

Authors

Recent

View all

Detailed Answer

Quick Answer

Concise responses based on abstracts only

Detailed Answer

Well-researched responses based on abstracts and relevant paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses

Gemini 2.5 Flash

Gemini 2.5 Flash 62 tok/s

Gemini 2.5 Pro 48 tok/s Pro

GPT-5 Medium 14 tok/s Pro

GPT-5 High 13 tok/s Pro

GPT-4o 93 tok/s Pro

Kimi K2 213 tok/s Pro

GPT OSS 120B 458 tok/s Pro

Claude Sonnet 4 38 tok/s Pro

2000 character limit reached

SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding (2506.08391v1)

Published 10 Jun 2025 in cs.CV

Abstract: Despite significant advancements in Vision-LLMs (VLMs), the performance of existing VLMs remains hindered by object hallucination, a critical challenge to achieving accurate visual understanding. To address this issue, we propose SECOND: Selective and Contrastive Decoding, a novel approach that enables VLMs to effectively leverage multi-scale visual information with an object-centric manner, closely aligning with human visual perception. SECOND progressively selects and integrates multi-scale visual information, facilitating a more precise interpretation of images. By contrasting these visual information iteratively, SECOND significantly reduces perceptual hallucinations and outperforms a wide range of benchmarks. Our theoretical analysis and experiments highlight the largely unexplored potential of multi-scale application in VLMs, showing that prioritizing and contrasting across scales outperforms existing methods.