SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models (2403.13263v1)
Abstract: Recent trends in Large Vision Language Model (LVLM) research have been increasingly focusing on advancing beyond general image understanding towards more nuanced, object-level referential comprehension. In this paper, we present and delve into the self-consistency capability of LVLMs, a crucial aspect that reflects the models' ability to both generate informative captions for specific objects and subsequently utilize these captions to accurately re-identify the objects in a closed-loop process. This capability significantly mirrors the precision and reliability of fine-grained visual-language understanding. Our findings reveal that the self-consistency level of existing LVLMs falls short of expectations, posing limitations on their practical applicability and potential. To address this gap, we introduce a novel fine-tuning paradigm named Self-Consistency Tuning (SC-Tune). It features the synergistic learning of a cyclic describer-locator system. This paradigm is not only data-efficient but also exhibits generalizability across multiple LVLMs. Through extensive experiments, we demonstrate that SC-Tune significantly elevates performance across a spectrum of object-level vision-language benchmarks and maintains competitive or improved performance on image-level vision-language benchmarks. Both our model and code will be publicly available at https://github.com/ivattyue/SC-Tune.
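To make the closed-loop idea concrete, below is a minimal sketch of one describe-then-locate cycle as suggested by the abstract: a describer captions a target object, a locator grounds that caption back to a box, and the agreement between the two boxes serves as the self-consistency signal. The `describe`/`locate` methods, the box format, and the use of IoU as the consistency measure are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the SC-Tune closed loop described in the abstract.
# The LVLM's describe/locate interfaces and the IoU-based consistency signal
# are assumptions for illustration only.

from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def self_consistency_step(lvlm, image, target_box: Box) -> float:
    """One describe-then-locate cycle; returns a consistency score in [0, 1]."""
    # 1) Describer: caption the object inside `target_box`.
    caption = lvlm.describe(image, target_box)      # hypothetical API
    # 2) Locator: ground the generated caption back to a box.
    predicted_box = lvlm.locate(image, caption)     # hypothetical API
    # 3) Consistency: how well the caption re-identifies the original object.
    return iou(target_box, predicted_box)
```

In a tuning loop, this score could act as the learning signal that couples the describer and locator, rewarding captions that are informative enough to be grounded back to the same object.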
Authors: Tongtian Yue, Jie Cheng, Longteng Guo, Xingyuan Dai, Zijia Zhao, Xingjian He, Gang Xiong, Yisheng Lv, Jing Liu