You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval (2403.07222v2)

Published 12 Mar 2024 in cs.CV

Abstract: Two primary input modalities prevail in image retrieval: sketch and text. While text is widely used for inter-category retrieval tasks, sketches have been established as the sole preferred modality for fine-grained image retrieval due to their ability to capture intricate visual details. In this paper, we question the reliance on sketches alone for fine-grained image retrieval by simultaneously exploring the fine-grained representation capabilities of both sketch and text, orchestrating a duet between the two. The end result enables precise retrievals previously unattainable, allowing users to pose ever-finer queries and incorporate attributes like colour and contextual cues from text. For this purpose, we introduce a novel compositionality framework, effectively combining sketches and text using pre-trained CLIP models, while eliminating the need for extensive fine-grained textual descriptions. Last but not least, our system extends to novel applications in composed image retrieval, domain attribute transfer, and fine-grained generation, providing solutions for various real-world scenarios.

Authors (6)

Subhadeep Koley
Ayan Kumar Bhunia
Aneeshan Sain
Pinaki Nath Chowdhury
Tao Xiang
Yi-Zhe Song
