Pseudo-triplet Guided Few-shot Composed Image Retrieval (2407.06001v2)

Published 8 Jul 2024 in cs.CV and cs.MM

Abstract: Composed Image Retrieval (CIR) is a challenging task that aims to retrieve the target image given a multimodal query, i.e., a reference image and its complementary modification text. As previous supervised and zero-shot learning paradigms all fail to strike a good trade-off between generalization ability and retrieval performance, recent researchers have introduced the task of few-shot CIR (FS-CIR) and proposed a textual inversion-based network built on the pretrained CLIP model to realize it. Despite its promising performance, this approach suffers from two key limitations: it relies solely on the few annotated samples for CIR model training, and it selects training triplets indiscriminately for CIR model fine-tuning. To address these two limitations, we propose a novel two-stage pseudo-triplet-guided few-shot CIR scheme, dubbed PTG-FSCIR. In the first stage, we propose an attentive masking and captioning-based pseudo triplet generation method to construct pseudo triplets from pure image data and use them for CIR-task-specific pretraining. In the second stage, we propose a challenging triplet-based CIR fine-tuning method, in which we design a pseudo modification text-based sample challenging score estimation strategy and a robust top range-based random sampling strategy to select challenging triplets that promote model fine-tuning. Notably, our scheme is plug-and-play and compatible with any existing supervised CIR model. We evaluate our scheme with two backbones on three public datasets (i.e., FashionIQ, CIRR, and Birds-to-Words), achieving maximum improvements of 13.3%, 22.2%, and 17.4%, respectively, demonstrating its efficacy.
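
To make the second-stage idea concrete, below is a minimal sketch of what "score each candidate triplet by how challenging it is, then sample at random from the top-scoring range" could look like. It is an illustration only, not the authors' implementation: the function and parameter names (sample_challenging_triplets, challenging_score, top_fraction, num_samples) and the chosen values are assumptions for the sake of the example.

```python
# Sketch (assumed, not the paper's code): rank candidate triplets by an assumed
# "challenging score", keep only the top range, and sample the few-shot
# fine-tuning set uniformly at random from that range.
import random
from typing import Callable, List, Sequence, Tuple

# A triplet is (reference image id, modification text, target image id).
Triplet = Tuple[str, str, str]

def sample_challenging_triplets(
    triplets: Sequence[Triplet],
    challenging_score: Callable[[Triplet], float],
    top_fraction: float = 0.3,   # assumed hyperparameter: keep the hardest 30%
    num_samples: int = 16,       # assumed few-shot annotation budget
) -> List[Triplet]:
    """Rank triplets by the assumed challenging score, then sample uniformly
    from the top range so the few-shot set is hard but not degenerate."""
    ranked = sorted(triplets, key=challenging_score, reverse=True)
    top_k = max(num_samples, int(len(ranked) * top_fraction))
    return random.sample(ranked[:top_k], min(num_samples, top_k))
```

Sampling randomly within the top range, rather than simply taking the single highest-scoring samples, is what the abstract refers to as making the selection robust.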

Authors (6)
  1. Bohan Hou (9 papers)
  2. Haoqiang Lin (3 papers)
  3. Haokun Wen (9 papers)
  4. Meng Liu (112 papers)
  5. Xuemeng Song (30 papers)
  6. Mingzhu Xu (5 papers)
