Modality-Aware Representation Learning for Zero-shot Sketch-based Image Retrieval (2401.04860v1)
Abstract: Zero-shot learning offers an efficient way for a machine learning model to handle unseen categories, avoiding exhaustive data collection. Zero-shot Sketch-based Image Retrieval (ZS-SBIR) simulates real-world scenarios where collecting paired sketch-photo samples is hard and costly. We propose a novel framework that indirectly aligns sketches and photos by contrasting them through texts, removing the need for access to sketch-photo pairs. With an explicit modality encoding learned from data, our approach disentangles modality-agnostic semantics from modality-specific information, bridging the modality gap and enabling effective cross-modal content retrieval within a joint latent space. Through comprehensive experiments, we verify the efficacy of the proposed model on ZS-SBIR and show that it can also be applied to generalized and fine-grained settings.
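To make the idea of aligning sketches and photos "through texts" concrete, the following is a minimal sketch, not the authors' implementation. It assumes (these are illustrative assumptions, not details from the abstract) that the modality encoding is a vector subtracted from each visual feature, and that the contrast with text uses a standard symmetric InfoNCE loss. The names `remove_modality`, `info_nce`, and `text_anchored_loss` are hypothetical.

```python
import math

def normalize(v):
    """L2-normalize a vector (returned unchanged if its norm is zero)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def remove_modality(feature, modality_code):
    """Assumed form of disentanglement: subtract a per-modality code."""
    return normalize([f - m for f, m in zip(feature, modality_code)])

def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy of matching queries[i] to keys[i] among all keys."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(queries)

def text_anchored_loss(visual_feats, modality_code, text_feats):
    """Contrast one visual modality (sketch OR photo) against class texts.

    Sketches and photos never appear in the same call, so no
    sketch-photo pairs are needed; text is the shared anchor.
    """
    content = [remove_modality(f, modality_code) for f in visual_feats]
    texts = [normalize(t) for t in text_feats]
    return 0.5 * (info_nce(content, texts) + info_nce(texts, content))

# Toy data: two classes; the third axis carries modality-only information.
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
sketch_code = [0.0, 0.0, 1.0]
sketch_feats = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

aligned = text_anchored_loss(sketch_feats, sketch_code, text_feats)
swapped = text_anchored_loss(sketch_feats[::-1], sketch_code, text_feats)
```

In this toy setup, `aligned` is much smaller than `swapped`, because removing the assumed modality code leaves each sketch feature pointing at its own class text. Calling the same function with photo features and a separate photo code would pull both modalities toward the same text anchors without ever comparing a sketch with a photo directly.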
- Eunyi Lyou
- Doyeon Lee
- Jooeun Kim
- Joonseok Lee