Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers (2403.07214v2)
Abstract: This paper, for the first time, explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR). We highlight a pivotal discovery: text-to-image diffusion models can seamlessly bridge the gap between sketches and photos. This proficiency stems from their robust cross-modal capabilities and shape bias, findings that our pilot studies substantiate. To harness pre-trained diffusion models effectively, we introduce a straightforward yet powerful strategy focused on two key aspects: selecting optimal feature layers and utilising visual and textual prompts. For the former, we identify which layers are richest in information and best suited to the specific retrieval requirement (category-level or fine-grained). For the latter, we employ visual and textual prompts to guide the model's feature extraction process, enabling it to generate more discriminative and contextually relevant cross-modal representations. Extensive experiments on several benchmark datasets demonstrate significant performance improvements.
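The layer-selection step can be illustrated with a small sketch. It assumes that sketch and photo features have already been extracted from several diffusion U-Net blocks and pooled into one vector per image (feature extraction, the denoising timestep, and the visual/textual prompts are not shown). Each candidate layer is then scored by its retrieval performance on a held-out validation split: mAP when relevance is defined by category, Acc@1 when each sketch has a single paired photo. The layer names (`up_block_1`, `up_block_3`), the helper functions, and the toy vectors are hypothetical, and this validation-based criterion is one plausible way to pick a layer, not the paper's exact procedure.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Per-query metric: (gallery labels sorted by similarity, query label) -> score
QueryMetric = Callable[[List[str], str], float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_precision(ranked: List[str], query_label: str) -> float:
    """Category-level AP: mean of the precision values at each rank holding a relevant photo."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def acc_at_1(ranked: List[str], query_label: str) -> float:
    """Fine-grained Acc@1: labels are instance ids, so only the paired photo counts."""
    return 1.0 if ranked and ranked[0] == query_label else 0.0


def evaluate(sketches, sketch_labels, photos, photo_labels, metric: QueryMetric) -> float:
    """Rank every photo for each sketch by cosine similarity and average the metric."""
    scores = []
    for query, query_label in zip(sketches, sketch_labels):
        order = sorted(range(len(photos)), key=lambda i: cosine(query, photos[i]), reverse=True)
        scores.append(metric([photo_labels[i] for i in order], query_label))
    return sum(scores) / len(scores)


def select_layer(
    layer_feats: Dict[str, Tuple[List[Vector], List[Vector]]],
    sketch_labels: List[str],
    photo_labels: List[str],
    metric: QueryMetric = average_precision,
) -> Tuple[str, Dict[str, float]]:
    """Pick the layer whose (sketch, photo) features score best on a validation split."""
    scores = {
        name: evaluate(sketch_vecs, sketch_labels, photo_vecs, photo_labels, metric)
        for name, (sketch_vecs, photo_vecs) in layer_feats.items()
    }
    return max(scores, key=scores.get), scores


# Toy validation data. Layer names and vectors are hypothetical stand-ins for
# pooled features taken from different blocks of a diffusion U-Net.
sketch_labels = ["cat", "dog"]
photo_labels = ["cat", "dog", "cat", "dog"]
layer_feats = {
    "up_block_1": ([[1.0, 0.0], [0.0, 1.0]],
                   [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]),
    "up_block_3": ([[1.0, 0.0], [0.0, 1.0]],
                   [[0.1, 0.9], [0.9, 0.1], [0.2, 0.8], [0.8, 0.2]]),
}

best, scores = select_layer(layer_feats, sketch_labels, photo_labels)
print(best, scores)
```

Passing `metric=acc_at_1` switches the criterion to the fine-grained setting, which is why the preferred layer can differ between the two retrieval tasks. Cosine similarity is used here only because it is a common choice for embedding retrieval; any similarity function could be substituted inside `evaluate`.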