- The paper introduces a practical Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) framework and a new dataset, QuickDraw-Extended, to address retrieving photos from sketches for categories not seen during training.
- The QuickDraw-Extended dataset contains 330,000 sketches and 204,000 photos across 110 categories, specifically designed with highly abstract amateur sketches to better represent real-world challenges compared to previous datasets.
- The proposed ZS-SBIR framework uses a triplet network with a domain adaptation strategy (Gradient Reversal Layer) and a novel semantic loss to bridge the gap between sketches and photos in a shared embedding space, demonstrating improved retrieval accuracy.
Doodle to Search: Zero-Shot Sketch-based Image Retrieval
The paper addresses a central challenge in sketch-based image retrieval (SBIR): retrieving images from sketches of categories not seen during training, a task known as Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR). The authors target two practical constraints of ZS-SBIR: the substantial domain gap between amateur sketches and photographs, and the need to scale to large retrieval systems. Touchscreen devices have made sketch input ubiquitous, which has increased interest in SBIR. This research extends such systems to retrieve images from categories that lack direct training data.
Key Contributions
- New Dataset: QuickDraw-Extended
- The authors introduce QuickDraw-Extended, a new dataset designed specifically for ZS-SBIR. It contains 330,000 sketches and 204,000 photos across 110 categories, with highly abstract amateur sketches sourced from the Google QuickDraw dataset. Previous datasets often contained semi-photorealistic sketches, which narrows the cross-domain gap a model has to bridge. QuickDraw-Extended instead aims to represent the real-world challenge of interpreting sketches drawn with varying skill and matching them against a broad, varied set of photos.
- ZS-SBIR Framework
- The proposed ZS-SBIR framework maps sketches and photos into a shared embedding space to manage the cross-domain gap. A triplet network serves as the baseline structure: it trains the embedding so that a sketch query lies closer to photos of its own category than to photos of other categories. Two techniques complement this baseline: a domain adaptation strategy that uses a Gradient Reversal Layer to learn a feature space common to both domains, and a novel semantic loss that uses external semantic knowledge to aid transfer to unseen categories.
- Enhanced Performance Metrics
- The reported results show improvements over existing baselines on both established datasets and the newly proposed one. Compared to prior methods, the authors report higher retrieval accuracy, even with a reduced model configuration. The full framework, including attention mechanisms and domain-adaptation strategies, outperforms the alternatives in both retrieval precision and semantic discrimination.
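The triplet objective and the Gradient Reversal Layer described in the framework can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the Euclidean distance, the margin of 1.0, the scaling factor `lam`, and the names `triplet_loss`, `GradientReversal`, and `rank_gallery` are all assumptions made for illustration. The semantic loss is omitted because this summary does not specify its form.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: zero once the positive is closer than the negative by `margin`.

    anchor   -- sketch embedding
    positive -- photo embedding from the same category
    negative -- photo embedding from a different category
    The margin value is an assumption, not taken from the paper.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


class GradientReversal:
    """Identity in the forward pass, negated and scaled gradient in the backward pass.

    Placed before a domain classifier, it pushes the feature extractor
    toward features the classifier cannot use to tell sketches from photos.
    """

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical scaling factor

    def forward(self, features):
        return list(features)

    def backward(self, grad_output):
        return [-self.lam * g for g in grad_output]


def rank_gallery(sketch_embedding, photo_embeddings):
    """Return photo indices sorted from nearest to farthest from the sketch."""
    return sorted(
        range(len(photo_embeddings)),
        key=lambda i: euclidean(sketch_embedding, photo_embeddings[i]),
    )


# Toy 2-D embeddings, chosen only to make the arithmetic easy to follow.
sketch = [0.0, 0.0]
photo_same = [0.0, 1.0]   # distance 1 from the sketch
photo_other = [3.0, 4.0]  # distance 5 from the sketch

loss_easy = triplet_loss(sketch, photo_same, photo_other)  # max(0, 1 - 5 + 1) = 0.0
loss_hard = triplet_loss(sketch, photo_other, photo_same)  # max(0, 5 - 1 + 1) = 5.0

grl = GradientReversal(lam=0.5)
reversed_grad = grl.backward([2.0, -4.0])  # [-1.0, 2.0]

ranking = rank_gallery(sketch, [photo_other, photo_same])  # [1, 0]
```

In a real training setup the reversal would live inside an automatic-differentiation framework rather than being called by hand, and the embeddings would come from the sketch and photo branches of the network. The toy ranking only shows how retrieval reduces to a nearest-neighbour search once both domains share an embedding space.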
Implications and Future Directions
The proposed QuickDraw-Extended dataset and the ZS-SBIR framework have direct implications for sketch-based retrieval. The methodology and findings advance the understanding of cross-domain representation learning and also contribute practical tools and data for real-world deployment. The research underscores how difficult it is to bridge the sketch and image modalities, and it calls attention to the need for semantically rich, scalable datasets. The authors plan to release the dataset and model publicly so that the work can support further research and development in the field.
Future work can focus on refining the embedding and its semantic understanding to further reduce the domain gap. Alternative architectures that incorporate recent advances in deep learning and cross-modal retrieval could also improve scalability and applicability. Finer-grained retrieval tasks, together with unsupervised and self-supervised learning paradigms, could further improve the robustness and accuracy of ZS-SBIR systems.