Doodle to Search: Practical Zero-Shot Sketch-based Image Retrieval (1904.03451v2)

Published 6 Apr 2019 in cs.CV

Abstract: In this paper, we investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. We importantly advance prior arts by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognizes two important yet often neglected challenges of practical ZS-SBIR, (i) the large domain gap between amateur sketch and photo, and (ii) the necessity for moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended, that consists of 330,000 sketches and 204,000 photos spanning across 110 categories. Highly abstract amateur human sketches are purposefully sourced to maximize the domain gap, instead of ones included in existing datasets that can often be semi-photorealistic. We then formulate a ZS-SBIR framework to jointly model sketches and photos into a common embedding space. A novel strategy to mine the mutual information among domains is specifically engineered to alleviate the domain gap. External semantic knowledge is further embedded to aid semantic transfer. We show that, rather surprisingly, retrieval performance significantly outperforms that of state-of-the-art on existing datasets that can already be achieved using a reduced version of our model. We further demonstrate the superior performance of our full model by comparing with a number of alternatives on the newly proposed dataset. The new dataset, plus all training and testing code of our model, will be publicly released to facilitate future research.

Authors (5)
  1. Sounak Dey (11 papers)
  2. Pau Riba (13 papers)
  3. Anjan Dutta (41 papers)
  4. Josep Llados (52 papers)
  5. Yi-Zhe Song (120 papers)
Citations (171)

Summary

  • The paper introduces a practical Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) framework and a new dataset, QuickDraw-Extended, to address retrieving photos from sketches for categories not seen during training.
  • The QuickDraw-Extended dataset contains 330,000 sketches and 204,000 photos across 110 categories, specifically designed with highly abstract amateur sketches to better represent real-world challenges compared to previous datasets.
  • The proposed ZS-SBIR framework uses a triplet network with a domain adaptation strategy (Gradient Reversal Layer) and a novel semantic loss to bridge the gap between sketches and photos in a shared embedding space, demonstrating improved retrieval accuracy.

Doodle to Search: Zero-Shot Sketch-based Image Retrieval

The paper addresses a significant challenge in sketch-based image retrieval (SBIR): retrieving photos from sketch queries that belong to categories not seen during training, a task known as Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR). The authors target two practical constraints that prior work has largely neglected, namely the substantial domain gap between amateur sketches and photographs, and the need to move towards large-scale retrieval. The growing reliance on touchscreen devices has increased interest in SBIR, since sketch input is now ubiquitous, and this work extends such systems to retrieve images from categories that lack direct training data.
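
At test time, ZS-SBIR of this kind reduces to nearest-neighbour search in a learned common embedding space: both the sketch query and the gallery photos are mapped by trained encoders and ranked by similarity. The snippet below is a minimal illustration rather than code from the paper; `sketch_embedding` and `photo_embeddings` stand in for the outputs of such encoders.

```python
import numpy as np

def retrieve(sketch_embedding: np.ndarray, photo_embeddings: np.ndarray, top_k: int = 10):
    """Rank gallery photos by cosine similarity to a single sketch query."""
    q = sketch_embedding / np.linalg.norm(sketch_embedding)
    g = photo_embeddings / np.linalg.norm(photo_embeddings, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity to every gallery photo
    return np.argsort(-sims)[:top_k]   # indices of the top-k most similar photos
```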

Key Contributions

  1. New Dataset: QuickDraw-Extended
    • The authors introduce QuickDraw-Extended, a dataset designed specifically for the ZS-SBIR setting. It comprises 330,000 sketches and 204,000 photos across 110 categories, with highly abstract amateur sketches sourced from the Google QuickDraw dataset. This contrasts with previous datasets, whose often semi-photorealistic sketches narrow the cross-domain gap. QuickDraw-Extended is therefore meant to reflect the real-world challenge of interpreting sketches drawn with widely varying skill for broad and varied photo retrieval.
  2. ZS-SBIR Framework
    • The proposed ZS-SBIR framework embeds sketches and photos into a shared space to manage the cross-domain gap. A triplet network serves as the baseline structure, pulling a sketch query towards its matching photo representation and away from non-matching ones. Two techniques complement this: a domain-adaptation strategy that learns a mutual feature space through a Gradient Reversal Layer, and a semantic loss that injects external semantic knowledge to aid transfer to unseen categories (a minimal sketch of this setup follows the list below).
  3. Enhanced Performance Metrics
    • The empirical results show substantial improvements over existing baselines on both the established datasets and the newly proposed one. Even a reduced configuration of the model outperforms prior state-of-the-art methods, and the full model, with attention mechanisms and the domain-adaptation strategy included, surpasses the alternatives in both retrieval precision and semantic discrimination.
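
To make the framework in item 2 concrete, the following is a minimal PyTorch-style sketch of this kind of setup: a shared encoder trained with a triplet loss, a semantic regression head towards class word embeddings, and a domain classifier behind a Gradient Reversal Layer. It is not the authors' released code; all module names, dimensions, and loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class ZSSBIREmbedder(nn.Module):
    """Shared encoder with a semantic head and a domain classifier (illustrative)."""

    def __init__(self, backbone: nn.Module, feat_dim: int = 512, sem_dim: int = 300):
        super().__init__()
        self.encoder = backbone                       # CNN shared by sketches and photos
        self.sem_head = nn.Linear(feat_dim, sem_dim)  # projects into word-embedding space
        self.domain_head = nn.Sequential(             # predicts sketch (0) vs. photo (1)
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 2)
        )

    def forward(self, x, grl_lambda: float = 1.0):
        feat = self.encoder(x)                        # shared embedding
        sem = self.sem_head(feat)                     # predicted semantic vector
        dom = self.domain_head(GradReverse.apply(feat, grl_lambda))
        return feat, sem, dom


def training_step(model, sketch, photo_pos, photo_neg, class_word_vec):
    """One combined loss: triplet + semantic regression + adversarial domain loss."""
    f_s, sem_s, dom_s = model(sketch)
    f_p, _, dom_p = model(photo_pos)
    f_n, _, _ = model(photo_neg)

    # Triplet loss: pull the matching photo towards the sketch, push the non-match away.
    triplet = F.triplet_margin_loss(f_s, f_p, f_n, margin=0.3)

    # Semantic loss (one simple choice): regress sketch embeddings onto the class
    # word embedding so unseen classes remain reachable through semantics.
    semantic = F.mse_loss(sem_s, class_word_vec)

    # Domain loss: because of the GRL, minimizing this pushes the encoder towards
    # features the domain classifier cannot separate.
    labels = torch.cat([torch.zeros(len(dom_s)), torch.ones(len(dom_p))]).long()
    domain = F.cross_entropy(torch.cat([dom_s, dom_p]), labels)

    return triplet + semantic + domain
```

In a setup like this, the reversal layer flips gradients so that training the domain classifier simultaneously drives the shared encoder towards domain-invariant features, while the semantic head is what lets retrieval generalize to unseen categories through the word-vector space.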

Implications and Future Directions

The proposed QuickDraw-Extended dataset and the ZS-SBIR framework have pivotal implications for advancing the field of sketch-based retrieval. The methodology and findings not only advance the theoretical understanding of cross-domain representation learning but also contribute practical tools and datasets for real-world deployment. The research underscores the complexities of bridging sketch and image modalities, calling attention to the need for semantically rich and scalable datasets. The authors plan to release the dataset and model to the public, ensuring that this work facilitates further research and development in the field.

Future exploration can focus on refining the embedding and semantic understanding to further diminish the domain gap. Additionally, exploring alternative architectures that incorporate recent advancements in deep learning and cross-modal retrieval could enhance scalability and applicability. The consideration of finer-grained distinctions within retrieval tasks, leveraging unsupervised and self-supervised learning paradigms, could further bolster the robustness and accuracy of ZS-SBIR systems.