A Zero-Shot Framework for Sketch-based Image Retrieval (1807.11724v1)

Published 31 Jul 2018 in cs.CV

Abstract: Sketch-based image retrieval (SBIR) is the task of retrieving images from a natural image database that correspond to a given hand-drawn sketch. Ideally, an SBIR model should learn to associate components in the sketch (say, feet, tail, etc.) with the corresponding components in the image having similar shape characteristics. However, current evaluation methods focus only on coarse-grained evaluation, where the goal is to retrieve images that belong to the same class as the sketch but do not necessarily have the same shape characteristics as the sketch. As a result, existing methods simply learn to associate sketches with classes seen during training and hence fail to generalize to unseen classes. In this paper, we propose a new benchmark for zero-shot SBIR where the model is evaluated on novel classes that are not seen during training. We show through extensive experiments that existing models for SBIR that are trained in a discriminative setting learn only class-specific mappings and fail to generalize to the proposed zero-shot setting. To circumvent this, we propose a generative approach for the SBIR task based on deep conditional generative models that take the sketch as an input and fill in the missing information stochastically. Experiments on this new benchmark, created from the "Sketchy" dataset, a large-scale database of sketch-photo pairs, demonstrate that the performance of these generative models is significantly better than several state-of-the-art approaches in the proposed zero-shot framework of the coarse-grained SBIR task.

Authors

Sasi Kiran Yelamarthi
Shiva Krishna Reddy
Ashish Mishra
Anurag Mittal



Summary

A Zero-Shot Framework for Sketch-Based Image Retrieval

This paper addresses a gap in how Sketch-Based Image Retrieval (SBIR) is evaluated: existing protocols do not test whether a model works on classes it has never seen. SBIR evaluation has traditionally been coarse-grained, counting a retrieved image as correct when it belongs to the same semantic class as the query sketch, with test classes that also appear in training. Such protocols cannot measure a model's ability to generalize beyond the classes seen during training. The paper introduces a zero-shot framework for SBIR in which models are evaluated only on novel classes that were unseen during training.

Key Insights

Critique of Current SBIR Approaches: The authors argue that common SBIR methods generalize poorly to unseen classes. Because these models are trained discriminatively, they learn class-specific mappings from sketches to the training classes, which limits their effectiveness in a zero-shot setting.
Proposed Zero-Shot Benchmark: The paper establishes a zero-shot SBIR benchmark built on the "Sketchy" dataset. The dataset is split into train and test sets so that no class appears in both. The split is also designed to avoid bias from overlap between the test classes and classes present in external sources such as ImageNet.
Generative Model Approach: To tackle zero-shot generalization, the authors take a generative approach. They propose conditional variational autoencoders (CVAE) and conditional adversarial autoencoders (CAAE) that take the sketch as input and generate the corresponding image features. These models stochastically fill in the information that sketches lack compared with photographs.
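As a point of reference for the generative item above, the standard conditional VAE training objective can be written as a lower bound on the conditional log-likelihood. The notation here is an assumption made for illustration and is not taken from the paper: x denotes image features, y the sketch features used as the condition, and z the latent variable. The paper's actual loss may use a different prior or include additional terms.

```latex
\log p_\theta(x \mid y) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(x \mid z, y)\big]
\;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x, y)\,\|\,p(z \mid y)\big)
```

Under this formulation, the first term rewards reconstructing the image features from the latent code and the sketch, and the KL term keeps the encoder close to the prior. At test time one would sample z from the prior and decode it together with y, which is one way to read "filling in the missing information stochastically".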

Key Results and Contributions

In the paper's experiments, the generative models outperform several state-of-the-art SBIR approaches. In particular, the CVAE achieves better retrieval performance in the zero-shot setting than both existing SBIR models and methods adapted from zero-shot image classification, such as Semantic Autoencoders (SAE) and Embarrassingly Simple Zero-Shot Learning (ESZSL). The summary attributes this to the conditional generative models learning an alignment between sketch and image features that is not tied to the training classes, which is the limitation of the discriminative SBIR approaches.
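The zero-shot evaluation protocol behind these results can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' code: the function names, the toy two-dimensional features, the cosine-similarity ranking, and the choice of k are all hypothetical, and the feature extractor and generative model are left out entirely. It shows a class-disjoint split followed by ranking a gallery of unseen-class images and scoring the ranking with precision@k and average precision.

```python
import math
import random
from typing import List, Sequence, Tuple

Item = Tuple[str, List[float]]  # (class label, feature vector)


def split_classes(classes: Sequence[str], n_test: int,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Partition class names into disjoint train (seen) and test (unseen) sets."""
    shuffled = sorted(set(classes))
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_gallery(query: Sequence[float], gallery: List[Item]) -> List[str]:
    """Return gallery labels ordered by decreasing similarity to the query."""
    ranked = sorted(gallery, key=lambda item: cosine(query, item[1]), reverse=True)
    return [label for label, _ in ranked]


def precision_at_k(ranked_labels: List[str], target: str, k: int) -> float:
    """Fraction of the top-k retrieved items that share the query's class."""
    top = ranked_labels[:k]
    return sum(label == target for label in top) / len(top) if top else 0.0


def average_precision(ranked_labels: List[str], target: str) -> float:
    """Mean of the precision values at each rank where a correct item appears."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == target:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy data: four hypothetical classes, two of them held out as unseen.
classes = ["cat", "dog", "giraffe", "teapot"]
train_classes, test_classes = split_classes(classes, n_test=2)

# The gallery contains only images of unseen classes (features are made up).
gallery: List[Item] = [
    (test_classes[0], [1.0, 0.1]),
    (test_classes[1], [0.1, 1.0]),
    (test_classes[0], [0.9, 0.2]),
    (test_classes[1], [0.0, 1.0]),
]

# A query for the first unseen class; in a generative pipeline this vector
# would stand in for image features produced from the sketch.
query = [1.0, 0.0]
ranked = rank_gallery(query, gallery)
print(precision_at_k(ranked, test_classes[0], k=2))
print(average_precision(ranked, test_classes[0]))
```

The important property is that the gallery and queries come only from `test_classes`, so a model that merely memorized sketch-to-class mappings for `train_classes` gains nothing at evaluation time.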

Implications and Future Directions

The research has implications both for SBIR applications and for zero-shot learning more broadly. In deployment scenarios such as e-commerce and online search, an SBIR system will receive sketches of classes that were not represented in its training data, so better generalization is directly useful. The zero-shot approach suggests a scalable way for SBIR systems to handle new classes as the content available on the Internet changes.

Future research may focus on refining the generative techniques to further close the domain gap between sketches and natural images. Integrating these models into interactive systems could also support real-time, adaptive retrieval.

In conclusion, the paper contributes a zero-shot SBIR benchmark, shows that discriminatively trained SBIR models fail to generalize under it, and demonstrates that conditional generative models improve retrieval on unseen classes.
