A Decade Survey of Content Based Image Retrieval using Deep Learning (2012.00641v2)

Published 23 Nov 2020 in cs.CV, cs.AI, and cs.MM

Abstract: Content-based image retrieval aims to find images similar to a query image in a large-scale dataset. Generally, the similarity between the representative features of the query image and the dataset images is used to rank the images for retrieval. In the early days, various hand-designed feature descriptors were investigated based on visual cues that represent images, such as color, texture, and shape. However, over the past decade, deep learning has emerged as the dominant alternative to hand-designed feature engineering, as it learns features automatically from the data. This paper presents a comprehensive survey of deep learning based developments in content-based image retrieval over the past decade. Existing state-of-the-art methods are also categorized from different perspectives to give a better understanding of the progress. The taxonomy used in this survey covers different supervision types, network types, descriptor types, and retrieval types. A performance analysis of the state-of-the-art methods is also performed. Insights are presented to help researchers observe the progress and make the best choices. The survey presented in this paper will help further research in image retrieval using deep learning.
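The retrieval step described in the abstract (represent each image by a feature vector, then rank dataset images by their similarity to the query's features) can be sketched as follows. This is a minimal illustration, not the survey's method: it assumes features have already been extracted as plain Python lists, uses cosine similarity as the similarity measure, and the names `cosine_similarity`, `rank_images`, and the example file names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query: Sequence[float],
                database: Dict[str, Sequence[float]],
                top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k database entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, feat)) for name, feat in database.items()]
    # Sort by descending similarity; ties are broken by name for a deterministic order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical 4-D features; a real system would use learned (e.g. CNN) embeddings.
    db = {
        "beach.jpg": [0.9, 0.1, 0.0, 0.2],
        "forest.jpg": [0.1, 0.8, 0.3, 0.0],
        "sunset.jpg": [0.8, 0.2, 0.1, 0.3],
    }
    query = [1.0, 0.1, 0.0, 0.25]
    for name, score in rank_images(query, db, top_k=2):
        print(f"{name}: {score:.3f}")
```

Euclidean distance (sorted ascending) is an equally common choice; cosine similarity is used here because it ignores differences in vector magnitude.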

Summary

The paper surveys deep learning advances in Content-Based Image Retrieval (CBIR) over the last decade, classifying methods by supervision, architecture, descriptors, and retrieval techniques.
The survey compares the performance of deep learning CBIR models using metrics such as mAP on datasets including CIFAR-10 and ImageNet, demonstrating their effectiveness over traditional methods.
Deep learning in CBIR has practical implications for areas like medical imaging, with future directions exploring enhanced generalization, interpretability, speed, and self-supervised learning approaches.

Comprehensive Survey on Content-Based Image Retrieval Using Deep Learning

The survey paper "A Decade Survey of Content Based Image Retrieval using Deep Learning" offers an extensive overview of deep learning-based advances in Content-Based Image Retrieval (CBIR) over the past decade. It examines how deep learning models evolved into effective alternatives to the hand-crafted feature descriptors of traditional CBIR systems, and it systematically categorizes these advances by supervision type, network architecture, descriptor type, and retrieval type.

Evolution and Categorization of Approaches

Over the years, CBIR methodology has shifted markedly. Earlier efforts relied largely on hand-engineered features built from visual cues such as color, texture, and shape. These descriptors capture low-level image properties but often fail to reflect the high-level semantics that users actually search for, a limitation commonly called the semantic gap. Deep learning changed this by enabling features to be learned automatically from data. In particular, Convolutional Neural Networks (CNNs) emerged as the backbone for learning robust, hierarchical feature representations.
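To make the contrast with learned features concrete, the following sketch computes one classic hand-designed descriptor, a quantized joint RGB color histogram, and compares two images with histogram intersection. It assumes images are given as flat lists of `(R, G, B)` tuples with channel values in 0..255; the bin count, the function names, and the toy images are illustrative choices, not taken from the survey.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Iterable[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Joint RGB histogram with bins_per_channel levels per channel, normalized to sum to 1."""
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Quantize each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        ri = r * bins_per_channel // 256
        gi = g * bins_per_channel // 256
        bi = b * bins_per_channel // 256
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    # Toy "images" as flat pixel lists (hypothetical data).
    red_scene = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
    red_car = [(240, 20, 5)] * 80 + [(20, 240, 20)] * 20
    blue_sky = [(5, 5, 240)] * 100
    query = color_histogram(red_scene)
    print(histogram_intersection(query, color_histogram(red_car)))
    print(histogram_intersection(query, color_histogram(blue_sky)))
```

Such a descriptor sees only the color distribution, so two images with unrelated content but similar colors score as close matches, which is exactly the kind of limitation of hand-crafted features that learned CNN representations aim to overcome.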

The paper outlines a structured taxonomy of state-of-the-art methods focusing on:

Supervision Type: This covers supervised, unsupervised, semi-supervised, weakly-supervised, pseudo-supervised, and self-supervised learning methods. Supervised methods have generally achieved the best performance because they learn from labeled data, while unsupervised methods rely on engineered constraints to learn from unlabeled data.
Network Architectures: Deep learning architectures including CNNs, autoencoders, siamese networks, triplet networks, generative adversarial networks (GANs), and recurrent neural networks (RNNs) have been employed in CBIR with varying success.
Descriptor Types: Methods are divided into those that produce binary descriptors (hash codes), which enable fast retrieval, and those that produce real-valued descriptors, which prioritize retrieval precision.
Retrieval Methods: Cross-modal retrieval, sketch-based image retrieval, multi-label retrieval, instance retrieval, and semantic retrieval are among the explored avenues, each presenting unique challenges and requiring tailored solutions.
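The binary versus real-valued distinction can be illustrated with a minimal hashing sketch. Many deep hashing methods binarize a network's real-valued output with the sign function at retrieval time; here that step is assumed to be a simple threshold at zero, the resulting bits are packed into a Python `int`, and database codes are ranked by Hamming distance (XOR followed by a bit count). All function names and example values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def binarize(features: Sequence[float]) -> int:
    """Threshold a real-valued vector at zero and pack the bits: bit i = 1 if features[i] > 0."""
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count differing bits: XOR the codes, then count the 1 bits."""
    return bin(a ^ b).count("1")


def hamming_rank(query_code: int, codes: Dict[str, int]) -> List[Tuple[str, int]]:
    """Rank database codes by ascending Hamming distance (ties broken by name)."""
    ranked = [(name, hamming_distance(query_code, code)) for name, code in codes.items()]
    ranked.sort(key=lambda item: (item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Hypothetical 4-D real-valued network outputs for three database images.
    database = {
        "x.jpg": binarize([0.4, -0.1, 0.3, -0.2]),
        "y.jpg": binarize([-0.5, 0.2, -0.1, 0.9]),
        "z.jpg": binarize([0.6, 0.1, 0.2, -0.3]),
    }
    query_code = binarize([0.5, -0.2, 0.1, -0.9])
    print(hamming_rank(query_code, database))
```

The speed advantage comes from cheap bitwise operations and compact storage: a 64-bit code takes 8 bytes, while a 2048-dimensional float32 vector takes 8,192 bytes. The cost is precision, since thresholding discards the magnitude of every feature dimension.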

Performance and Results

The survey provides a detailed performance comparison across numerous deep learning-based retrieval approaches. Quantitative results are reported using Mean Average Precision (mAP) on datasets such as CIFAR-10, ImageNet, MNIST, and NUS-WIDE. Notable methods such as HashNet, Deep Index-Compatible Hashing, and Deep Transfer Hashing achieve competitive results across these datasets, highlighting the efficacy of deep learning models for CBIR.
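Because mAP is the main metric in these comparisons, a minimal sketch of how it is typically computed may help. It assumes a binary relevance judgment for each ranked result (on single-label datasets such as CIFAR-10, a result is commonly counted as relevant when it shares the query's class) and normalizes each query's average precision by the number of relevant items in its ranked list. Papers differ on details such as the top-K cutoff (mAP@K), so reported numbers are comparable only under the same protocol. The function names are hypothetical.

```python
from typing import List, Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """AP of one ranked list; relevance[k] is 1 if the (k+1)-th result is relevant.

    AP = (1/R) * sum over relevant positions k of precision@k, where R is the
    number of relevant items in the list (all relevant items if the whole
    database is ranked).
    """
    hits = 0
    precision_sum = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k  # precision at cutoff k
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_relevance: List[Sequence[int]]) -> float:
    """mAP: the mean of per-query average precision."""
    if not all_relevance:
        return 0.0
    return sum(average_precision(r) for r in all_relevance) / len(all_relevance)


if __name__ == "__main__":
    # Hypothetical relevance lists for two queries (1 = relevant result).
    runs = [[1, 0, 1, 0], [0, 1, 1]]
    print(f"mAP = {mean_average_precision(runs):.4f}")
```

Averaging precision only at the positions of relevant results rewards rankings that place relevant images early, which is why mAP is preferred over plain precision for comparing retrieval methods.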

Practical Implications and Future Directions

Deep learning-driven progress in CBIR has substantial implications for sectors that require rapid and accurate image retrieval, such as medical imaging, automated surveillance, and multimedia content management. In particular, the ability of GANs and autoencoders to generate and refine image descriptors points to promising avenues for future research and applications.

Future work in this domain will likely focus on improving model generalization, feature interpretability, and retrieval speed. Advances in self-supervised learning are also expected to significantly influence CBIR systems by making effective use of unlabeled data. Finally, addressing the scalability and resource efficiency of deep learning models will be pivotal for real-world CBIR applications.

In conclusion, the survey offers an insightful account of the field's transition from hand-crafted features to deep learning, and it provides a foundation for continued work on extracting meaningful representations from complex image datasets.