Hierarchy-based Image Embeddings for Semantic Image Retrieval
The paper "Hierarchy-based Image Embeddings for Semantic Image Retrieval" by Björn Barz and Joachim Denzler presents a novel methodology to enhance the semantic consistency of image retrieval results using a hierarchy-based approach for computing image embeddings. This research aims to bridge the gap between visual and semantic similarity in image representation, which is a prominent challenge in content-based image retrieval (CBIR).
Core Proposition and Methodology
The paper's central contribution is a method for mapping images onto a semantic space derived from a hierarchy of classes, such as the one provided by WordNet. The approach computes class embeddings whose pairwise dot products reflect a measure of semantic similarity between classes; the authors describe a deterministic algorithm for computing these embeddings from the hierarchy and then leverage this representation for improved image retrieval.
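Concretely, once the class embeddings are available (their construction is described below), a CNN can be trained to map images onto them with a loss that pulls each normalized image feature toward the embedding of its class. The sketch below follows the correlation-style objective described in the paper, penalizing one minus the dot product; the function and variable names are illustrative, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def embedding_loss(features: torch.Tensor,
                   class_embeddings: torch.Tensor,
                   labels: torch.Tensor) -> torch.Tensor:
    """1 - dot(normalized image feature, target class embedding), averaged.

    features:          (B, D) raw CNN outputs for a batch of images
    class_embeddings:  (C, D) precomputed unit-norm class embeddings
    labels:            (B,)   ground-truth class indices
    """
    x = F.normalize(features, dim=1)     # project features onto the unit hypersphere
    targets = class_embeddings[labels]   # (B, D) target direction for each image
    return (1.0 - (x * targets).sum(dim=1)).mean()
```

At retrieval time, database images are then ranked by the dot product (equivalently, the cosine similarity) between their normalized features and the query's feature.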
The algorithm begins by computing class similarities from the lowest common subsumer (LCS) of two classes in the hierarchy, a measure long used in semantic similarity tasks. The embeddings are then constructed so that the dot product between any pair of class embeddings matches the precomputed similarity of the corresponding classes, which places semantically similar classes closer together in the embedding space. Because every class has similarity 1 to itself, all embeddings have unit norm and thus lie on a hypersphere, so the representations are normalized by construction.
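Both steps admit a compact sketch, assuming the hierarchy is given as a child-to-parent map with precomputed node heights (`parent`, `node_height`, and `max_height` below are illustrative names). The similarity follows the paper's definition, s(u, v) = 1 - height(lcs(u, v)) / height(hierarchy), and the deterministic construction, which amounts to a Cholesky factorization of the similarity matrix, fixes each class's coordinates from its dot-product constraints with the previously placed classes plus the unit-norm constraint:

```python
import numpy as np

def ancestors(parent, c):
    """Chain from class c up to the root, c itself included."""
    chain = [c]
    while c in parent:
        c = parent[c]
        chain.append(c)
    return chain

def lcs_similarity(parent, node_height, max_height, a, b):
    """s(a, b) = 1 - height(lcs(a, b)) / height(hierarchy)."""
    anc_a = set(ancestors(parent, a))
    lcs = next(c for c in ancestors(parent, b) if c in anc_a)
    return 1.0 - node_height[lcs] / max_height

def class_embeddings(sim):
    """Place class i on the unit hypersphere so that pairwise dot
    products reproduce the (n x n) similarity matrix `sim`."""
    n = sim.shape[0]
    emb = np.zeros((n, n))
    emb[0, 0] = 1.0  # first class: an arbitrary unit vector
    for i in range(1, n):
        # Dot-product constraints with already-placed classes fix the
        # first i coordinates (a lower-triangular linear system).
        emb[i, :i] = np.linalg.solve(emb[:i, :i], sim[i, :i])
        # Self-similarity of 1 fixes the last coordinate via unit norm;
        # the max() guard absorbs numerical round-off.
        emb[i, i] = np.sqrt(max(1.0 - emb[i, :i] @ emb[i, :i], 0.0))
    return emb
```

Each class occupies one additional dimension, so the resulting embedding space has as many dimensions as there are classes.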
Empirical Evaluation
The proposed semantic embeddings are evaluated on the well-known CIFAR-100, NABirds, and ImageNet datasets. Extensive experiments demonstrate clear improvements in the semantic consistency of retrieval results compared with features from conventionally trained classifiers and with embeddings learned through metric learning losses.
For instance, on CIFAR-100 with a ResNet-110 architecture, the hierarchy-based embeddings achieve a substantially higher mean Average Hierarchical Precision (mAHP@250) than the classification-based baselines, and the same trend holds across all tested datasets, confirming the efficacy of the proposed method.
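For orientation, hierarchical precision scores each retrieved image by the semantic similarity between its class and the query's class, rather than by an exact-match criterion, and normalizes by the best achievable ranking. A minimal sketch for a single query follows; the exact mAHP formulation in the paper may differ in details, and the names are illustrative:

```python
import numpy as np

def ahp_at_k(ranked_labels, query_label, sim, k=250):
    """Average Hierarchical Precision (AHP@k) for one query, sketched.

    ranked_labels: class indices of all database images, sorted by
                   descending retrieval score
    query_label:   class index of the query image
    sim:           (C, C) matrix of pairwise class similarities
    """
    gains = sim[query_label, ranked_labels]  # similarity of each retrieved hit
    ideal = np.sort(gains)[::-1]             # best possible ordering of the database
    ratios = np.cumsum(gains[:k]) / np.cumsum(ideal[:k])
    return float(ratios.mean())              # average over cutoffs 1..k
```

mAHP@250 then averages this quantity over all query images with k = 250.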
Implications and Future Developments
This research has notable practical implications for content-based image retrieval systems, which benefit from retrieving semantically relevant images rather than merely visually similar ones. The integration of semantic understanding in image embeddings could also enhance other applications, such as novelty detection and few-shot learning, by enabling a metric space that better aligns with human semantic reasoning.
The methodology advances theoretical understanding in the image retrieval domain by illustrating the potential of hierarchy-based class embeddings. It provides a foundation for exploring more sophisticated embeddings that incorporate external knowledge about class semantics.
Future work could focus on improving the computational efficiency of the embedding construction in large-scale settings and on embedding spaces that generalize beyond predefined class hierarchies. Applying these semantic embeddings to other modalities and to multi-modal retrieval tasks could likewise open up new cross-domain search capabilities.
Overall, the paper contributes significantly to the field of semantic image retrieval by offering a structured approach to embedding semantic relationships directly into the computational representation of images, thus moving closer to capturing intrinsic image semantics within artificial systems.