- The paper introduces the DELG model that unifies deep local and global feature learning within a single CNN for comprehensive image representations.
- It employs generalized mean pooling, attentive selection, and a gradient control mechanism to efficiently balance local and global feature extraction using only image-level labels.
- Experimental results on benchmark datasets demonstrate state-of-the-art performance and reduced latency compared to separate feature extraction systems.
Unifying Deep Local and Global Features for Image Search
In this paper, the authors address the challenge of creating a unified deep learning model for image retrieval that efficiently incorporates both local and global image features. To achieve this, they introduce the DEep Local and Global (DELG) features model, which integrates these two feature types into a single convolutional neural network (CNN) framework.
Methodology
The proposed DELG model combines recent advances in feature learning: generalized mean (GeM) pooling for global features and attentive selection for local features. The approach leverages the hierarchical representations inherent in CNNs to extract both feature types from a single backbone, drawing global features from deeper layers, which encode high-level semantics, and local features from shallower layers, which retain region-specific detail.
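The two building blocks named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names, the `(H, W, C)` feature-map layout, and the top-k selection rule are assumptions for the example.

```python
import numpy as np

def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized mean (GeM) pooling over spatial locations.

    features: CNN activations of shape (H, W, C).
    p = 1 reduces to average pooling; large p approaches max pooling.
    """
    clipped = np.clip(features, eps, None)  # GeM assumes positive activations
    return np.mean(clipped ** p, axis=(0, 1)) ** (1.0 / p)

def attentive_select(features, attention, k=5):
    """Attentive selection: keep the k spatial positions with the
    highest attention score as local features.

    features: (H, W, C) activations; attention: (H, W) scores.
    Returns the selected (row, col) coordinates and their descriptors.
    """
    h, w, c = features.shape
    idx = np.argsort(attention.reshape(-1))[::-1][:k]
    coords = np.stack(np.unravel_index(idx, (h, w)), axis=1)
    return coords, features.reshape(-1, c)[idx]
```

The tunable exponent `p` is what lets one pooling layer interpolate between average and max pooling, which is why GeM is a popular choice for global descriptors.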
A critical aspect of the model is its ability to be trained end-to-end using only image-level labels, which simplifies the training process. To manage the trade-off between supporting global and local feature learning within the CNN, the authors implement a gradient control mechanism that prevents disruption of desired feature representations in the hierarchical structure. This is accomplished by stopping gradient back-propagation from the local feature learning heads to the network backbone.
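The effect of that stop-gradient can be seen in a toy scalar model. This is a hand-derived sketch, not DELG's training code: the two quadratic losses, the backbone `f = w * x`, and the parameter names are invented for illustration; only the stop-gradient pattern itself mirrors the paper.

```python
def forward_backward(w, v, x, stop_grad=True):
    """Toy two-head model: a backbone activation f = w * x feeds a
    global head and a local head (with its own weight v).

    With stop_grad=True, the local head's loss updates only v, never
    the backbone weight w - mirroring DELG's stop-gradient between the
    local-feature heads and the shared backbone.
    Returns the gradients (dL/dw, dL/dv).
    """
    f = w * x
    # Hand-derived gradients of L_global = 0.5*(f-1)^2 and
    # L_local = 0.5*(v*f-1)^2 with respect to w and v.
    dw = (f - 1.0) * x            # global loss always reaches the backbone
    dv = (v * f - 1.0) * f        # local loss trains the local head's weight
    if not stop_grad:
        dw += (v * f - 1.0) * v * x  # without stop-gradient, it leaks into w
    return dw, dv
```

In a real framework this is a one-liner: the local heads consume a detached copy of the backbone activations (e.g. `features.detach()` in PyTorch), so the global objective alone shapes the shared hierarchy.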
Additionally, the authors introduce an autoencoder-based dimensionality reduction technique for local features. Because the autoencoder is trained jointly with the rest of the network, it replaces the traditional PCA post-processing step and yields compact local descriptors without a separate learning stage.
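A linear autoencoder of this kind is easy to sketch. The weights below are random placeholders standing in for jointly trained parameters, and the dimensions are illustrative; in training, the reconstruction loss would be minimized alongside the network's other objectives.

```python
import numpy as np

def make_autoencoder(c_in, c_out, seed=0):
    """Linear autoencoder mapping c_in-dim local descriptors to compact
    c_out-dim codes (implementable as a 1x1 convolution on a feature
    map). Weights are random placeholders, not trained parameters.
    """
    rng = np.random.default_rng(seed)
    w_enc = rng.standard_normal((c_in, c_out)) / np.sqrt(c_in)
    w_dec = rng.standard_normal((c_out, c_in)) / np.sqrt(c_out)

    def encode(x):   # (N, c_in) -> (N, c_out) compact local descriptors
        return x @ w_enc

    def decode(z):   # (N, c_out) -> (N, c_in) reconstruction
        return z @ w_dec

    return encode, decode

def recon_loss(x, encode, decode):
    """Mean squared reconstruction error that drives the autoencoder."""
    return float(np.mean((decode(encode(x)) - x) ** 2))
```

Only the compact codes from `encode` need to be stored in the index, which is what makes local features affordable at database scale.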
Experimental Results
The DELG model is evaluated on several standard image retrieval datasets including the Revisited Oxford and Paris benchmarks. It achieves state-of-the-art results, outperforming previous models that separately handle local and global features. For global-only retrieval, DELG demonstrates substantial improvements in mean average precision (mAP) on large-scale databases. With local feature re-ranking, further performance gains are realized, confirming the precision benefits of local feature matching.
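The two-stage pipeline evaluated above can be sketched as follows. This is a schematic, not the paper's system: the `local_score` callable stands in for the geometric verification performed with real local features, and the similarity metric and `k` are illustrative choices.

```python
import numpy as np

def retrieve(query_g, db_g, local_score, k=3):
    """Two-stage retrieval: rank the database by global-descriptor
    cosine similarity, then re-rank only the top-k candidates by a
    local-feature matching score (a stand-in for geometric
    verification). Returns database indices in final ranked order.
    """
    sims = db_g @ query_g / (
        np.linalg.norm(db_g, axis=1) * np.linalg.norm(query_g))
    order = np.argsort(-sims)          # best global matches first
    top, rest = order[:k], order[k:]
    reranked = sorted(top, key=local_score, reverse=True)
    return [int(i) for i in reranked] + [int(i) for i in rest]
```

Restricting the expensive local matching to a small shortlist is what keeps re-ranking tractable: the global descriptor does the cheap recall work, and local features add precision only where it matters.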
The model’s efficacy is further validated on the Google Landmarks dataset for instance-level recognition, where DELG outperforms existing single-model solutions. The authors provide an analysis of memory and computation trade-offs, demonstrating that the unified model reduces latency compared to separate feature extraction systems while maintaining competitive memory usage through local feature quantization.
Implications and Future Directions
This research has significant implications for developing efficient and robust image retrieval systems. The DELG model’s ability to unify feature extraction offers potential for streamlined, integrated solutions in various computer vision tasks, beyond just image retrieval.
The novel dimension reduction technique and gradient control strategies open pathways for further exploration in hierarchical feature learning. Future research could expand on optimizing quantization methods to further alleviate memory constraints, as well as exploring the model’s applicability to other domains requiring precise image analysis, such as object detection and scene understanding.
Overall, this work provides an effective approach for combining global and local image analysis within a singular, coherent framework, setting a foundation for future advancements in the field.