Simultaneous Feature Learning and Hash Coding with Deep Neural Networks (1504.03410v1)

Published 14 Apr 2015 in cs.CV

Abstract: Similarity-preserving hashing is a widely-used method for nearest neighbour search in large-scale image retrieval tasks. For most existing hashing methods, an image is first encoded as a vector of hand-engineering visual features, followed by another separate projection or quantization step that generates binary codes. However, such visual feature vectors may not be optimally compatible with the coding process, thus producing sub-optimal hashing codes. In this paper, we propose a deep architecture for supervised hashing, in which images are mapped into binary codes via carefully designed deep neural networks. The pipeline of the proposed deep architecture consists of three building blocks: 1) a sub-network with a stack of convolution layers to produce the effective intermediate image features; 2) a divide-and-encode module to divide the intermediate image features into multiple branches, each encoded into one hash bit; and 3) a triplet ranking loss designed to characterize that one image is more similar to the second image than to the third one. Extensive evaluations on several benchmark image datasets show that the proposed simultaneous feature learning and hash coding pipeline brings substantial improvements over other state-of-the-art supervised or unsupervised hashing methods.

Simultaneous Feature Learning and Hash Coding with Deep Neural Networks

The paper, "Simultaneous Feature Learning and Hash Coding with Deep Neural Networks," introduces a novel approach for supervised hashing in large-scale image retrieval tasks. The authors, Hanjiang Lai, Yan Pan, Ye Liu, and Shuicheng Yan, propose an integrated deep neural network architecture that simultaneously learns feature representations and hash codes, thereby addressing the inefficiencies of existing methods that rely on separate stages for feature extraction and hash code generation.

Key Contributions

The paper presents several key contributions to the field of similarity-preserving hashing for image retrieval:

  1. Integrated Hashing Architecture: The authors propose a deep learning-based supervised hashing framework that concurrently learns both feature representations and hash codes. This integrated approach contrasts with traditional methods that first utilize hand-crafted visual features followed by subsequent binary coding stages.
  2. Triplet Ranking Loss: The model introduces a triplet ranking loss function, designed to preserve relative similarities among images. This loss function ensures that the learned hash codes maintain the semantic relationships indicated by the training triplets (a minimal sketch of this loss follows the list below).
  3. Divide-and-Encode Module: The architecture includes a distinct divide-and-encode module. This module divides intermediate image features into multiple branches, each responsible for encoding a specific hash bit, thus reducing redundancy in the hash codes.
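
For concreteness, here is a minimal NumPy sketch of a triplet ranking loss of this kind, using a unit margin and squared Euclidean distance as a differentiable stand-in for Hamming distance; the function and variable names are illustrative, and the exact relaxation used in the paper may differ in its constants.

```python
import numpy as np

def triplet_ranking_loss(h_query, h_positive, h_negative, margin=1.0):
    """Hinge-style triplet ranking loss on (approximately binary) hash codes.

    Encourages the query code to be closer to the positive code than to the
    negative code by at least `margin`. Squared Euclidean distance serves as
    a differentiable surrogate for Hamming distance, mirroring the kind of
    relaxation described in the paper (constants here are illustrative).
    """
    d_pos = np.sum((h_query - h_positive) ** 2)  # distance to the similar image
    d_neg = np.sum((h_query - h_negative) ** 2)  # distance to the dissimilar image
    return max(0.0, margin + d_pos - d_neg)      # zero once the ranking is satisfied

# Toy usage with 12-bit relaxed codes in [0, 1]:
rng = np.random.default_rng(0)
q, p, n = rng.random(12), rng.random(12), rng.random(12)
print(triplet_ranking_loss(q, p, n))
```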

Architecture Overview

The deep architecture comprises three main components:

  1. Shared Stacked Convolution Layers: The initial part of the network is a sub-network containing multiple convolution and pooling layers, responsible for capturing effective image representations. The same sub-network processes all three images in a triplet (the query image, the similar image, and the dissimilar image), so its parameters are shared across the triplet.
  2. Divide-and-Encode Module: Following the shared sub-network, the divide-and-encode module processes the intermediate feature vectors. Each slice of the feature vector is projected to one hash bit and then passed through a sigmoid activation function and a piecewise threshold function to approximate binary outputs (see the sketch after this list).
  3. Triplet Ranking Loss: Finally, the triplet-based ranking loss ensures that the hash codes reflect the relative similarities among the images. The relaxed version of this loss function, combined with backpropagation, efficiently trains the network end-to-end.
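
As a rough illustration of the divide-and-encode step described in item 2, the sketch below slices an intermediate feature vector into one slice per hash bit, projects each slice to a scalar, and maps it through a sigmoid followed by a piecewise threshold. The slice sizes, projection weights, and the exact form of the threshold function are assumptions made for illustration rather than the paper's exact parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def piecewise_threshold(s, eps=0.1):
    """Push a sigmoid output toward {0, 1}; the exact threshold function used
    in the paper may differ -- this is an illustrative approximation."""
    if s < 0.5 - eps:
        return 0.0
    if s > 0.5 + eps:
        return 1.0
    return s

def divide_and_encode(features, n_bits, weights, biases):
    """Split `features` into `n_bits` equal slices and encode each slice as
    one (approximately binary) hash bit."""
    slices = np.split(features, n_bits)
    bits = []
    for i, sl in enumerate(slices):
        projection = float(np.dot(weights[i], sl) + biases[i])  # slice -> scalar
        bits.append(piecewise_threshold(sigmoid(projection)))
    return np.array(bits)

# Toy usage: a 48-dimensional intermediate feature mapped to 12 hash bits.
rng = np.random.default_rng(0)
feat = rng.standard_normal(48)
W = rng.standard_normal((12, 4))  # one small projection per 4-dimensional slice
b = np.zeros(12)
print(divide_and_encode(feat, 12, W, b))
```

Because each bit depends only on its own slice of the feature vector, different bits are less likely to encode redundant information, which is the motivation the paper gives for this module.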

Evaluation and Results

The paper extensively evaluates the proposed approach on three major benchmarks: SVHN, CIFAR-10, and NUS-WIDE. The results demonstrate substantial improvements over state-of-the-art supervised and unsupervised hashing methods.

  • SVHN: The proposed method achieves mean average precision (MAP) scores of 0.899, 0.914, 0.925, and 0.923 for 12, 24, 32, and 48 bits, respectively. These results significantly outperform other hashing methods, including CNNH, KSH, and ITQ.
  • CIFAR-10: The method yields MAP scores ranging from 0.552 to 0.581 across different bit lengths, again outperforming competing approaches.
  • NUS-WIDE: Achieving MAP scores between 0.674 and 0.715, the proposed method shows superior accuracy compared to other methods.
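
MAP here denotes mean average precision over the ranked retrieval results. As a point of reference, the sketch below shows how such a score can be computed from binary relevance judgments; it is a generic illustration, not the paper's evaluation code.

```python
import numpy as np

def average_precision(relevance):
    """Average precision for one query, given a ranked list of 0/1 relevance labels."""
    relevance = np.asarray(relevance, dtype=float)
    if relevance.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(relevance) / np.arange(1, len(relevance) + 1)
    return float(np.sum(precision_at_k * relevance) / relevance.sum())

def mean_average_precision(relevance_lists):
    """MAP: the mean of per-query average precisions."""
    return float(np.mean([average_precision(r) for r in relevance_lists]))

# Toy usage: two queries with ranked retrieval results (1 = relevant).
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1, 0]]))
```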

Practical and Theoretical Implications

The proposed deep architecture has significant implications for both practical applications and theoretical developments in image retrieval.

  • Practical Implications: The efficiency and accuracy of the proposed hashing method can enhance real-world applications, including multimedia search engines and large-scale image databases. The simultaneous learning of feature representations and hash codes within a single framework simplifies the deployment pipeline, reducing computational overhead and improving retrieval speed (a sketch of the underlying Hamming-distance lookup follows this list).
  • Theoretical Implications: By integrating feature learning and hash coding, the work bridges the gap between representation learning and similarity-preserving hashing. The adoption of triplet ranking loss for supervised hashing opens new avenues for exploring relative similarity preservation in other domains, such as document retrieval and recommendation systems.
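
To make the retrieval-speed point concrete, binary codes allow nearest neighbour search by XOR and popcount rather than floating-point distance computations. The sketch below is a generic illustration of such a Hamming-distance lookup, not the paper's implementation.

```python
import numpy as np

def hamming_distances(query_code, db_codes):
    """Hamming distance from one binary code to every code in a database.

    Codes are packed as unsigned integers, so each distance is an XOR followed
    by a popcount -- the cheap operation that makes hashing-based retrieval fast.
    """
    xor = np.bitwise_xor(db_codes, query_code)
    return np.array([bin(int(x)).count("1") for x in xor])

# Toy usage with short binary codes packed as unsigned integers.
db = np.array([0b101011, 0b111000, 0b000111], dtype=np.uint64)
query = np.uint64(0b101010)
dists = hamming_distances(query, db)
print(dists, "nearest index:", int(np.argmin(dists)))
```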

Speculations on Future Developments

As deep learning models continue to evolve, it is foreseeable that future hashing methods will leverage even more sophisticated architectures and loss functions. Potential areas for future research may include:

  • Improved Loss Functions: Developing loss functions that incorporate more complex relationships beyond triplet-based constraints, such as higher-order relationships, could further enhance the quality of hash codes.
  • Scalability: As datasets grow larger, optimizing the proposed architecture for scalability and efficiency, perhaps through distributed computing frameworks, will be crucial.
  • Domain Adaptation: Extending the current framework to handle cross-domain hashing where training and testing images come from different distributions could increase the robustness and applicability of the method.

In summary, the comprehensive integration of feature learning and hash coding presented in this paper marks a significant advancement in supervised hashing methods, achieving superior performance metrics and paving the way for more efficient image retrieval systems.
