
Triplet-Based Deep Hashing Network for Cross-Modal Retrieval (1904.02449v1)

Published 4 Apr 2019 in cs.IR

Abstract: Given the benefits of its low storage requirements and high retrieval efficiency, hashing has recently received increasing attention. In particular, cross-modal hashing has been widely and successfully used in multimedia similarity search applications. However, almost all existing methods employing cross-modal hashing cannot obtain powerful hash codes because they ignore the relative similarity between heterogeneous data, which contains richer semantic information, leading to unsatisfactory retrieval performance. In this paper, we propose a triplet-based deep hashing (TDH) network for cross-modal retrieval. First, we utilize triplet labels, which describe the relative relationships among three instances, as supervision in order to capture more general semantic correlations between cross-modal instances. We then establish a loss function from the inter-modal view and the intra-modal view to boost the discriminative abilities of the hash codes. Finally, graph regularization is introduced into our proposed TDH method to preserve the original semantic similarity between hash codes in Hamming space. Experimental results show that our proposed method outperforms several state-of-the-art approaches on two popular cross-modal datasets.

Triplet-Based Deep Hashing Network for Cross-Modal Retrieval: An Overview

This paper presents a triplet-based deep hashing approach to cross-modal retrieval. It addresses a limitation of existing methods by modeling the relative similarity of heterogeneous data rather than only binary (similar or dissimilar) relationships. The proposed Triplet-Based Deep Hashing (TDH) method aims to improve retrieval performance by integrating deep feature learning for the different modalities into an end-to-end framework.

The authors use triplet labels to capture and encode the relative semantic similarity between instances from different modalities. A triplet consists of an anchor, a positive instance that is semantically similar to the anchor, and a negative instance that is not. The TDH framework consists of three key components: an inter-modal triplet loss, an intra-modal triplet loss, and graph regularization. The inter-modal triplet loss reduces the Hamming distance between semantically similar cross-modal data points relative to dissimilar ones, while the intra-modal triplet loss improves the discriminative ability of hash codes within each modality. Graph regularization is employed to preserve semantic similarities in the Hamming space.
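The three components described above can be illustrated with a small sketch. This is not the paper's formulation: it assumes a generic margin-based (hinge) triplet loss over {-1, +1} codes and the standard graph-Laplacian style smoothness penalty. The function names, the margin value, the 8-bit codes, and the affinity matrix are all hypothetical and chosen only for illustration.

```python
def hamming_distance(a, b):
    """Number of positions where two {-1, +1} codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def triplet_margin_loss(anchor, positive, negative, margin=2):
    """Hinge-style triplet loss (an assumed form, not the paper's exact loss).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` bits.
    """
    d_pos = hamming_distance(anchor, positive)
    d_neg = hamming_distance(anchor, negative)
    return max(0, d_pos - d_neg + margin)


def graph_regularizer(codes, affinity):
    """Assumed smoothness penalty: 0.5 * sum_ij A_ij * ||c_i - c_j||^2.

    It is small when instances with high affinity have nearby codes.
    """
    total = 0.0
    for i, ci in enumerate(codes):
        for j, cj in enumerate(codes):
            sq_dist = sum((x - y) ** 2 for x, y in zip(ci, cj))
            total += affinity[i][j] * sq_dist
    return 0.5 * total


# Hypothetical 8-bit codes: an image anchor with a text positive and a
# text negative form an inter-modal triplet. An intra-modal triplet uses
# the same loss with all three codes drawn from one modality.
image_anchor = [1, -1, 1, 1, -1, -1, 1, -1]
text_positive = [1, -1, 1, 1, -1, 1, 1, -1]
text_negative = [-1, 1, -1, 1, -1, 1, -1, 1]

inter_loss = triplet_margin_loss(image_anchor, text_positive, text_negative)

# Hypothetical affinity: only the anchor and the positive are neighbors.
affinity = [[0, 1, 0],
            [1, 0, 0],
            [0, 0, 0]]
reg = graph_regularizer([image_anchor, text_positive, text_negative], affinity)
```

In a trained network the binary codes would come from relaxed, real-valued outputs, since Hamming distance itself is not differentiable. The sketch only shows what each term rewards.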

Experiments on the MIRFlickr25k and NUS-WIDE datasets indicate that the proposed method outperforms the state-of-the-art approaches it is compared against. The paper reports improvements in Mean Average Precision (MAP), with TDH consistently outperforming the compared methods across various code lengths. The precision-recall and top-N precision results further support the effectiveness of the triplet-based architecture and its ability to capture semantic relationships in large-scale retrieval tasks.
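For readers unfamiliar with the MAP metric mentioned above, the following is a minimal sketch of how it is commonly computed for hash-based retrieval: database items are ranked by Hamming distance to each query, and average precision is taken over the ranked list. It assumes that two items count as relevant when they share at least one label, a common convention for multi-label datasets. The codes, labels, and function names are hypothetical and do not reproduce the paper's evaluation protocol.

```python
def hamming_distance(a, b):
    """Number of positions where two {-1, +1} codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def average_precision(ranked_relevance):
    """Average of precision@k over the ranks k that hold a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_codes, query_labels, db_codes, db_labels):
    """MAP over all queries, ranking the database by Hamming distance."""
    aps = []
    for q_code, q_label in zip(query_codes, query_labels):
        order = sorted(range(len(db_codes)),
                       key=lambda i: hamming_distance(q_code, db_codes[i]))
        # Assumption: relevant means sharing at least one label.
        relevance = [bool(q_label & db_labels[i]) for i in order]
        aps.append(average_precision(relevance))
    return sum(aps) / len(aps)


# Hypothetical example: one text query against three image codes.
query_codes = [[1, 1, -1, -1]]
query_labels = [{"sky"}]
db_codes = [[1, 1, -1, -1], [1, 1, -1, 1], [-1, -1, 1, 1]]
db_labels = [{"sky"}, {"dog"}, {"sky", "sea"}]

map_score = mean_average_precision(query_codes, query_labels, db_codes, db_labels)
```

Here the relevant items land at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2.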

Beyond advancing cross-modal retrieval, this research has implications for AI-driven multimedia analysis, where capturing the semantics of data across diverse formats is pivotal. Future work could explore adaptive sampling techniques for triplet generation and extensions to more modalities, thereby expanding the method's applicability.

In conclusion, this paper presents a well-substantiated approach that facilitates efficient and scalable retrieval performance by successfully bridging the semantic gap in heterogeneous data environments. The nuanced application of triplet labels enriches semantic correlation handling and sets a precedent for future developments in cross-modal hashing methodologies.

Authors (5)
  1. Cheng Deng (67 papers)
  2. Zhaojia Chen (1 paper)
  3. Xianglong Liu (128 papers)
  4. Xinbo Gao (194 papers)
  5. Dacheng Tao (829 papers)
Citations (325)