Deep Cross-Modal Hashing (1602.02255v2)

Published 6 Feb 2016 in cs.IR

Abstract: Due to its low storage cost and fast query speed, cross-modal hashing (CMH) has been widely used for similarity search in multimedia retrieval applications. However, almost all existing CMH methods are based on hand-crafted features which might not be optimally compatible with the hash-code learning procedure. As a result, existing CMH methods with hand-crafted features may not achieve satisfactory performance. In this paper, we propose a novel cross-modal hashing method, called deep cross-modal hashing (DCMH), by integrating feature learning and hash-code learning into the same framework. DCMH is an end-to-end learning framework with deep neural networks, one for each modality, to perform feature learning from scratch. Experiments on two real datasets with text-image modalities show that DCMH can outperform other baselines to achieve the state-of-the-art performance in cross-modal retrieval applications.

Authors (2)
  1. Qing-Yuan Jiang (12 papers)
  2. Wu-Jun Li (57 papers)
Citations (614)

Summary

Deep Cross-Modal Hashing (DCMH) for Multimedia Retrieval

The paper introduces Deep Cross-Modal Hashing (DCMH), an approach to cross-modal retrieval that integrates feature learning and hash-code learning into a unified deep learning framework. The distinguishing aspect of DCMH is that features and hash codes are learned simultaneously, so the learned representations are optimized directly for retrieval across modalities such as images and text.
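
To make this unified framework concrete, the sketch below shows one plausible instantiation of the two-branch design the paper describes: a convolutional network for the image modality and a fully connected network over bag-of-words text vectors, each ending in a layer with one output per hash bit. The layer sizes, the vocabulary size, and the code length c used here are placeholder assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a DCMH-style two-branch model (illustrative only).
import torch
import torch.nn as nn

class ImageBranch(nn.Module):
    """CNN: raw image -> c real-valued outputs, binarized with sign() at query time."""
    def __init__(self, code_len=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(64, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(6),
        )
        self.hash_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Linear(4096, code_len),      # real-valued code f for this image
        )

    def forward(self, x):
        return self.hash_head(self.features(x))

class TextBranch(nn.Module):
    """MLP: bag-of-words vector -> c real-valued outputs."""
    def __init__(self, vocab_size=1000, code_len=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, 4096), nn.ReLU(),
            nn.Linear(4096, code_len),      # real-valued code g for this text
        )

    def forward(self, y):
        return self.net(y)

# At retrieval time the binary hash codes are simply the signs of the outputs:
# b_img = torch.sign(ImageBranch()(images)); b_txt = torch.sign(TextBranch()(texts))
```

Each branch ends in a c-unit layer whose real-valued outputs are driven toward the shared binary codes by the training objective discussed under Key Contributions below.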

Key Contributions

  1. End-to-End Framework: DCMH is structured as an end-to-end learning system comprising deep neural networks dedicated to each modality (e.g., text and image). This design allows for comprehensive feature extraction directly from raw input data, bypassing the limitations of hand-crafted features often used in previous models.
  2. Discrete Hash Code Learning: Unlike approaches that relax the discrete optimization problem into a continuous one, which can degrade the quality of the resulting hash codes, DCMH learns the binary codes directly during training and thus avoids the performance loss introduced by relaxation (see the objective sketched after this list).
  3. Experimental Validation: The authors conduct experiments with the MIRFLICKR-25K and NUS-WIDE datasets, demonstrating that DCMH consistently achieves higher retrieval performance compared to established baselines like SePH, STMH, and SCM. The model's efficacy is validated through mean average precision (MAP) and precision-recall metrics.
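
To illustrate the second contribution, the sketch below shows the kind of objective the paper describes: a negative log-likelihood term over cross-modal similarities, a quantization term tying the real-valued network outputs to a shared binary code B, and a bit-balance term, with B kept strictly binary by a sign step rather than a continuous relaxation. This is an illustrative PyTorch rendering; the variable shapes, the weights gamma and eta, and the exact update schedule are assumptions rather than the authors' released code.

```python
import torch
import torch.nn.functional as nnf

def dcmh_loss(F_img, G_txt, S, B, gamma=1.0, eta=1.0):
    """Sketch of a DCMH-style objective (illustrative, not the official code).

    F_img, G_txt : (n, c) real-valued outputs of the image / text branches
    S            : (n, n) 0/1 cross-modal similarity matrix
    B            : (n, c) shared binary codes in {-1, +1}
    """
    theta = 0.5 * F_img @ G_txt.t()                        # pairwise inner products
    # Negative log-likelihood of the observed cross-modal similarities.
    neg_log_lik = (nnf.softplus(theta) - S * theta).sum()
    # Quantization term: keep real-valued outputs close to the binary codes.
    quant = ((B - F_img) ** 2).sum() + ((B - G_txt) ** 2).sum()
    # Balance term: encourage each bit to be +1 and -1 roughly equally often.
    balance = (F_img.sum(0) ** 2).sum() + (G_txt.sum(0) ** 2).sum()
    return neg_log_lik + gamma * quant + eta * balance

# The codes stay discrete: no relaxation, B is re-estimated with a sign step, e.g.
# B = torch.sign(F_img.detach() + G_txt.detach())
```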

Numerical Results and Claims

The authors provide substantial numerical results, with MAP scores indicating that DCMH outperforms other baseline models across various bit lengths. For instance, on the MIRFLICKR-25K dataset with image-to-text queries, DCMH achieves a MAP of 0.7504 for 16-bit codes, outperforming the next best model, SePH, which records a MAP of 0.6441. Such outcomes underscore the strong retrieval capabilities of the proposed framework.
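
For reference, MAP figures like these are typically computed from Hamming-ranked retrieval lists. The routine below is a generic evaluation sketch rather than the authors' script; it assumes multi-label ground truth in which two items are relevant to each other if they share at least one label, and the optional top-k cutoff is an assumption.

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels, topk=None):
    """MAP over Hamming-ranked retrieval lists.

    query_codes, db_codes   : {-1, +1} arrays of shape (nq, c) and (nd, c)
    query_labels, db_labels : 0/1 multi-label arrays; items are relevant
                              if they share at least one label.
    """
    aps = []
    for q_code, q_label in zip(query_codes, query_labels):
        hamming = 0.5 * (db_codes.shape[1] - db_codes @ q_code)  # Hamming distances
        order = np.argsort(hamming)
        if topk is not None:
            order = order[:topk]
        relevant = (db_labels[order] @ q_label) > 0              # ground-truth relevance
        if relevant.sum() == 0:
            continue
        cum_rel = np.cumsum(relevant)
        precision_at_hit = cum_rel[relevant] / (np.flatnonzero(relevant) + 1)
        aps.append(precision_at_hit.mean())
    return float(np.mean(aps))
```

Calling this with the signed image-branch outputs as queries and the signed text-branch outputs as the database, for example, would yield an image-to-text MAP of the kind reported above.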

Implications for AI and Future Directions

This work has direct implications for multimedia retrieval applications where multi-modal data are prevalent. Because items are represented by compact binary codes, storage costs are low and retrieval reduces to fast Hamming-distance comparisons, while the deep networks remove the dependence on hand-crafted features.

Theoretically, this integration of feature and hash-code learning within a singular framework suggests potential expansions into more complex retrieval tasks across various domains. Practically, as multi-modal data becomes increasingly common, DCMH could serve as a foundation for future AI systems requiring efficient data indexing and retrieval from massive datasets.

Future research could explore extending this method to handle more than two modalities simultaneously, fostering broader applications in fields like autonomous vehicles and large-scale surveillance systems, where diverse data streams need to be integrated and queried efficiently.

In conclusion, the presented work offers a compelling and methodologically rigorous enhancement to cross-modal retrieval, setting a high standard for future explorations in hash-based data retrieval technologies.