Adaptive Confidence Multi-View Hashing for Multimedia Retrieval (2312.07327v2)

Published 12 Dec 2023 in cs.CV

Abstract: Multi-view hashing converts heterogeneous data from multiple views into binary hash codes and is one of the critical technologies in multimedia retrieval. However, current methods mainly explore the complementarity among multiple views while lacking confidence learning and fusion. Moreover, in practical application scenarios, single-view data contain redundant noise. To conduct confidence learning and eliminate unnecessary noise, we propose a novel Adaptive Confidence Multi-View Hashing (ACMVH) method. First, a confidence network is developed to extract useful information from various single-view features and remove noise. Furthermore, an adaptive confidence multi-view network is employed to measure the confidence of each view and then fuse the multi-view features through a weighted summation. Lastly, a dilation network is designed to further enhance the representation of the fused features. To the best of our knowledge, we pioneer the application of confidence learning in the field of multimedia retrieval. Extensive experiments on two public datasets show that the proposed ACMVH performs better than state-of-the-art methods (maximum increase of 3.24%). The source code is available at https://github.com/HackerHyper/ACMVH.

Summary

  • The paper presents an ACMVH method that adaptively fuses features using confidence scores to filter noise and enhance retrieval robustness.
  • It employs specialized networks – backbones, confidence, adaptive fusion, dilation, and a hash layer – with tailored loss functions to optimize binary code generation.
  • Experimental results on MIR-Flickr25K and NUS-WIDE datasets show significant improvements in mean average precision over state-of-the-art methods.

Introduction

The field of multimedia retrieval is important in an increasingly data-driven world, where users search for content that spans multiple types of data, such as text and images. One way to make this process more efficient is through multi-view hashing, which involves transforming data from different sources, or 'views', into a compact binary form, or 'hash code'. This paper introduces a novel approach to multi-view hashing that incorporates confidence learning to better fuse features from different views, leading to more accurate and noise-resistant multimedia retrieval.

Confidence Learning in Hashing

Traditional multi-view hashing methods merge features from different views without sufficiently handling the quality of the individual features, which can lead to the integration of noisy or irrelevant data and thus degrade retrieval performance. To address this problem, the paper presents an Adaptive Confidence Multi-View Hashing (ACMVH) method, which can discern useful features from each view and assign confidence values representing their reliability. This confidence measure not only helps remove noise but also guides the adaptive fusion process, giving greater weight to more trustworthy features from each view when combining them into a unified representation.
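As a toy illustration of this confidence-weighted fusion idea (the feature vectors and confidence values below are made up for illustration; in ACMVH the confidences are produced by a learned network):

```python
import torch

# Two views of the same sample (batch of 1, feature dimension 4).
image_feat = torch.tensor([[0.9, 0.1, 0.4, 0.7]])  # assumed relatively clean view
text_feat = torch.tensor([[0.2, 0.8, 0.5, 0.3]])   # assumed noisier view

# Illustrative confidence values; in ACMVH these are learned, not fixed.
conf_image, conf_text = 0.8, 0.3

# Weighted summation: the more trustworthy view dominates the fused representation.
fused = conf_image * image_feat + conf_text * text_feat
print(fused)  # tensor([[0.7800, 0.3200, 0.4700, 0.6500]])
```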

Methodology and Network Architecture

The proposed ACMVH method is composed of several key components (a code sketch follows the list):

  1. Backbones: Extract features from different views, such as visual features from images and textual features from document data.
  2. Confidence Networks: Analyze each view's features to filter out noise and extract the most useful information.
  3. Adaptive Confidence Multi-View Network: Fuses features from all views using the learned confidence values to guide the combination process.
  4. Dilation Network: Enhances the semantic representation of the fused features by increasing and subsequently reducing their dimensions.
  5. Hash Layer: Outputs the final binary hash codes.
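
To see how these components fit together, here is a minimal PyTorch sketch of the pipeline. The layer sizes, the sigmoid gating used for the confidence networks, the expand-then-reduce dilation block, and the tanh-plus-sign binarization are illustrative assumptions rather than the paper's exact configuration; backbone feature extraction is assumed to have happened upstream.

```python
import torch
import torch.nn as nn


class ACMVHSketch(nn.Module):
    """Illustrative end-to-end sketch of an ACMVH-style pipeline.

    Per-view backbone features are assumed to be precomputed;
    layer sizes and activations are placeholder choices.
    """

    def __init__(self, view_dims: list[int], hidden: int = 512, code_bits: int = 64):
        super().__init__()
        # 1. Per-view projections standing in for the backbones' output heads.
        self.projections = nn.ModuleList([nn.Linear(d, hidden) for d in view_dims])
        # 2. Confidence networks: gate each view's features to suppress noise...
        self.gates = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, hidden), nn.Sigmoid()) for _ in view_dims]
        )
        # 3. ...and score each view's reliability with a scalar in (0, 1).
        self.conf_heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid()) for _ in view_dims]
        )
        # 4. Dilation network: expand then reduce dimensions to enrich the fused feature.
        self.dilation = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.ReLU(), nn.Linear(4 * hidden, hidden)
        )
        # 5. Hash layer: map to code_bits values in (-1, 1); sign() gives binary codes.
        self.hash_layer = nn.Sequential(nn.Linear(hidden, code_bits), nn.Tanh())

    def forward(self, views: list[torch.Tensor]) -> torch.Tensor:
        projected = [p(v) for p, v in zip(self.projections, views)]
        # Denoise each view, then weight it by its learned confidence.
        cleaned = [g(z) * z for g, z in zip(self.gates, projected)]
        weights = [c(z) for c, z in zip(self.conf_heads, cleaned)]
        fused = sum(w * z for w, z in zip(weights, cleaned))
        return self.hash_layer(self.dilation(fused))


# Usage with arbitrary dimensions (e.g. a 4096-d image vector and a 1000-d text vector).
model = ACMVHSketch(view_dims=[4096, 1000])
image_feat, text_feat = torch.randn(8, 4096), torch.randn(8, 1000)
continuous_codes = model([image_feat, text_feat])  # (8, 64), values in (-1, 1)
binary_codes = torch.sign(continuous_codes)        # binarize at retrieval time
```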

Loss functions are carefully crafted to reflect the similarity between samples based on their binary hash codes and to align the learned codes with category information, aiding in accurate classification.
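
As a rough sketch of what such an objective could look like, the snippet below combines a pairwise similarity loss on the continuous codes with a classification loss on a linear head. These particular loss forms are common choices in supervised hashing and are assumptions here, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def hashing_losses(codes, labels, classifier, alpha=1.0):
    """Illustrative training objective for supervised hashing.

    codes:      (batch, bits) continuous codes in (-1, 1) from the hash layer.
    labels:     (batch, classes) multi-hot category labels.
    classifier: linear head mapping codes to class logits (assumed, not from the paper).
    """
    bits = codes.size(1)
    # Similarity loss: pairs sharing at least one label should have similar codes.
    sim_target = (labels @ labels.t() > 0).float()   # 1 if the pair shares a label
    inner = codes @ codes.t() / bits                 # normalized inner product in (-1, 1)
    similarity_loss = F.mse_loss((inner + 1) / 2, sim_target)
    # Classification loss: codes should also predict the category information.
    class_loss = F.binary_cross_entropy_with_logits(classifier(codes), labels)
    return similarity_loss + alpha * class_loss


# Usage with random tensors, just to show the shapes involved.
codes = torch.tanh(torch.randn(8, 64))
labels = (torch.rand(8, 24) > 0.8).float()
classifier = nn.Linear(64, 24)
loss = hashing_losses(codes, labels, classifier)
```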

Experiments and Results

The ACMVH approach has been extensively tested on two well-known benchmarks: the MIR-Flickr25K and NUS-WIDE datasets. Experimental results show an increase in mean average precision of up to 3.24% over state-of-the-art methods, indicating the effectiveness of ACMVH in multimedia retrieval tasks. Additionally, ablation studies demonstrate the significance of each component of the ACMVH architecture, with the confidence networks identified as particularly impactful.
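
For reference, mean average precision in hashing-based retrieval is typically computed by ranking the database by Hamming distance to each query code and averaging precision at the positions of relevant items (items sharing at least one label with the query). The sketch below is a generic implementation of that metric, not code from the paper.

```python
import numpy as np


def mean_average_precision(query_codes, db_codes, query_labels, db_labels, topk=None):
    """Generic mAP for hashing retrieval with {-1, +1} codes and multi-hot labels."""
    aps = []
    for q_code, q_label in zip(query_codes, query_labels):
        # Hamming distance via inner product: d = (bits - <a, b>) / 2 for +-1 codes.
        dist = 0.5 * (db_codes.shape[1] - db_codes @ q_code)
        order = np.argsort(dist)
        if topk is not None:
            order = order[:topk]
        # A retrieved item is relevant if it shares at least one label with the query.
        relevant = (db_labels[order] @ q_label) > 0
        if relevant.sum() == 0:
            continue
        hits = np.cumsum(relevant)
        precision_at_i = hits / (np.arange(len(relevant)) + 1)
        aps.append((precision_at_i * relevant).sum() / relevant.sum())
    return float(np.mean(aps))


# Usage with random codes and labels, just to show the interface.
rng = np.random.default_rng(0)
db_codes = np.sign(rng.standard_normal((100, 64)))
q_codes = np.sign(rng.standard_normal((5, 64)))
db_labels = (rng.random((100, 24)) > 0.8).astype(float)
q_labels = (rng.random((5, 24)) > 0.8).astype(float)
print(mean_average_precision(q_codes, db_codes, q_labels, db_labels))
```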

Conclusion

The paper's innovative approach to multi-view hashing successfully integrates confidence learning, resulting in more reliable and discriminative hash codes for multimedia retrieval. By automatically adjusting to the trustworthiness of features from each view, the ACMVH method outperforms existing strategies and offers a promising direction for future work in this domain.
