Adaptive Confidence Multi-View Hashing for Multimedia Retrieval (2312.07327v2)

Published 12 Dec 2023 in cs.CV

Abstract: Multi-view hashing converts heterogeneous data from multiple views into binary hash codes and is one of the critical technologies in multimedia retrieval. However, current methods mainly explore the complementarity among multiple views while lacking confidence learning and fusion. Moreover, in practical application scenarios, single-view data contain redundant noise. To conduct confidence learning and eliminate unnecessary noise, we propose a novel Adaptive Confidence Multi-View Hashing (ACMVH) method. First, a confidence network is developed to extract useful information from various single-view features and remove noise. Furthermore, an adaptive confidence multi-view network is employed to measure the confidence of each view and then fuse the multi-view features through a weighted summation. Lastly, a dilation network is designed to further enhance the representation of the fused features. To the best of our knowledge, we pioneer the application of confidence learning in the field of multimedia retrieval. Extensive experiments on two public datasets show that the proposed ACMVH performs better than state-of-the-art methods (maximum increase of 3.24%). The source code is available at https://github.com/HackerHyper/ACMVH.

Summary

  • The paper presents an ACMVH method that adaptively fuses features using confidence scores to filter noise and enhance retrieval robustness.
  • It employs specialized networks – backbones, confidence, adaptive fusion, dilation, and a hash layer – with tailored loss functions to optimize binary code generation.
  • Experimental results on MIR-Flickr25K and NUS-WIDE datasets show significant improvements in mean average precision over state-of-the-art methods.

Introduction

The field of multimedia retrieval is important in an increasingly data-driven world, where users search for content that spans multiple types of data, such as text and images. One way to make this process more efficient is through multi-view hashing, which involves transforming data from different sources, or 'views', into a compact binary form, or 'hash code'. This paper introduces a novel approach to multi-view hashing that incorporates confidence learning to better fuse features from different views, leading to more accurate and noise-resistant multimedia retrieval.

Confidence Learning in Hashing

Traditional multi-view hashing methods merge features from different views without sufficiently handling the quality of the individual features, which can lead to the integration of noisy or irrelevant data and thus degrade retrieval performance. To address this problem, the paper presents an Adaptive Confidence Multi-View Hashing (ACMVH) method, which can discern useful features from each view and assign confidence values representing their reliability. This confidence measure not only helps remove noise but also guides the adaptive fusion process, giving greater weight to more trustworthy features from each view when combining them into a unified representation.
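As a toy illustration of this confidence-weighted fusion idea (the feature vectors and confidence values below are made up for illustration; in ACMVH the confidences are produced by a learned network):

```python
import torch

# Two views of the same sample (batch of 1, feature dimension 4).
image_feat = torch.tensor([[0.9, 0.1, 0.4, 0.7]])  # assumed relatively clean view
text_feat = torch.tensor([[0.2, 0.8, 0.5, 0.3]])   # assumed noisier view

# Illustrative confidence values; in ACMVH these are learned, not fixed.
conf_image, conf_text = 0.8, 0.3

# Weighted summation: the more trustworthy view dominates the fused representation.
fused = conf_image * image_feat + conf_text * text_feat
print(fused)  # tensor([[0.7800, 0.3200, 0.4700, 0.6500]])
```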

Methodology and Network Architecture

The proposed ACMVH method is composed of several key components (a code sketch follows the list):

  1. Backbones: Extract features from different views, such as visual features from images and textual features from document data.
  2. Confidence Networks: Analyze each view's features to filter out noise and extract the most useful information.
  3. Adaptive Confidence Multi-View Network: Fuses features from all views using the learned confidence values to guide the combination process.
  4. Dilation Network: Enhances the semantic representation of the fused features by increasing and subsequently reducing their dimensions.
  5. Hash Layer: Outputs the final binary hash codes.
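
To see how these components fit together, here is a minimal PyTorch sketch of the pipeline. The layer sizes, the sigmoid gating used for the confidence networks, the expand-then-reduce dilation block, and the tanh-plus-sign binarization are illustrative assumptions rather than the paper's exact configuration; backbone feature extraction is assumed to have happened upstream.

```python
import torch
import torch.nn as nn


class ACMVHSketch(nn.Module):
    """Illustrative end-to-end sketch of an ACMVH-style pipeline.

    Per-view backbone features are assumed to be precomputed;
    layer sizes and activations are placeholder choices.
    """

    def __init__(self, view_dims: list[int], hidden: int = 512, code_bits: int = 64):
        super().__init__()
        # 1. Per-view projections standing in for the backbones' output heads.
        self.projections = nn.ModuleList([nn.Linear(d, hidden) for d in view_dims])
        # 2. Confidence networks: gate each view's features to suppress noise...
        self.gates = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, hidden), nn.Sigmoid()) for _ in view_dims]
        )
        # 3. ...and score each view's reliability with a scalar in (0, 1).
        self.conf_heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid()) for _ in view_dims]
        )
        # 4. Dilation network: expand then reduce dimensions to enrich the fused feature.
        self.dilation = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.ReLU(), nn.Linear(4 * hidden, hidden)
        )
        # 5. Hash layer: map to code_bits values in (-1, 1); sign() gives binary codes.
        self.hash_layer = nn.Sequential(nn.Linear(hidden, code_bits), nn.Tanh())

    def forward(self, views: list[torch.Tensor]) -> torch.Tensor:
        projected = [p(v) for p, v in zip(self.projections, views)]
        # Denoise each view, then weight it by its learned confidence.
        cleaned = [g(z) * z for g, z in zip(self.gates, projected)]
        weights = [c(z) for c, z in zip(self.conf_heads, cleaned)]
        fused = sum(w * z for w, z in zip(weights, cleaned))
        return self.hash_layer(self.dilation(fused))


# Usage with arbitrary dimensions (e.g. a 4096-d image vector and a 1000-d text vector).
model = ACMVHSketch(view_dims=[4096, 1000])
image_feat, text_feat = torch.randn(8, 4096), torch.randn(8, 1000)
continuous_codes = model([image_feat, text_feat])  # (8, 64), values in (-1, 1)
binary_codes = torch.sign(continuous_codes)        # binarize at retrieval time
```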

Loss functions are carefully crafted to reflect the similarity between samples based on their binary hash codes and to align the learned codes with category information, aiding in accurate classification.
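
As a rough sketch of what such an objective could look like, the snippet below combines a pairwise similarity loss on the continuous codes with a classification loss on a linear head. These particular loss forms are common choices in supervised hashing and are assumptions here, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def hashing_losses(codes, labels, classifier, alpha=1.0):
    """Illustrative training objective for supervised hashing.

    codes:      (batch, bits) continuous codes in (-1, 1) from the hash layer.
    labels:     (batch, classes) multi-hot category labels.
    classifier: linear head mapping codes to class logits (assumed, not from the paper).
    """
    bits = codes.size(1)
    # Similarity loss: pairs sharing at least one label should have similar codes.
    sim_target = (labels @ labels.t() > 0).float()   # 1 if the pair shares a label
    inner = codes @ codes.t() / bits                 # normalized inner product in (-1, 1)
    similarity_loss = F.mse_loss((inner + 1) / 2, sim_target)
    # Classification loss: codes should also predict the category information.
    class_loss = F.binary_cross_entropy_with_logits(classifier(codes), labels)
    return similarity_loss + alpha * class_loss


# Usage with random tensors, just to show the shapes involved.
codes = torch.tanh(torch.randn(8, 64))
labels = (torch.rand(8, 24) > 0.8).float()
classifier = nn.Linear(64, 24)
loss = hashing_losses(codes, labels, classifier)
```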

Experiments and Results

The ACMVH approach has been extensively tested on two well-known benchmarks: the MIR-Flickr25K and NUS-WIDE datasets. Experimental results show an increase in mean average precision of up to 3.24% over state-of-the-art methods, indicating the effectiveness of ACMVH in multimedia retrieval tasks. Additionally, ablation studies demonstrate the significance of each component of the ACMVH architecture, with the confidence networks identified as particularly impactful.
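
For reference, mean average precision in hashing-based retrieval is typically computed by ranking the database by Hamming distance to each query code and averaging precision at the positions of relevant items (items sharing at least one label with the query). The sketch below is a generic implementation of that metric, not code from the paper.

```python
import numpy as np


def mean_average_precision(query_codes, db_codes, query_labels, db_labels, topk=None):
    """Generic mAP for hashing retrieval with {-1, +1} codes and multi-hot labels."""
    aps = []
    for q_code, q_label in zip(query_codes, query_labels):
        # Hamming distance via inner product: d = (bits - <a, b>) / 2 for +-1 codes.
        dist = 0.5 * (db_codes.shape[1] - db_codes @ q_code)
        order = np.argsort(dist)
        if topk is not None:
            order = order[:topk]
        # A retrieved item is relevant if it shares at least one label with the query.
        relevant = (db_labels[order] @ q_label) > 0
        if relevant.sum() == 0:
            continue
        hits = np.cumsum(relevant)
        precision_at_i = hits / (np.arange(len(relevant)) + 1)
        aps.append((precision_at_i * relevant).sum() / relevant.sum())
    return float(np.mean(aps))


# Usage with random codes and labels, just to show the interface.
rng = np.random.default_rng(0)
db_codes = np.sign(rng.standard_normal((100, 64)))
q_codes = np.sign(rng.standard_normal((5, 64)))
db_labels = (rng.random((100, 24)) > 0.8).astype(float)
q_labels = (rng.random((5, 24)) > 0.8).astype(float)
print(mean_average_precision(q_codes, db_codes, q_labels, db_labels))
```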

Conclusion

The paper's innovative approach to multi-view hashing successfully integrates confidence learning, resulting in more reliable and discriminative hash codes for multimedia retrieval. By automatically adjusting to the trustworthiness of features from each view, the ACMVH method outperforms existing strategies and offers a promising direction for future work in this domain.
