Transformer-based Clipped Contrastive Quantization Learning for Unsupervised Image Retrieval (2401.15362v1)
Abstract: Unsupervised image retrieval aims to learn the important visual characteristics without any given labels in order to retrieve images similar to a given query image. Convolutional Neural Network (CNN)-based approaches have been extensively exploited with self-supervised contrastive learning for image hashing. However, existing approaches suffer from the ineffective use of global features by CNNs and from the bias introduced by false negative pairs in contrastive learning. In this paper, we propose the TransClippedCLR model, which encodes the global context of an image using a Transformer while retaining local context through patch-based processing, generates hash codes through product quantization, and avoids potential false negative pairs through clipped contrastive learning. The proposed model achieves superior performance for unsupervised image retrieval on benchmark datasets, including CIFAR10, NUS-Wide and Flickr25K, compared to recent state-of-the-art deep models. Results with the proposed clipped contrastive learning improve greatly on all datasets compared to the same backbone network with vanilla contrastive learning.
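The abstract does not spell out how the clipping is performed, so the following is only a minimal sketch of one plausible reading: an InfoNCE-style loss in which the negatives most similar to the anchor are treated as potential false negatives and dropped from the denominator. The function names, the `clip_k` parameter, and the temperature value are all assumptions made for illustration and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def clipped_info_nce(anchor, positive, negatives, clip_k=1, temperature=0.5):
    """InfoNCE-style loss that ignores the clip_k most similar negatives.

    Assumption: the negatives closest to the anchor are the likeliest
    false negatives, so they are removed before the loss is computed.
    With clip_k=0 this reduces to the vanilla contrastive loss.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    # Sort negative similarities in ascending order, then drop the top clip_k.
    neg_sims = sorted(cosine(anchor, n) for n in negatives)
    keep = max(0, len(neg_sims) - clip_k)
    neg = sum(math.exp(s / temperature) for s in neg_sims[:keep])
    return -math.log(pos / (pos + neg))

# Toy embeddings: the second negative is nearly identical to the anchor,
# which is the kind of pair a clipped loss is meant to exclude.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [0.95, 0.05], [-1.0, 0.0]]

vanilla_loss = clipped_info_nce(anchor, positive, negatives, clip_k=0)
clipped_loss = clipped_info_nce(anchor, positive, negatives, clip_k=1)
```

In this toy setting the clipped loss is lower than the vanilla one, because the near-duplicate negative no longer pushes the anchor away from an image that probably shares its semantics.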