ProbMCL: Simple Probabilistic Contrastive Learning for Multi-label Visual Classification (2401.01448v2)

Published 2 Jan 2024 in cs.CV and cs.LG

Abstract: Multi-label image classification presents a challenging task in many domains, including computer vision and medical imaging. Recent advancements have introduced graph-based and transformer-based methods to improve performance and capture label dependencies. However, these methods often include complex modules that entail heavy computation and lack interpretability. In this paper, we propose Probabilistic Multi-label Contrastive Learning (ProbMCL), a novel framework to address these challenges in multi-label image classification tasks. Our simple yet effective approach employs supervised contrastive learning, in which samples that share enough labels with an anchor image based on a decision threshold are introduced as a positive set. This structure captures label dependencies by pulling positive pair embeddings together and pushing away negative samples that fall below the threshold. We enhance representation learning by incorporating a mixture density network into contrastive learning and generating Gaussian mixture distributions to explore the epistemic uncertainty of the feature encoder. We validate the effectiveness of our framework through experimentation with datasets from the computer vision and medical imaging domains. Our method outperforms the existing state-of-the-art methods while achieving a low computational footprint on both datasets. Visualization analyses also demonstrate that ProbMCL-learned classifiers maintain a meaningful semantic topology.
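The abstract's core mechanism — treating two samples as a positive pair when their label overlap exceeds a decision threshold, then applying a supervised contrastive loss — can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the overlap measure (Jaccard similarity here), the threshold value, and the function names are all hypothetical, and the mixture-density-network component for epistemic uncertainty is omitted.

```python
import numpy as np

def positive_mask(labels, threshold=0.5):
    """Mark pair (i, j) as positive when the Jaccard overlap of their
    multi-hot label vectors meets the threshold. (Jaccard is an assumed
    overlap measure; the paper's exact criterion may differ.)"""
    n = labels.shape[0]
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            inter = np.logical_and(labels[i], labels[j]).sum()
            union = np.logical_or(labels[i], labels[j]).sum()
            overlap = inter / union if union else 0.0
            mask[i, j] = overlap >= threshold
    return mask

def sup_con_loss(embeddings, labels, threshold=0.5, temperature=0.1):
    """Supervised contrastive loss over threshold-defined positives:
    pulls positive-pair embeddings together, pushes the rest apart."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity
    # log-softmax of similarities per anchor row
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = positive_mask(labels, threshold)
    per_anchor = [-log_prob[i, pos[i]].mean()
                  for i in range(len(z)) if pos[i].any()]
    return float(np.mean(per_anchor)) if per_anchor else 0.0
```

For example, two images labeled {cat, dog} share all their labels (overlap 1.0) and become a positive pair, while an image labeled only {car} falls below the threshold and is pushed away as a negative.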
