End-to-End Supervised Multilabel Contrastive Learning (2307.03967v1)
Abstract: Multilabel representation learning is recognized as a challenging problem, associated either with label dependencies between object categories or with data-related issues such as the inherent imbalance of positive/negative samples. Recent advances address these challenges from model- and data-centric viewpoints. In model-centric approaches, label correlation is captured by an external model design (e.g., a graph CNN) that incorporates an inductive bias into training. However, these approaches lack an end-to-end training framework, leading to high computational complexity. In contrast, data-centric approaches exploit the realistic nature of the dataset to improve classification while ignoring label dependencies. In this paper, we propose a new end-to-end training framework -- dubbed KMCL (Kernel-based Multilabel Contrastive Learning) -- to address the shortcomings of both model- and data-centric designs. KMCL first transforms the embedded features into a mixture of exponential kernels in a Gaussian RKHS. It then encodes an objective loss comprising (a) a reconstruction loss to reconstruct the kernel representation, (b) an asymmetric classification loss to address the inherent imbalance problem, and (c) a contrastive loss to capture label correlation. KMCL models the uncertainty of the feature encoder while maintaining a low computational footprint. Extensive experiments on image classification tasks showcase the consistent improvements of KMCL over SOTA methods. A PyTorch implementation is provided at \url{https://github.com/mahdihosseini/KMCL}.
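The abstract's objective includes an asymmetric classification loss (item (b)) to counter the positive/negative imbalance typical of multilabel data. As a hedged illustration only, the sketch below implements the published asymmetric loss of Ridnik et al. (ICCV 2021) in NumPy; the hyperparameters `gamma_pos`, `gamma_neg`, and `clip` are illustrative defaults from that paper, not necessarily the values KMCL uses, and the function is not taken from the KMCL codebase.

```python
import numpy as np

def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0, clip=0.05):
    """Asymmetric loss for multilabel classification (illustrative sketch).

    probs:   predicted per-label probabilities in [0, 1]
    targets: binary ground-truth labels (1 = positive, 0 = negative)
    """
    eps = 1e-8
    probs_pos = np.clip(probs, eps, 1.0)
    # Probability shifting: hard-threshold easy negatives (p_m = max(p - clip, 0)).
    probs_neg = np.clip(probs - clip, 0.0, 1.0 - eps)
    # Positives: mild (or no) focusing with gamma_pos.
    loss_pos = targets * (1.0 - probs_pos) ** gamma_pos * np.log(probs_pos)
    # Negatives: strong focusing with gamma_neg down-weights easy negatives.
    loss_neg = (1.0 - targets) * probs_neg ** gamma_neg * np.log(1.0 - probs_neg + eps)
    return -np.mean(loss_pos + loss_neg)

# Confident correct predictions should incur less loss than confident wrong ones.
targets = np.array([1.0, 0.0])
good = asymmetric_loss(np.array([0.9, 0.1]), targets)
bad = asymmetric_loss(np.array([0.1, 0.9]), targets)
```

The negative branch illustrates the asymmetry: `gamma_neg > gamma_pos` plus the probability shift `clip` suppress the gradient contribution of the many easy negatives, which is the imbalance mechanism the abstract refers to.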