Probabilistic Contrastive Learning for Long-Tailed Visual Recognition (2403.06726v2)
Abstract: Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain only a limited number of samples. Such an imbalance considerably impairs the performance of standard supervised learning algorithms, which are mainly designed for balanced training sets. Recent investigations have revealed that supervised contrastive learning shows promise in alleviating data imbalance. However, its performance is plagued by an inherent challenge: it requires sufficiently large batches of training data to construct contrastive pairs that cover all categories, a requirement that is difficult to meet in the context of class-imbalanced data. To overcome this obstacle, we propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the feature-space distribution of the samples from each class and samples contrastive pairs accordingly. Directly estimating the distributions of all classes from the features in a small batch, particularly for imbalanced data, is infeasible. Our key idea is to introduce a simple and reasonable assumption that the normalized features in contrastive learning follow a mixture of von Mises-Fisher (vMF) distributions on the unit hypersphere, which brings two benefits. First, the distribution parameters can be estimated using only the first sample moment, which can be computed efficiently in an online manner across batches. Second, the vMF assumption allows us to sample an effectively infinite number of contrastive pairs and to derive a closed form of the expected contrastive loss for efficient optimization. Our code is available at https://github.com/LeapLabTHU/ProCo.
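The abstract compresses two technical steps worth unpacking: (i) because the vMF density is an exponential family on the unit hypersphere, its parameters depend on the data only through the first sample moment, which can be maintained online across batches; and (ii) the expected contrastive similarity under a vMF distribution has a closed form in terms of the vMF normalizing constant, so the "infinite number of contrastive pairs" never has to be drawn explicitly. Below is a minimal NumPy sketch of these two ingredients under our own assumptions: the class name `ProCoEstimator`, the EMA update, the temperature `tau`, and the logit expression are illustrative choices for exposition, not the authors' implementation (see the linked repository for that). The concentration estimate uses the standard approximation of Banerjee et al. (2005).

```python
# Sketch of the two mechanisms described in the abstract (illustrative, not the
# authors' code): online vMF parameter estimation from the first sample moment,
# and a closed-form "infinite-sample" contrastive score per class.
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel: I_nu(x) * exp(-x)


def log_vmf_const(kappa, p):
    """log C_p(kappa) for the vMF density C_p(kappa) * exp(kappa * mu^T z) on S^{p-1}."""
    nu = p / 2.0 - 1.0
    log_bessel = np.log(ive(nu, kappa)) + kappa  # stable log I_nu(kappa)
    return nu * np.log(kappa) - (p / 2.0) * np.log(2.0 * np.pi) - log_bessel


class ProCoEstimator:
    """Hypothetical helper: per-class online first moment of normalized features."""

    def __init__(self, num_classes, dim, momentum=0.9):
        self.moment = np.zeros((num_classes, dim))  # running mean of unit features
        self.m = momentum

    def update(self, feats, labels):
        # feats: (B, dim) L2-normalized features; labels: (B,) int class ids.
        # Only the first moment is needed, so the update is cheap and batch-wise.
        for c in np.unique(labels):
            batch_mean = feats[labels == c].mean(axis=0)
            self.moment[c] = self.m * self.moment[c] + (1.0 - self.m) * batch_mean

    def vmf_params(self):
        r = np.linalg.norm(self.moment, axis=1)              # mean resultant length
        mu = self.moment / np.clip(r, 1e-8, None)[:, None]   # mean direction
        p = self.moment.shape[1]
        # Banerjee et al. (2005) approximation for the concentration parameter.
        kappa = r * (p - r ** 2) / np.clip(1.0 - r ** 2, 1e-8, None)
        return mu, np.clip(kappa, 1e-3, None)                # floor for stability at init


def expected_logits(z, mu, kappa, tau=0.1):
    # Closed-form fact: for z' ~ vMF(mu_c, kappa_c) on the unit hypersphere,
    #   E[exp(z^T z' / tau)] = C_p(kappa_c) / C_p(||kappa_c * mu_c + z / tau||),
    # so the expected similarity to infinitely many sampled pairs needs no sampling.
    p = mu.shape[1]
    kappa_tilde = np.linalg.norm(kappa[:, None] * mu + z[None, :] / tau, axis=1)
    return log_vmf_const(kappa, p) - log_vmf_const(kappa_tilde, p)


# Toy usage: 3 classes, 8-dim normalized features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 8))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
labels = rng.integers(0, 3, size=32)
est = ProCoEstimator(num_classes=3, dim=8)
est.update(feats, labels)
mu, kappa = est.vmf_params()
print(expected_logits(feats[0], mu, kappa))  # one closed-form score per class
```

Under these assumptions, a softmax cross-entropy over `expected_logits`, optionally shifted by log class priors in the spirit of logit adjustment, yields a per-sample loss whose value matches the expectation over infinitely many vMF-sampled contrastive pairs; the authors' released code should be treated as the reference implementation.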