CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion (2402.14551v2)
Abstract: State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial unsupervised pre-training on large-scale datasets, followed by task-specific fine-tuning with the Cross-Entropy loss (CE). However, CE has been shown to compromise model generalization and stability. While recent works employing contrastive learning address some of these limitations by improving embedding quality and producing better decision boundaries, they often overlook the importance of hard negative mining and rely on resource-intensive, slow training with large sample batches. To counter these issues, we introduce CLCE, a novel approach that integrates Label-Aware Contrastive Learning with CE. Our approach not only retains the strengths of both loss functions but also leverages hard negative mining in a synergistic way to enhance performance. Experimental results demonstrate that CLCE significantly outperforms CE in Top-1 accuracy across twelve benchmarks, achieving gains of up to 3.52% in few-shot learning scenarios and 3.41% in transfer learning settings with the BEiT-3 model. Importantly, CLCE effectively mitigates the dependence of contrastive learning on large batch sizes, such as 4096 samples per batch, a limitation that has previously constrained the application of contrastive learning in budget-limited hardware environments.
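The fused objective described in the abstract — a cross-entropy term combined with a label-aware contrastive term that up-weights hard negatives — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the exponential hardness weighting, the temperature `tau`, the hardness parameter `beta`, and the mixing weight `lam` are illustrative choices, and the paper's exact formulation may differ.

```python
import numpy as np

def softmax_ce(logits, label):
    """Standard cross-entropy for one example (numerically stabilized)."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def label_aware_contrastive(emb, labels, i, beta=1.0, tau=0.1):
    """Supervised contrastive loss for anchor i over L2-normalized
    embeddings, with hard-negative weighting: negatives more similar
    to the anchor receive larger weights (hard negative mining)."""
    sims = emb @ emb[i] / tau
    pos = [j for j in range(len(labels)) if j != i and labels[j] == labels[i]]
    neg = [j for j in range(len(labels)) if labels[j] != labels[i]]
    if not pos or not neg:
        return 0.0
    # Hardness weights: exp(beta * similarity), normalized to mean 1,
    # so harder (closer) negatives contribute more to the denominator.
    w = np.exp(beta * (emb[neg] @ emb[i]))
    w = w * len(neg) / w.sum()
    neg_term = (w * np.exp(sims[neg])).sum()
    loss = 0.0
    for p_idx in pos:
        loss += -np.log(np.exp(sims[p_idx]) / (np.exp(sims[p_idx]) + neg_term))
    return loss / len(pos)

def clce_loss(logits, emb, labels, lam=0.9, beta=1.0, tau=0.1):
    """CLCE-style fusion: lam * CE + (1 - lam) * label-aware contrastive."""
    n = len(labels)
    ce = np.mean([softmax_ce(logits[i], labels[i]) for i in range(n)])
    lacl = np.mean([label_aware_contrastive(emb, labels, i, beta, tau)
                    for i in range(n)])
    return lam * ce + (1 - lam) * lacl
```

Because the contrastive term only needs same-batch positives and weighted negatives, a formulation of this shape remains informative at moderate batch sizes, which is consistent with the abstract's claim of reduced dependence on very large batches.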
Authors: Zijun Long, George Killick, Lipeng Zhuang, Gerardo Aragon-Camarasa, Zaiqiao Meng, Richard McCreadie