Contrastive Credibility Propagation for Reliable Semi-Supervised Learning (2211.09929v4)

Published 17 Nov 2022 in cs.LG

Abstract: Producing labels for unlabeled data is error-prone, making semi-supervised learning (SSL) troublesome. Often, little is known about when and why an algorithm fails to outperform a supervised baseline. Using benchmark datasets, we craft five common real-world SSL data scenarios: few-label, open-set, noisy-label, and class distribution imbalance/misalignment in the labeled and unlabeled sets. We propose a novel algorithm called Contrastive Credibility Propagation (CCP) for deep SSL via iterative transductive pseudo-label refinement. CCP unifies semi-supervised learning and noisy label learning for the goal of reliably outperforming a supervised baseline in any data scenario. Compared to prior methods which focus on a subset of scenarios, CCP uniquely outperforms the supervised baseline in all scenarios, supporting practitioners when the qualities of labeled or unlabeled data are unknown.
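The abstract describes CCP as deep SSL via iterative transductive pseudo-label refinement. As a point of reference, the general self-training family such methods refine can be sketched as below. This is a minimal illustrative sketch of generic confidence-gated pseudo-labeling, not the CCP algorithm itself: the toy nearest-centroid model, the margin-based confidence proxy, and the `threshold` parameter are all assumptions chosen for brevity, whereas CCP's actual credibility scores are derived contrastively.

```python
# Generic iterative pseudo-label refinement (self-training) on 1-D toy data.
# Illustrative only: CCP replaces the naive confidence proxy below with
# contrastively derived credibility scores.

def nearest_centroid_predict(x, centroids):
    """Return (label, confidence) for a 1-D point given class centroids."""
    dists = {c: abs(x - m) for c, m in centroids.items()}
    label = min(dists, key=dists.get)
    # Confidence proxy: margin between the two closest centroids.
    sorted_d = sorted(dists.values())
    margin = sorted_d[1] - sorted_d[0] if len(sorted_d) > 1 else 1.0
    return label, margin

def self_train(labeled, unlabeled, rounds=3, threshold=0.5):
    """labeled: list of (x, y) pairs; unlabeled: list of x values.

    Returns a dict mapping unlabeled x -> (pseudo-label, confidence),
    keeping only points whose confidence clears the threshold.
    """
    pseudo = {}
    for _ in range(rounds):
        # Refit class centroids on labeled data plus trusted pseudo-labels.
        points = list(labeled) + [(x, y) for x, (y, _) in pseudo.items()]
        centroids = {}
        for c in {y for _, y in points}:
            xs = [x for x, y in points if y == c]
            centroids[c] = sum(xs) / len(xs)
        # Re-score every unlabeled point from scratch each round, so an
        # earlier (possibly wrong) pseudo-label can be revised or dropped.
        pseudo = {}
        for x in unlabeled:
            label, conf = nearest_centroid_predict(x, centroids)
            if conf >= threshold:
                pseudo[x] = (label, conf)
    return pseudo
```

The key property this loop shares with the paper's framing is that pseudo-labels are treated as revisable hypotheses rather than fixed targets: low-confidence points are simply excluded from the next fit, which is one simple way an SSL method can avoid falling below the supervised baseline when unlabeled data is noisy or misaligned.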

Authors (9)
  1. Brody Kutt (1 paper)
  2. Pralay Ramteke (1 paper)
  3. Xavier Mignot (1 paper)
  4. Pamela Toman (1 paper)
  5. Nandini Ramanan (7 papers)
  6. Sujit Rokka Chhetri (8 papers)
  7. Shan Huang (69 papers)
  8. Min Du (46 papers)
  9. William Hewlett (2 papers)