An Analysis of CoMatch: Semi-supervised Learning with Contrastive Graph Regularization
The paper "CoMatch: Semi-supervised Learning with Contrastive Graph Regularization" presents an innovative approach to semi-supervised learning (SSL), an area crucial for reducing the dependence on labeled data in computer vision tasks. The authors address the limitations of prevalent SSL techniques and propose a new framework, CoMatch, built around contrastive graph regularization.
Core Contributions
CoMatch's primary contribution lies in jointly learning two representations of the training data, class probabilities and low-dimensional embeddings, and letting them interact and evolve together during training. The paper outlines three critical components of CoMatch:
- Co-training via Dual Representations: By co-training class probabilities and embeddings, CoMatch avoids both the over-reliance on self-training with pseudo-labels seen in current SSL methods and the task-agnostic objectives of purely self-supervised representation learning.
- Memory-smoothed Pseudo-labeling: Pseudo-labels are refined by smoothing each prediction with the predictions of similar samples stored in a memory bank, where similarity is measured in the embedding space. This improves the accuracy of the pseudo-labels used to train on unlabeled data (see the first sketch after this list).
- Graph-based Contrastive Learning: CoMatch combines contrastive learning with a graph-based smoothness constraint: a pseudo-label graph serves as the target for an embedding-similarity graph, so that pseudo-labels and embeddings jointly evolve and samples with similar pseudo-labels are pulled toward similar embeddings (see the second sketch after this list).
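To make the memory-smoothed pseudo-labeling mechanism concrete, here is a minimal PyTorch-style sketch. It assumes L2-normalized embeddings and a memory bank of past (embedding, probability) pairs; the function name, argument names, and default hyperparameter values are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def memory_smoothed_pseudo_labels(probs, embeddings, mem_probs, mem_embeddings,
                                  alpha=0.9, temperature=0.1):
    # probs:          (B, C) class probabilities for the weakly augmented batch
    # embeddings:     (B, D) L2-normalized embeddings for the same batch
    # mem_probs:      (K, C) class probabilities stored in the memory bank
    # mem_embeddings: (K, D) L2-normalized embeddings stored in the memory bank

    # Similarity of each batch embedding to every memory embedding,
    # turned into aggregation weights by a temperature-scaled softmax.
    sim = embeddings @ mem_embeddings.t() / temperature   # (B, K)
    weights = F.softmax(sim, dim=1)

    # Each pseudo-label is a convex combination of the model's own
    # prediction and the similarity-weighted memory predictions.
    return alpha * probs + (1 - alpha) * weights @ mem_probs
```

The key design idea is that a sample's pseudo-label is no longer determined by its own prediction alone: nearby points in embedding space vote on it, which damps noisy, overconfident predictions when labels are scarce.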
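Likewise, the second component can be sketched as follows: a pseudo-label graph, sparsified by a confidence threshold, is used as the target distribution for an embedding-similarity graph built from two strongly augmented views. Again, the names and default values below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def graph_contrastive_loss(pseudo_labels, z1, z2, threshold=0.8, temperature=0.1):
    # pseudo_labels: (B, C) smoothed pseudo-labels for the batch
    # z1, z2:        (B, D) L2-normalized embeddings of two strong augmentations

    # Target graph: pairwise pseudo-label similarity, zeroed below a
    # confidence threshold, self-connections set to 1, then row-normalized.
    wq = pseudo_labels @ pseudo_labels.t()                  # (B, B)
    wq = torch.where(wq >= threshold, wq, torch.zeros_like(wq))
    wq.fill_diagonal_(1.0)
    wq = wq / wq.sum(dim=1, keepdim=True)

    # Embedding graph: temperature-scaled softmax over pairwise similarities
    # between the two augmented views.
    wz = F.softmax(z1 @ z2.t() / temperature, dim=1)        # (B, B)

    # Cross-entropy between the two graphs pulls samples with similar
    # pseudo-labels toward similar embeddings.
    return -(wq * torch.log(wz + 1e-8)).sum(dim=1).mean()
```

Unlike standard instance-discrimination contrastive losses, which treat every other sample as a negative, this objective also treats samples connected in the pseudo-label graph as positives, which is what ties the embedding structure to the classification task.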
Numerical Performance
The experiments demonstrate substantial performance improvements across datasets, particularly in label-scarce settings. On CIFAR-10 with minimal labeled data (4 samples per class), CoMatch improves over FixMatch by 6.11% in accuracy. On larger datasets such as ImageNet with only 1% of labels, CoMatch achieves a top-1 accuracy of 66.0%, outperforming FixMatch by 12.6%.
Theoretical and Practical Implications
Architecturally, CoMatch relies on memory banks and momentum queues: pseudo-label smoothing and graph construction operate over a fixed-size buffer of recently seen samples rather than the entire dataset, which keeps memory and computation bounded and allows the method to scale to large datasets.
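As a concrete illustration, below is a minimal sketch of a fixed-size memory queue paired with a momentum (EMA) encoder update, the standard pattern behind such designs; the class name, method names, and momentum value are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

class MomentumQueue:
    """Fixed-size FIFO memory bank of embeddings and class probabilities."""

    def __init__(self, size, emb_dim, num_classes):
        self.embeddings = F.normalize(torch.randn(size, emb_dim), dim=1)
        self.probs = torch.full((size, num_classes), 1.0 / num_classes)
        self.ptr = 0
        self.size = size

    @torch.no_grad()
    def enqueue(self, emb, probs):
        # Overwrite the oldest entries with the newest batch (circular buffer).
        b = emb.shape[0]
        idx = (self.ptr + torch.arange(b)) % self.size
        self.embeddings[idx] = emb
        self.probs[idx] = probs
        self.ptr = (self.ptr + b) % self.size

@torch.no_grad()
def ema_update(encoder, momentum_encoder, m=0.996):
    # Exponential moving average of encoder weights; the momentum encoder
    # produces the embeddings and probabilities that populate the queue.
    for p, mp in zip(encoder.parameters(), momentum_encoder.parameters()):
        mp.data.mul_(m).add_(p.data, alpha=1.0 - m)
```

Because the queue has constant size, the cost of pseudo-label smoothing and graph construction is independent of the dataset size, which is what makes the approach tractable at ImageNet scale.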
In practical terms, CoMatch offers more reliable semi-supervised representation learning, narrowing the gap between semi-supervised and fully supervised performance. This has promising implications for real-world applications where labeled data is limited or expensive to obtain.
Future Developments
CoMatch represents a notable advance in semi-supervised learning and opens pathways for further architectural work combining graph-based regularization with deep learning. Future research could explore adapting CoMatch to other data modalities and studying its behavior in dynamic or streaming environments.
Enhancements may include richer graph structures or alternative contrastive loss formulations to further improve embedding quality. There are also opportunities to extend CoMatch's underlying principles to unsupervised domain adaptation or fully unsupervised learning.
In conclusion, CoMatch is a well-founded progression in semi-supervised learning methodology, delivering substantial performance improvements while remaining a versatile framework adaptable to various learning tasks. Its concurrent refinement of pseudo-labels and embeddings sets a new reference point for future SSL research and applications.