OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers (2105.14148v2)

Published 28 May 2021 in cs.CV

Abstract: Semi-supervised learning (SSL) is an effective means to leverage unlabeled data to improve a model's performance. Typical SSL methods like FixMatch assume that labeled and unlabeled data share the same label space. However, in practice, unlabeled data can contain categories unseen in the labeled set, i.e., outliers, which can significantly harm the performance of SSL algorithms. To address this problem, we propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch. Learning representations of inliers while rejecting outliers is essential for the success of OSSL. To this end, OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers. The OVA-classifier outputs the confidence score of a sample being an inlier, providing a threshold to detect outliers. Another key contribution is an open-set soft-consistency regularization loss, which enhances the smoothness of the OVA-classifier with respect to input transformations and greatly improves outlier detection. OpenMatch achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
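To make the abstract's two ingredients concrete, the inlier confidence score and the soft-consistency term can be written roughly as below. The notation (per-class two-way probabilities, threshold \tau, augmentations T and T') is our own shorthand for illustration and is not taken verbatim from the paper.

```latex
% Notation is ours, not verbatim from the paper.
% For each known class k, the OVA head outputs a two-way probability
% (p_{\text{out}}^{(k)}(x),\, p_{\text{in}}^{(k)}(x)).
\text{inlier score: } s(x) = p_{\text{in}}^{(\hat{y})}(x),
\qquad \text{reject } x \text{ as an outlier if } s(x) < \tau
\ \text{(e.g. } \tau = 0.5\text{)},
% where \hat{y} is the closed-set classifier's prediction for x.
% Soft open-set consistency across two augmentations T, T' of x:
\mathcal{L}_{\text{oc}}(x) \;=\; \sum_{k=1}^{K}
\bigl\| p^{(k)}(T(x)) - p^{(k)}(T'(x)) \bigr\|_2^2
```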

Authors (3)
  1. Kuniaki Saito (31 papers)
  2. Donghyun Kim (129 papers)
  3. Kate Saenko (178 papers)
Citations (60)

Summary

OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers

The paper by Saito et al. addresses a notable challenge in semi-supervised learning (SSL): unlabeled data may contain outlier categories that never appear in the labeled set, a setting referred to as open-set semi-supervised learning (OSSL). Traditional SSL algorithms assume that labeled and unlabeled data share an identical label space, and their performance degrades significantly when that assumption is violated. This work introduces OpenMatch, which jointly classifies inliers and detects outliers in the OSSL setting by combining FixMatch with a novelty-detection mechanism.

The cornerstone of OpenMatch is the combination of one-vs-all (OVA) classifiers with an open-set soft-consistency regularization loss. The OVA classifiers output a confidence score that a sample is an inlier, which provides a threshold for rejecting outliers; this lets OpenMatch separate known from novel categories without any labels for the outliers. The soft-consistency loss additionally encourages the OVA outputs to be smooth under input transformations, which markedly improves outlier detection. A minimal sketch of these two components is given below.
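The following PyTorch-style sketch illustrates how an OVA inlier score and a soft open-set consistency term of this kind could be computed. It is not the authors' reference implementation: the tensor shapes, the 0.5 threshold, and the helper names ova_inlier_scores and soft_open_set_consistency are assumptions made here for clarity.

```python
import torch
import torch.nn.functional as F


def ova_inlier_scores(ova_logits, class_preds):
    """Inlier confidence from one-vs-all (OVA) heads.

    ova_logits: (B, 2, K) tensor with (outlier, inlier) logits per class.
    class_preds: (B,) predicted class indices from the closed-set head.
    Returns a (B,) tensor of inlier probabilities for the predicted class.
    """
    ova_probs = F.softmax(ova_logits, dim=1)          # (B, 2, K)
    inlier_probs = ova_probs[:, 1, :]                 # (B, K)
    return inlier_probs.gather(1, class_preds.unsqueeze(1)).squeeze(1)


def soft_open_set_consistency(ova_logits_a, ova_logits_b):
    """Soft consistency between OVA outputs of two augmented views.

    Penalizes the squared distance between the per-class (outlier, inlier)
    probabilities, encouraging the OVA heads to be smooth under input
    transformations.
    """
    p_a = F.softmax(ova_logits_a, dim=1)
    p_b = F.softmax(ova_logits_b, dim=1)
    return ((p_a - p_b) ** 2).sum(dim=(1, 2)).mean()


# Illustrative usage on a batch of unlabeled samples.
B, K = 8, 6                                           # batch size, known classes
closed_logits = torch.randn(B, K)                     # closed-set classifier head
ova_logits_view1 = torch.randn(B, 2, K)               # OVA heads, augmentation 1
ova_logits_view2 = torch.randn(B, 2, K)               # OVA heads, augmentation 2

preds = closed_logits.argmax(dim=1)
scores = ova_inlier_scores(ova_logits_view1, preds)
keep_as_inlier = scores > 0.5                         # threshold for outlier rejection
loss_oc = soft_open_set_consistency(ova_logits_view1, ova_logits_view2)
```

In this sketch, unlabeled samples whose inlier score falls below the threshold would simply be excluded from the pseudo-labeling step, which is the mechanism the summary refers to when it notes that outliers are prevented from receiving erroneous class labels.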

Empirically, OpenMatch delivers strong results across multiple datasets, most notably a 10.4% error rate with 300 labeled examples on CIFAR-10, improving on the previous best of 20.3%. A particularly striking result is its ability to outperform fully supervised models at detecting outlier categories that are entirely absent from the unlabeled data: in CIFAR-10 experiments with 100 labeled samples per class, OpenMatch achieved a 3.4% AUROC improvement over a model trained on the complete labeled dataset.

The framework thus improves both recognition of known classes and detection of novel categories in unlabeled data. The soft-consistency mechanism protects accuracy by preventing outliers from being assigned erroneous class labels during pseudo-labeling, yielding SSL models that better tolerate real-world data irregularities.

Bringing open-set thinking into an SSL framework points toward models that accommodate anomalous data by design, a frequent occurrence in real-world settings. OpenMatch suggests that soft regularization strategies and dedicated per-class classifier heads can be combined to build robust, adaptive learning systems.

Future work could explore combining self-supervised learning with OSSL, leveraging latent structure in the data to further improve anomaly separation. Given its strong results and its applicability across varied dataset configurations, OpenMatch offers a compelling strategy for advancing adaptable semi-supervised learning.
