Combating noisy labels by agreement: A joint training method with co-regularization (2003.02752v3)

Published 5 Mar 2020 in cs.CV, cs.LG, and stat.ML

Abstract: Deep Learning with noisy labels is a practically challenging problem in weakly supervised learning. The state-of-the-art approaches "Decoupling" and "Co-teaching+" claim that the "disagreement" strategy is crucial for alleviating the problem of learning with noisy labels. In this paper, we start from a different perspective and propose a robust learning paradigm called JoCoR, which aims to reduce the diversity of two networks during training. Specifically, we first use two networks to make predictions on the same mini-batch data and calculate a joint loss with Co-Regularization for each training example. Then we select small-loss examples to update the parameters of both two networks simultaneously. Trained by the joint loss, these two networks would be more and more similar due to the effect of Co-Regularization. Extensive experimental results on corrupted data from benchmark datasets including MNIST, CIFAR-10, CIFAR-100 and Clothing1M demonstrate that JoCoR is superior to many state-of-the-art approaches for learning with noisy labels.

Citations (462)

Summary

  • The paper introduces JoCoR—a joint training method with co-regularization that mitigates the impact of noisy labels in deep neural networks.
  • It employs a dual-network approach combining cross-entropy and contrastive loss to align predictions and reduce discrepancies.
  • Extensive experiments demonstrate that JoCoR outperforms state-of-the-art methods on benchmarks including MNIST, CIFAR-10, and Clothing1M.

Overview of "Combating Noisy Labels by Agreement: A Joint Training Method with Co-Regularization"

This paper addresses the challenge of training deep neural networks (DNNs) with noisy labels, a significant issue in weakly supervised learning. The authors propose JoCoR (Joint Training with Co-Regularization), a method designed to mitigate the effects of noisy labels. Unlike traditional approaches such as "Decoupling" and "Co-teaching+", which rely on disagreement between networks, JoCoR focuses on reducing the diversity of two networks during training through co-regularization.

Methodology

The JoCoR approach involves two neural networks trained simultaneously. Each network makes predictions on the same mini-batch, after which a joint loss incorporating co-regularization is calculated for each data point. The networks are updated simultaneously using only the examples with the smallest losses. This joint loss comprises two parts:

  1. Supervised Loss: Utilizes cross-entropy loss to align predictions with the given labels.
  2. Co-Regularization Loss: Employs a contrastive loss to minimize differences between the predictions of the two networks, leveraging the Jensen-Shannon Divergence.

By concentrating on agreement between the networks rather than disagreement, JoCoR offers a robust approach to handling noisy annotations in datasets.
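The joint objective can be made concrete with a short sketch. The snippet below is a minimal PyTorch-style illustration of the per-example joint loss and small-loss selection described above; the weighting factor `lam`, the selection ratio `keep_ratio`, and the use of a symmetric KL term for the agreement (co-regularization) part are illustrative assumptions, not the paper's exact formulation or hyperparameters.

```python
import torch
import torch.nn.functional as F

def jocor_loss(logits1, logits2, labels, lam=0.85, keep_ratio=0.7):
    """Sketch of a JoCoR-style joint loss:
    (1 - lam) * supervised cross-entropy + lam * agreement term,
    followed by small-loss selection. Hyperparameters are illustrative."""
    # Supervised part: cross-entropy of each network against the given labels.
    ce1 = F.cross_entropy(logits1, labels, reduction="none")
    ce2 = F.cross_entropy(logits2, labels, reduction="none")

    # Co-regularization part: a symmetric KL divergence between the two
    # networks' predictive distributions, used here as the agreement term.
    p1, p2 = F.softmax(logits1, dim=1), F.softmax(logits2, dim=1)
    kl12 = F.kl_div(F.log_softmax(logits1, dim=1), p2, reduction="none").sum(dim=1)
    kl21 = F.kl_div(F.log_softmax(logits2, dim=1), p1, reduction="none").sum(dim=1)

    per_example = (1 - lam) * (ce1 + ce2) + lam * (kl12 + kl21)

    # Small-loss selection: keep only the examples with the smallest joint
    # loss and update both networks on them.
    num_keep = max(1, int(keep_ratio * per_example.numel()))
    selected, _ = torch.topk(per_example, num_keep, largest=False)
    return selected.mean()
```

In practice the fraction of retained examples is typically scheduled to shrink as training progresses, so that the networks gradually discard the examples most likely to carry corrupted labels.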

Experimental Results

The effectiveness of JoCoR is demonstrated through extensive experiments on benchmark datasets, including MNIST, CIFAR-10, CIFAR-100, and the real-world noisy dataset Clothing1M. Key numerical results indicate that JoCoR achieves superior performance compared to state-of-the-art methods:

  • On MNIST and CIFAR-10, JoCoR outperforms other methods, particularly under high-noise conditions (e.g., 80% symmetric noise).
  • On the Clothing1M dataset, JoCoR shows a significant improvement in classification accuracy over other approaches, reflecting its applicability in real-world scenarios.
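For context on the benchmark setup, the snippet below is a generic sketch of how symmetric label noise at a given rate (e.g., 80%) is commonly injected into a clean training set; it illustrates the corruption protocol only and is not code from the paper.

```python
import numpy as np

def add_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip each label to a uniformly chosen *different* class with
    probability `noise_rate` (a common definition of symmetric noise)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip = rng.random(labels.shape[0]) < noise_rate
    for i in np.flatnonzero(flip):
        wrong_classes = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(wrong_classes)
    return labels

# Example: corrupt CIFAR-10-style labels (10 classes) at 80% symmetric noise.
noisy = add_symmetric_noise(np.array([3, 7, 1, 9]), noise_rate=0.8, num_classes=10)
```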

Implications and Future Directions

The proposed JoCoR method effectively challenges the traditional reliance on disagreement strategies for dealing with noisy labels, suggesting that joint training with co-regularization can yield more reliable results. The method's superior performance across various noise levels and datasets highlights its potential as a general solution for weakly supervised learning tasks.

From a theoretical perspective, the paper raises questions about the underlying principles of co-training and the dynamics of agreement maximization in machine learning. Practically, this approach can be instrumental in domains where high-quality labeled data is hard to obtain, such as medical imaging or remote sensing.

Future research could explore the theoretical underpinnings of JoCoR further, investigate its applicability to other network architectures, or test its effectiveness across diverse domain-specific noisy datasets. Additionally, exploring the integration of JoCoR with other learning paradigms, such as semi-supervised or unsupervised learning, may yield new insights and enhancements in AI robustness against noise.