
How does Disagreement Help Generalization against Label Corruption? (1901.04215v3)

Published 14 Jan 2019 in cs.LG and stat.ML

Abstract: Learning with noisy labels is one of the hottest problems in weakly-supervised learning. Based on memorization effects of deep neural networks, training on small-loss instances becomes very promising for handling noisy labels. This fosters the state-of-the-art approach "Co-teaching" that cross-trains two deep neural networks using the small-loss trick. However, with the increase of epochs, two networks converge to a consensus and Co-teaching reduces to the self-training MentorNet. To tackle this issue, we propose a robust learning paradigm called Co-teaching+, which bridges the "Update by Disagreement" strategy with the original Co-teaching. First, two networks feed forward and predict all data, but keep prediction disagreement data only. Then, among such disagreement data, each network selects its small-loss data, but back propagates the small-loss data from its peer network and updates its own parameters. Empirical results on benchmark datasets demonstrate that Co-teaching+ is much superior to many state-of-the-art methods in the robustness of trained models.

Citations (714)

Summary

  • The paper introduces the Co-teaching+ algorithm that uses an 'Update by Disagreement' strategy to maintain network divergence for improved generalization in noisy label settings.
  • It leverages cross-updating on small-loss samples to effectively mitigate the impact of label corruption, yielding higher test accuracies on benchmarks like MNIST and CIFAR.
  • Experimental evaluations confirm that Co-teaching+ consistently outperforms traditional methods, demonstrating its practical robustness across various noise levels and datasets.

Analysis of "How does Disagreement Help Generalization against Label Corruption?"

The paper, "How does Disagreement Help Generalization against Label Corruption?" by Xingrui Yu et al., presents a robust learning paradigm termed Co-teaching+ for training under noisy labels in weakly-supervised scenarios. The method addresses the persistent challenge of training deep neural networks on datasets with label noise, which can significantly degrade classifier performance.

Background

The context of this research lies in the problem of learning with noisy labels, which is a major concern in weakly-supervised learning. Noisy labels often arise in practical applications such as web queries, crowdsourcing, medical image classification, and financial data analysis, where collected data is not always reliably labeled. Classical approaches to handle noisy labels typically involve adding regularization techniques or estimating the label transition matrix, but these methods have inherent limitations. Deep neural networks are particularly vulnerable to noisy labels due to their capacity to memorize noisy instances, which can lead to poor generalization.

Co-teaching+ Algorithm

The Co-teaching+ paradigm expands upon the "Co-teaching" method, which trains two networks simultaneously and cross-updates them using a small-loss criterion. The primary motivation behind Co-teaching+ is to prevent the two networks from converging to a consensus, at which point Co-teaching degenerates into self-training. The key idea is to bridge the "Update by Disagreement" strategy with Co-teaching's cross-update, which maintains divergence between the networks throughout training. This strategy involves the following steps:

  1. Prediction: Both networks predict the labels for all instances in a mini-batch.
  2. Disagreement Update: Only instances for which the networks disagree are retained for further processing.
  3. Small-Loss Selection: Each network selects small-loss instances from the disagreement set.
  4. Cross-Update: Each network updates its parameters using the small-loss instances selected by its peer network.

Because each network's parameters evolve only on instances where the two networks disagree, the networks remain divergent throughout training, which in turn enhances generalization.
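The four steps above can be sketched as a single mini-batch update. The following is a minimal NumPy illustration, not the paper's implementation: it uses simple linear softmax classifiers as hypothetical stand-ins for the two deep networks, and the function names (`coteaching_plus_step`, `remember_rate` argument) are invented for clarity.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    # Per-instance cross-entropy loss.
    return -np.log(probs[np.arange(len(y)), y] + 1e-12)

def coteaching_plus_step(W1, W2, X, y, remember_rate, lr=0.1):
    """One Co-teaching+ mini-batch update (illustrative sketch).

    W1, W2: weight matrices (d x c) of two linear softmax classifiers,
    standing in for the two deep networks.
    """
    # Step 1: both networks predict labels for the whole mini-batch.
    p1, p2 = softmax(X @ W1), softmax(X @ W2)
    pred1, pred2 = p1.argmax(axis=1), p2.argmax(axis=1)

    # Step 2: keep only the instances the two networks disagree on.
    disagree = np.where(pred1 != pred2)[0]
    if disagree.size == 0:
        return W1, W2  # no disagreement in this batch; skip the update

    # Step 3: each network ranks the disagreement set by its own loss
    # and keeps the remember_rate fraction with the smallest loss.
    l1 = cross_entropy(p1[disagree], y[disagree])
    l2 = cross_entropy(p2[disagree], y[disagree])
    k = max(1, int(remember_rate * disagree.size))
    sel1 = disagree[np.argsort(l1)[:k]]  # network 1's small-loss picks
    sel2 = disagree[np.argsort(l2)[:k]]  # network 2's small-loss picks

    # Step 4: cross-update -- each network back-propagates the
    # small-loss instances selected by its *peer*.
    def grad(W, idx):
        P = softmax(X[idx] @ W)
        P[np.arange(len(idx)), y[idx]] -= 1.0  # softmax cross-entropy grad
        return X[idx].T @ P / len(idx)

    W1 = W1 - lr * grad(W1, sel2)  # network 1 learns from network 2's picks
    W2 = W2 - lr * grad(W2, sel1)  # network 2 learns from network 1's picks
    return W1, W2
```

The cross-update in step 4 is what distinguishes this from Decoupling, which also filters by disagreement but lets each network update on its own selection.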

Experimental Evaluation

The paper meticulously evaluates Co-teaching+ against several state-of-the-art methods on numerous benchmark datasets, including MNIST, CIFAR-10, CIFAR-100, a text dataset NEWS, and a more complex Tiny-ImageNet dataset. Across these datasets, Co-teaching+ consistently demonstrated superior robustness against label noise. Key results include:

  • On MNIST with 45% pair flipping noise, Co-teaching+ markedly outperformed methods like MentorNet and Decoupling.
  • On CIFAR-10 and CIFAR-100, Co-teaching+ maintained higher test accuracy levels even with 50% symmetric noise and 45% pair flipping noise.
  • On the NEWS text dataset, Co-teaching+ showed significant gains, underscoring its versatility beyond vision tasks.

Additionally, Co-teaching+ was evaluated on the larger and more challenging Tiny-ImageNet dataset and in open-set scenarios, reinforcing its practical applicability. For instance, on Tiny-ImageNet with 50% symmetric noise, Co-teaching+ achieved a maximum test accuracy of 41.77%, far exceeding other methods.
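The two noise models used in these evaluations can be expressed as label transition matrices. The sketch below shows one common construction, as an illustration rather than the paper's exact corruption code: under symmetric noise a label flips to any other class with equal probability, while under pair flipping it flips only to one designated "pair" class (here, the next class index, wrapping around).

```python
import numpy as np

def symmetric_noise_matrix(n_classes, noise_rate):
    """Row-stochastic matrix: each label flips uniformly to any other class."""
    T = np.full((n_classes, n_classes), noise_rate / (n_classes - 1))
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T

def pair_flip_matrix(n_classes, noise_rate):
    """Each label flips only to the next class (one common pair-flip scheme)."""
    T = np.eye(n_classes) * (1.0 - noise_rate)
    for i in range(n_classes):
        T[i, (i + 1) % n_classes] = noise_rate
    return T

def corrupt_labels(y, T, rng):
    """Sample a noisy label for each clean label from transition matrix T."""
    return np.array([rng.choice(len(T), p=T[c]) for c in y])
```

With 45% pair flipping, nearly half of each class is mislabeled as a single other class, which is why this setting is much harder than 50% symmetric noise spread across all classes.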

Key Contributions

The paper makes several pivotal contributions to the discourse on learning with noisy labels:

  1. It empirically demonstrates that the "Update by Disagreement" strategy keeps two networks divergent, a crucial factor that significantly enhances the robustness of Co-teaching.
  2. It identifies three key factors for effective learning with noisy labels: leveraging the small-loss trick, cross-updating parameters of two networks, and maintaining divergence between networks.
  3. It broadens the applicability of the disagreement-based learning paradigm, aligning with previous theoretical insights on classifier ensembles.
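The small-loss trick in the first factor depends on how many instances are treated as "clean" at each epoch. A common choice, following the schedule introduced by the original Co-teaching paper (the exact schedule used here may differ), keeps the whole batch early on, then linearly shrinks the kept fraction toward 1 - tau, where tau estimates the noise rate:

```python
def remember_rate(epoch, tau, t_k):
    """Fraction of instances kept as 'small-loss' at a given epoch.

    tau: estimated noise rate; t_k: number of warm-up epochs over which
    the kept fraction decays linearly from 1.0 down to 1 - tau.
    """
    return 1.0 - tau * min(epoch / t_k, 1.0)
```

The intuition is that deep networks memorize clean patterns before noisy ones, so early epochs can trust most of the data, while later epochs should discard roughly a noise-rate's worth of high-loss instances.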

Implications and Future Directions

The practical and theoretical implications of Co-teaching+ are substantial. Practically, the approach is poised to enhance the robustness of classifiers in real-world scenarios where noisy labels are prevalent. Theoretically, it opens avenues for deeper exploration of disagreement-based learning strategies and their optimal integration with current deep learning techniques. Future work may delve into optimizing the disagreement criterion or exploring its synergy with other state-of-the-art noise-robust algorithms. Additionally, theoretical underpinnings of the observed divergence in disagreement strategies warrant further investigation.

In conclusion, the Co-teaching+ paradigm introduced by Yu et al. makes a significant contribution to the robust training of deep networks in the presence of label noise, offering a compelling solution to a ubiquitous challenge in machine learning.