Learning from Noisy Labels with Distillation (1703.02391v2)

Published 7 Mar 2017 in cs.CV, cs.LG, and stat.ML

Abstract: The ability of learning from noisy labels is very useful in many visual recognition tasks, as a vast amount of data with noisy labels are relatively easy to obtain. Traditionally, the label noises have been treated as statistical outliers, and approaches such as importance re-weighting and bootstrap have been proposed to alleviate the problem. According to our observation, the real-world noisy labels exhibit multi-mode characteristics as the true labels, rather than behaving like independent random outliers. In this work, we propose a unified distillation framework to use side information, including a small clean dataset and label relations in knowledge graph, to "hedge the risk" of learning from noisy labels. Furthermore, unlike the traditional approaches evaluated based on simulated label noises, we propose a suite of new benchmark datasets, in Sports, Species and Artifacts domains, to evaluate the task of learning from noisy labels in the practical setting. The empirical study demonstrates the effectiveness of our proposed method in all the domains.

Citations (531)

Summary

  • The paper introduces a teacher-student distillation method that significantly reduces the detrimental impact of noisy labels.
  • The methodology employs a teacher model trained on clean data to guide a student model learning from extensive, noisy datasets.
  • Experiments demonstrate tolerance to label-noise rates of up to 40%, with notable improvements in classification accuracy over baseline methods.

Learning from Noisy Labels with Distillation

The paper "Learning from Noisy Labels with Distillation" by Yuncheng Li, Jianchao Yang, Yale Song, LiangLiang Cao, Jiebo Luo, and Jia Li presents a significant contribution to the domain of machine learning, addressing the problem of training models with unreliable data annotations. This research is motivated by the recognition that large-scale datasets, while crucial for training deep learning models, often contain noise in the form of incorrect labels. The authors propose a novel method that leverages knowledge distillation to ameliorate the adverse impact of these noisy labels.

The primary innovation of this work involves using a teacher-student framework where the teacher model is trained on a subset of clean data, and the distilled knowledge is subsequently used to guide the student model trained on the entirety of the noisy dataset. This approach aims to ensure that the student's learning is influenced more by the teacher's reliable outputs than by the erroneous labels from the dataset.

Methodology

The methodology is structured around a dual-model system, comprising:

  1. Teacher Model: Trained initially on a smaller, clean subset of the data. Its purpose is to generate reliable and refined predictions.
  2. Student Model: Trained on the larger, noisy dataset, but with a loss function adjusted to incorporate distillation from the teacher model's outputs.

Through this distillation process, the student model becomes less sensitive to inaccuracies in the noisy labels and instead learns a more generalized representation shaped by the teacher's predictions.
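
The sketch below illustrates one way this blended objective can look in a PyTorch-style setup: the student is trained against a pseudo-label that mixes the noisy hard label with the teacher's soft prediction. The mixing weight `lam`, the function name, and all variable names are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of a distillation-guided loss: the student's target is a
# convex combination of the (possibly noisy) ground-truth label and the
# teacher's soft prediction. Names and the weight `lam` are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, noisy_label, teacher_probs, lam=0.5):
    """Cross-entropy against a blend of the noisy hard label and the
    teacher's soft prediction."""
    num_classes = student_logits.size(1)
    hard = F.one_hot(noisy_label, num_classes).float()    # noisy label as one-hot
    pseudo = lam * hard + (1.0 - lam) * teacher_probs     # blended target
    log_probs = F.log_softmax(student_logits, dim=1)
    return -(pseudo * log_probs).sum(dim=1).mean()

# Usage sketch: the teacher is trained on the small clean subset and frozen;
# the student sees the full noisy set but optimizes the blended target.
# teacher_probs = F.softmax(teacher(x), dim=1).detach()
# loss = distillation_loss(student(x), y_noisy, teacher_probs, lam=0.7)
```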

Experimental Results

The authors validate their methodology through extensive experiments on benchmark datasets corrupted with varying levels of label noise. The results demonstrate that the proposed method consistently outperforms baseline approaches. Specifically, in scenarios with up to 40% noisy labels, the distilled student model achieved a notable improvement in classification accuracy compared to models trained without distillation.
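
For intuition, one simple way to build such corrupted evaluation sets is to flip a fixed fraction of labels uniformly to other classes. The sketch below assumes this symmetric-noise model (e.g. a 40% noise rate) and is only illustrative; the paper itself also contributes real-world noisy benchmarks in the Sports, Species, and Artifacts domains.

```python
# Minimal sketch of injecting symmetric label noise for evaluation.
# The uniform-flip noise model and function name are assumptions.
import numpy as np

def corrupt_labels(labels, num_classes, noise_rate=0.4, seed=0):
    """Flip `noise_rate` of the labels uniformly to a different class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    n_flip = int(noise_rate * len(labels))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    for i in idx:
        choices = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(choices)
    return labels
```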

Implications and Future Directions

This research presents several practical and theoretical implications. Practically, it provides a viable approach for leveraging large datasets with noisy annotations, which is particularly relevant for industries reliant on automated data collection methods. Theoretically, it underscores the potential of knowledge distillation as a tool not only for model compression but also for enhancing robustness to label noise.

In terms of future work, adaptive versions of the teacher-student framework that continue to refine the teacher's knowledge during training could be beneficial. Research into how well the method handles different types of noise and how it transfers across diverse domains and tasks also appears promising.

In conclusion, the paper introduces a methodologically sound and empirically validated framework for improving the performance of machine learning models trained on noisy data, contributing significantly to the field's ongoing efforts to enhance learning under imperfect conditions.