- The paper introduces a CRF-based approach that effectively infers clean labels from noisy datasets during CNN training.
- It employs a semi-supervised framework with auxiliary distributions to regularize inference and capture label interactions.
- Experiments on CIFAR-10 and MS COCO show significant gains over standard training on noisy labels, including markedly higher mean average precision on MS COCO.
Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks
The paper "Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks" explores a critical challenge in the training of deep learning models, specifically convolutional neural networks (CNNs), using datasets with noisy labels. This situation is common in real-world applications where high-quality labeled data can be expensive and difficult to obtain, necessitating compromises that result in noisy data labels. This paper proposes a novel framework leveraging deep CNNs to robustly handle such noisy labeled datasets.
Methodology
The authors introduce a conditional random field (CRF)-based approach that models the relationship between noisy and clean labels using an undirected graphical model. The framework targets multi-label classification and treats the clean labels as latent variables over which inference remains tractable. Training proceeds in a semi-supervised fashion on both noisy and clean data points, with a small cleanly labeled subset of the data guiding the learning process.
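As a rough illustration of this kind of formulation (the notation below is assumed for exposition and is not copied from the paper), the model can be viewed as a conditional distribution over latent clean labels $\hat{y}$ and observed noisy labels $y$ given an image $x$:

$$
p_\theta(\hat{y}, y \mid x) \;=\; \frac{1}{Z_\theta(x)} \exp\bigl(-E_\theta(\hat{y}, y, x)\bigr),
\qquad
E_\theta(\hat{y}, y, x) \;=\; -\,\hat{y}^{\top} W\, y \;-\; \hat{y}^{\top} f_\theta(x),
$$

where $f_\theta(x)$ denotes the CNN's label scores and $W$ is a pairwise term encoding how clean labels tend to co-occur with, or be flipped into, noisy ones. Learning then amounts to maximizing the likelihood of the observed noisy labels while marginalizing over the latent clean ones.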
The cornerstone of the approach is a CRF formulation that infers the latent clean labels efficiently by drawing on auxiliary sources of information. The training objective regularizes this inference with auxiliary distributions, which markedly improves the model's resilience to label noise. Unlike previous models, which often ignore the relationships between different clean and noisy labels, this design captures those interactions through the energy function.
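One way to write such a regularized, semi-supervised objective (an illustrative form with assumed notation, not the paper's exact equation) is:

$$
\mathcal{L}(\theta) \;=\; -\!\!\sum_{(x, y) \in \mathcal{D}_{\text{noisy}}} \!\!\log p_\theta(y \mid x)
\;+\; \alpha \!\!\sum_{(x, y) \in \mathcal{D}_{\text{noisy}}} \!\!\mathrm{KL}\bigl(q_{\text{aux}}(\hat{y} \mid x, y) \,\big\|\, p_\theta(\hat{y} \mid x, y)\bigr)
\;-\; \beta \!\!\sum_{(x, \hat{y}, y) \in \mathcal{D}_{\text{clean}}} \!\!\log p_\theta(\hat{y}, y \mid x),
$$

where $q_{\text{aux}}$ is an auxiliary distribution built from extra sources of information (for example, a predictor trained on the small clean subset or known semantic relations between labels), and $\alpha$, $\beta$ weight the regularizer and the clean-data term. The KL term keeps the inferred clean-label posterior close to the auxiliary distribution, which is what stabilizes inference when most labels are noisy.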
Experimental Results
The proposed framework is evaluated on the CIFAR-10 and MS COCO datasets. The experiments show that the CRF model improves label accuracy even when trained on predominantly noisy labels. On MS COCO, it outperforms baseline methods in both the caption-label and Flickr-tag settings, achieving substantially higher mean average precision (mAP) than models trained directly with a cross-entropy loss on the noisy labels.
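For context, mean average precision in this multi-label setting is typically computed per category and then averaged, as in the minimal sketch below (illustrative only; this is not the authors' evaluation code, and the data here are random):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true, y_scores):
    """y_true: (N, C) binary ground-truth labels; y_scores: (N, C) predicted scores."""
    per_class_ap = [
        average_precision_score(y_true[:, c], y_scores[:, c])
        for c in range(y_true.shape[1])
        if y_true[:, c].any()  # skip categories with no positive examples
    ]
    return float(np.mean(per_class_ap))

# Example: random scores for 1000 images and 80 COCO-style categories
rng = np.random.default_rng(0)
y_true = (rng.random((1000, 80)) < 0.1).astype(int)
y_scores = rng.random((1000, 80))
print(mean_average_precision(y_true, y_scores))
```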
In the CIFAR-10 evaluations, the approach remained robust across varying levels of synthetic noise injected into the training labels. It matched or exceeded established methods such as the forward- and backward-correction losses, particularly in recovering clean labels from the noisy training set.
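For reference, the synthetic-noise setup and the forward-correction baseline mentioned above can be sketched as follows. This is a minimal illustration assuming a known noise-transition matrix `T`; it shows the baseline, not the proposed CRF model, and the function names are hypothetical:

```python
import numpy as np
import torch
import torch.nn.functional as F

def inject_symmetric_noise(labels, num_classes, noise_rate, rng):
    """Flip each label to a uniformly chosen different class with probability noise_rate."""
    labels = labels.copy()
    flip = rng.random(len(labels)) < noise_rate
    random_labels = rng.integers(0, num_classes, size=len(labels))
    # Make sure a flipped label actually changes class
    random_labels = np.where(random_labels == labels,
                             (random_labels + 1) % num_classes, random_labels)
    labels[flip] = random_labels[flip]
    return labels

def forward_corrected_loss(logits, noisy_targets, T):
    """Forward correction: pass the clean-label prediction through the noise
    transition matrix T (T[i, j] = P(noisy = j | clean = i)) before the NLL."""
    clean_probs = F.softmax(logits, dim=1)   # model's p(clean | x)
    noisy_probs = clean_probs @ T            # implied p(noisy | x)
    return F.nll_loss(torch.log(noisy_probs + 1e-12), noisy_targets)
```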
Implications and Future Directions
This research offers a meaningful contribution by presenting a framework that enhances the robustness of CNNs trained with noisy labels, which is particularly beneficial in domains where cleanly labeled datasets are scarce. The proposed approach’s reliance on auxiliary distributions for regularizing label inference demonstrates a promising avenue for exploiting domain knowledge or pre-existing semantic relationships to rectify noise in data labeling.
Looking forward, the integration of such a robust methodology into active learning systems or data preprocessing pipelines could further maximize the return on investment in existing noisy datasets. Future research might also explore expanding the auxiliary distribution component to involve more intricate relationships or dependencies, potentially incorporating unsupervised learning techniques. This could further diminish the reliance on clean data subsets, making algorithms even more applicable to large-scale industrial data scenarios where noise is prevalent.
Overall, the methodologies proposed in this paper represent a substantial step forward in the quest for robust neural networks capable of operating effectively with imperfect training data.