Deep learning with noisy labels: exploring techniques and remedies in medical image analysis (1912.02911v4)

Published 5 Dec 2019 in cs.CV, cs.LG, eess.IV, and stat.ML

Abstract: Supervised training of deep learning models requires large labeled datasets. There is a growing interest in obtaining such datasets for medical image analysis applications. However, the impact of label noise has not received sufficient attention. Recent studies have shown that label noise can significantly impact the performance of deep learning models in many machine learning and computer vision applications. This is especially concerning for medical applications, where datasets are typically small, labeling requires domain expertise and suffers from high inter- and intra-observer variability, and erroneous predictions may influence decisions that directly impact human health. In this paper, we first review the state-of-the-art in handling label noise in deep learning. Then, we review studies that have dealt with label noise in deep learning for medical image analysis. Our review shows that recent progress on handling label noise in deep learning has gone largely unnoticed by the medical image analysis community. To help achieve a better understanding of the extent of the problem and its potential remedies, we conducted experiments with three medical imaging datasets with different types of label noise, where we investigated several existing strategies and developed new methods to combat the negative effect of label noise. Based on the results of these experiments and our review of the literature, we have made recommendations on methods that can be used to alleviate the effects of different types of label noise on deep models trained for medical image analysis. We hope that this article helps the medical image analysis researchers and developers in choosing and devising new techniques that effectively handle label noise in deep learning.

Deep Learning with Noisy Labels in Medical Image Analysis

Understanding how deep learning (DL) models behave when trained on noisy labels is pivotal for advancing medical image analysis. DL models require not only large amounts of data but also high-quality labels, and in medical contexts labels are often noisy. This paper provides a thorough examination of techniques and remedies for mitigating the impact of label noise, a problem that is particularly critical because labeling depends on domain experts and suffers from high inter- and intra-observer variability.

Overview of Label Noise Challenges

Label noise significantly degrades DL model performance, complicating their deployment in medical applications where errors can affect healthcare decisions. The paper identifies several key challenges:

  • Small Dataset Sizes: Medical datasets are typically smaller and less varied than those in broader computer vision applications, exacerbating the effects of label noise.
  • Expert-Dependent Labeling: Reliance on domain experts for annotated data introduces variability, as expert opinions and judgments differ.
  • Necessity for Accurate Predictions: Erroneous predictions can have dire consequences given their application in health-related contexts.

State-of-the-Art Techniques and Medical Context

The authors systematically review existing strategies to address label noise, suggesting many have been overlooked in medical applications. They categorize these strategies into several classes, each offering potential solutions:

  • Label Cleaning and Pre-Processing: Identifying and correcting mislabeled data prior to model training.
  • Network Architecture Adjustments: Incorporating noise layers or other architectural modifications to better account for label noise.
  • Loss Function Modifications: Employing robust loss functions, such as the mean absolute error (MAE), to mitigate the influence of noisy labels (a minimal sketch follows this list).
  • Data Re-Weighting: Adjusting the significance of data samples thought to have noisy labels during training.
  • Utilizing Data and Label Consistency: Exploiting the coherence among data features to detect incorrectly labeled samples.
  • Training Procedures: Implementing strategies such as curriculum learning or knowledge distillation that train progressively on increasingly challenging samples.
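
To make the loss-function class concrete, the snippet below contrasts standard cross-entropy with a bounded mean-absolute-error loss computed on the softmax output, the kind of robust loss the review discusses. This is a minimal PyTorch sketch rather than the paper's implementation; the function name and the two-class usage example are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mae_loss(logits, targets, num_classes):
    """Mean absolute error between softmax probabilities and one-hot labels.

    Unlike cross-entropy, the per-sample loss is bounded, so a single
    mislabeled example cannot contribute an arbitrarily large gradient.
    """
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes=num_classes).float()
    return (probs - one_hot).abs().sum(dim=1).mean()

# Illustrative usage: a drop-in replacement for F.cross_entropy when label
# noise is suspected (logits: [batch, num_classes], labels: [batch]).
# loss = mae_loss(model(images), labels, num_classes=2)
```

The trade-off is that MAE's bounded gradients, while robust to mislabeled samples, can slow convergence on clean data, which is why intermediate losses between cross-entropy and MAE are also studied in the noisy-label literature.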

Experimental Insights and Recommendations

The authors conduct experiments using three medical image datasets, each representative of a different noise type:

  1. Brain Lesion Detection and Segmentation: Leveraging iterative label cleaning and re-weighting demonstrated improved detection and segmentation, suggesting these are effective for handling systematic annotation biases.
  2. Prostate Cancer Pathology Classification: Modeling annotator confusion and utilizing the minimum-loss label yielded significant accuracy improvements, emphasizing the value of understanding inter-observer variability (see the sketch after this list).
  3. Fetal Brain Segmentation: Dual CNNs with iterative label updates proved beneficial in settings with autogenerated noisy labels, highlighting the potential to refine machine-generated labels progressively.
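
For the second experiment, the confusion-modeling idea can be sketched as a learned noise-transition layer appended after the classifier, mapping the predicted "true label" distribution to the distribution of labels an annotator would actually assign. The PyTorch class below is a hedged illustration of that general idea; the class name and initialization are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseTransitionLayer(nn.Module):
    """Maps clean-label probabilities to noisy-label probabilities via a
    learned confusion matrix T, where T[i, j] ~ p(observed j | true i)."""

    def __init__(self, num_classes):
        super().__init__()
        # Start close to the identity so training initially behaves as if
        # the labels were clean; off-diagonal mass is learned from the data.
        self.logits = nn.Parameter(torch.eye(num_classes) * 5.0)

    def forward(self, clean_probs):
        T = F.softmax(self.logits, dim=1)   # row-stochastic confusion matrix
        return clean_probs @ T              # probabilities over noisy labels

# Training: minimize the negative log-likelihood of the observed (noisy)
# annotations under these probabilities. At test time only the base
# classifier's output is used.
```
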

These results support the development of novel training algorithms tailored to specific noise characteristics and suggest that theoretical insights should be continually integrated into practical applications.

Implications and Future Work

The research illuminates critical pathways for future DL application in medical imaging. By demonstrating specific, context-driven strategies for handling label noise, it establishes a framework for further exploration and adaptation across diverse medical tasks. Future work could aim to:

  • Develop increasingly robust methods tailored to various medical imaging domains.
  • Investigate the balance between dataset size and label accuracy in training effectiveness.
  • Examine the practical implementation and integration of these strategies within clinical workflows.

By addressing these challenges, the findings contribute to more reliable and deployable DL models in medical imaging, enhancing decision-making processes and ultimately improving patient outcomes.

Authors (4)
  1. Davood Karimi
  2. Haoran Dou
  3. Simon K. Warfield
  4. Ali Gholipour
Citations (489)