- The paper introduces a comprehensive chest radiograph dataset with 224,316 images and uncertainty labels for 14 observations.
- It evaluates five approaches to handling uncertain labels during training and benchmarks the resulting models against board-certified radiologists using AUC-ROC on key pathologies.
- Empirical results demonstrate that AI models using uncertainty annotations can outperform experts on conditions like Cardiomegaly, Edema, and Pleural Effusion.
An Expert Overview of "CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison"
The paper "CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison" introduces a comprehensive and large-scale annotated dataset designed explicitly for chest radiograph interpretation, addressing the critical need for robust, labeled biomedical datasets in machine learning research. Authored by a team from Stanford University, the paper presents not only the dataset but also explores various methodologies for incorporating uncertainty in label annotations into deep learning models.
Dataset Composition and Labeling
The CheXpert dataset consists of 224,316 chest radiographs from 65,240 patients, each annotated for the presence of 14 observations. The observations range from the catch-all "No Finding" to specific pathologies such as "Cardiomegaly," "Edema," and "Pleural Effusion." The authors describe an automated rule-based labeler, developed to extract relevant observations from free-text radiology reports and classify each observation as positive, negative, or uncertain. They also report label prevalences, showing how positive, negative, and uncertain mentions are distributed across the dataset.
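To make the labeling scheme concrete, the following is a minimal sketch of how such a label file could be read and summarized, assuming the commonly described CSV encoding (1.0 = positive, 0.0 = negative, -1.0 = uncertain, blank = not mentioned); the column subset and file path are illustrative, not prescribed by the paper.

```python
# Sketch: summarize positive / negative / uncertain / unmentioned counts
# for a CheXpert-style label CSV. Encoding assumption: 1.0, 0.0, -1.0, blank.
import pandas as pd

OBSERVATIONS = [
    "No Finding", "Cardiomegaly", "Edema",
    "Consolidation", "Atelectasis", "Pleural Effusion",  # subset of the 14 labels
]

def summarize_labels(csv_path: str) -> pd.DataFrame:
    """Read the label CSV and print per-observation label prevalences."""
    df = pd.read_csv(csv_path)
    for obs in OBSERVATIONS:
        counts = df[obs].value_counts(dropna=False)
        print(
            f"{obs:>16}: positive={counts.get(1.0, 0)}, "
            f"negative={counts.get(0.0, 0)}, uncertain={counts.get(-1.0, 0)}, "
            f"unmentioned={int(df[obs].isna().sum())}"
        )
    return df

# df = summarize_labels("CheXpert-v1.0-small/train.csv")  # hypothetical local path
```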
Uncertainty Handling Approaches
A significant contribution of the paper is the investigation of different ways to use uncertainty labels when training convolutional neural networks (CNNs) for radiograph interpretation. The authors compare five techniques: ignoring the uncertainty label (U-Ignore), mapping uncertainties to negatives (U-Zeros) or positives (U-Ones), a self-training method (U-SelfTrained), and treating uncertainty as a separate class (U-MultiClass). The approaches are evaluated on a validation set labeled by the consensus of three board-certified radiologists, with a particular focus on five key pathologies.
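The sketch below illustrates how three of these policies could be implemented for labels encoded as positive = 1, negative = 0, uncertain = -1. It is not the authors' code; the function names and masking scheme are assumptions made for illustration.

```python
# Sketch: U-Zeros / U-Ones remap uncertain labels, U-Ignore masks them in the loss.
import torch
import torch.nn.functional as F

def apply_policy(labels: torch.Tensor, policy: str) -> torch.Tensor:
    """Map uncertain (-1) entries according to the chosen policy."""
    if policy == "U-Zeros":          # treat uncertain as negative
        return torch.where(labels == -1, torch.zeros_like(labels), labels)
    if policy == "U-Ones":           # treat uncertain as positive
        return torch.where(labels == -1, torch.ones_like(labels), labels)
    return labels                    # U-Ignore: handled by the loss mask below

def masked_bce(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy that skips uncertain entries (U-Ignore)."""
    mask = (labels != -1).float()
    # Replace -1 with a dummy 0 so BCE is well defined, then zero out its loss.
    safe_labels = torch.where(labels == -1, torch.zeros_like(labels), labels)
    loss = F.binary_cross_entropy_with_logits(logits, safe_labels, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)

# Example: a batch of 2 studies x 3 observations with one uncertain label.
logits = torch.randn(2, 3)
labels = torch.tensor([[1.0, -1.0, 0.0], [0.0, 1.0, 1.0]])
print(masked_bce(logits, labels))                         # U-Ignore
print(F.binary_cross_entropy_with_logits(                 # U-Ones
    logits, apply_policy(labels, "U-Ones")))
```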
Validation and Empirical Findings
The paper reports Area Under the Receiver Operating Characteristic curve (AUC-ROC) scores for each uncertainty approach on the validation set. The U-Ones approach performed best on Atelectasis, suggesting that uncertain mentions of this finding often correspond to true positives, whereas U-MultiClass performed best on Cardiomegaly, likely because it preserves the distinction of borderline cases.
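For readers unfamiliar with the metric, the per-pathology evaluation can be reproduced in a few lines with scikit-learn; the arrays below are random placeholders, not the paper's predictions or labels.

```python
# Sketch: per-pathology AUC-ROC against binarized consensus labels.
import numpy as np
from sklearn.metrics import roc_auc_score

pathologies = ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Pleural Effusion"]

# y_true: binarized radiologist-consensus labels, y_prob: model probabilities,
# both of shape (num_studies, num_pathologies). Placeholder data only.
y_true = np.random.randint(0, 2, size=(200, 5))
y_prob = np.random.rand(200, 5)

for i, name in enumerate(pathologies):
    auc = roc_auc_score(y_true[:, i], y_prob[:, i])
    print(f"{name:>16}: AUC-ROC = {auc:.3f}")
```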
When evaluated on a test set whose ground truth was set by the consensus of five board-certified radiologists, the model outperforms three individual radiologists on critical pathologies such as Cardiomegaly, Edema, and Pleural Effusion. These findings underline the potential of AI models that incorporate nuanced uncertainty handling, especially in challenging diagnostic scenarios.
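One way to operationalize such a comparison, sketched below under assumed data, is to reduce each radiologist's binary reads to a single operating point (false positive rate, true positive rate) and check it against the model's interpolated ROC curve; the arrays are placeholders, not the paper's data.

```python
# Sketch: compare a radiologist operating point against the model ROC curve.
import numpy as np
from sklearn.metrics import roc_curve, confusion_matrix

def operating_point(y_true: np.ndarray, y_rad: np.ndarray) -> tuple[float, float]:
    """Return (FPR, TPR) for a radiologist's binary predictions."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_rad, labels=[0, 1]).ravel()
    return fp / (fp + tn), tp / (tp + fn)

y_true = np.random.randint(0, 2, size=500)   # consensus ground truth (placeholder)
y_prob = np.random.rand(500)                 # model probabilities (placeholder)
y_rad = np.random.randint(0, 2, size=500)    # one radiologist's reads (placeholder)

fpr, tpr, _ = roc_curve(y_true, y_prob)
rad_fpr, rad_tpr = operating_point(y_true, y_rad)

# The model matches or beats this radiologist at their operating point if its
# TPR at the same FPR is at least as high.
model_tpr_at_rad_fpr = np.interp(rad_fpr, fpr, tpr)
print(model_tpr_at_rad_fpr >= rad_tpr)
```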
Comparative Benchmarking
The CheXpert dataset advances beyond existing chest radiograph datasets such as the NIH's ChestX-ray14 by providing radiologist-annotated validation and test sets. This direct comparison to radiologist performance is a significant improvement over datasets whose labels are extracted automatically without human validation, potentially increasing the reliability of AI models trained on CheXpert.
Practical and Theoretical Implications
Practically, the CheXpert dataset and model contributions can augment clinical decision support systems, aiding workflow prioritization and large-scale screening, particularly in resource-limited settings. Theoretically, the exploration of uncertainty handling in label annotations opens promising avenues for future research on improving the reliability and transparency of AI models in clinical tasks.
Future Directions
Future work could enhance model architectures or integrate multi-modal data, such as combining radiographs with patient history or genomic information for more precise diagnostics. Additionally, exploring further semi-supervised learning techniques and refining interpretability mechanisms such as Grad-CAM (sketched below) can provide deeper insights into AI decision-making, boosting clinician trust in these systems.
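As a pointer for the interpretability direction, here is a hedged Grad-CAM sketch for a torchvision DenseNet-121, a backbone commonly used on chest radiograph tasks. It is illustrative rather than the paper's code; the choice of target layer, the untrained weights, and the single-image preprocessing are assumptions.

```python
# Sketch: Grad-CAM heatmap for one class of one image with DenseNet-121.
import torch
import torch.nn.functional as F
from torchvision.models import densenet121

model = densenet121(weights=None)  # in practice, load chest-X-ray-trained weights
model.eval()

activations, gradients = {}, {}

def fwd_hook(module, inputs, output):
    activations["feat"] = output

def bwd_hook(module, grad_input, grad_output):
    gradients["feat"] = grad_output[0]

# Hook the last dense block's output feature map (assumed target layer).
layer = model.features.denseblock4
layer.register_forward_hook(fwd_hook)
layer.register_full_backward_hook(bwd_hook)

def grad_cam(image: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return a heatmap (H x W, in [0, 1]) for one class of one image."""
    logits = model(image)                      # image: (1, 3, 224, 224)
    model.zero_grad()
    logits[0, class_idx].backward()
    feat = activations["feat"]                 # (1, C, h, w)
    grad = gradients["feat"]                   # (1, C, h, w)
    weights = grad.mean(dim=(2, 3), keepdim=True)    # channel-wise pooled gradients
    cam = F.relu((weights * feat).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam - cam.min()
    return (cam / cam.max().clamp(min=1e-8)).squeeze()

heatmap = grad_cam(torch.randn(1, 3, 224, 224), class_idx=0)
print(heatmap.shape)
```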
Conclusion
The "CheXpert" paper significantly contributes to the field of medical imaging and AI by providing a well-annotated, large-scale dataset along with a thorough exploration of uncertainty handling in model training. The results not only showcase the potential of AI systems to match expert radiologist performance but also set a new standard in dataset benchmarks for chest radiography interpretation.