ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases (1705.02315v5)

Published 5 May 2017 in cs.CV and cs.CL

Abstract: The chest X-ray is one of the most commonly accessible radiological examinations for screening and diagnosis of many lung diseases. A tremendous number of X-ray imaging studies accompanied by radiological reports are accumulated and stored in many modern hospitals' Picture Archiving and Communication Systems (PACS). On the other side, it is still an open question how this type of hospital-size knowledge database containing invaluable imaging informatics (i.e., loosely labeled) can be used to facilitate the data-hungry deep learning paradigms in building truly large-scale high precision computer-aided diagnosis (CAD) systems. In this paper, we present a new chest X-ray database, namely "ChestX-ray8", which comprises 108,948 frontal-view X-ray images of 32,717 unique patients with the text-mined eight disease image labels (where each image can have multi-labels), from the associated radiological reports using natural language processing. Importantly, we demonstrate that these commonly occurring thoracic diseases can be detected and even spatially-located via a unified weakly-supervised multi-label image classification and disease localization framework, which is validated using our proposed dataset. Although the initial quantitative results are promising as reported, deep convolutional neural network based "reading chest X-rays" (i.e., recognizing and locating the common disease patterns trained with only image-level labels) remains a strenuous task for fully-automated high precision CAD systems. Data download link: https://nihcc.app.box.com/v/ChestXray-NIHCC

The paper "ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases" contributes a large public dataset and a baseline method to medical imaging and computer-aided diagnosis (CAD). ChestX-ray8 comprises 108,948 frontal-view X-ray images of 32,717 unique patients. Each image is labeled with up to eight common thoracic diseases (an image can carry multiple labels), and the labels were text-mined from the associated radiological reports using natural language processing (NLP).

Objectives and Contributions

The paper addresses the gap between the large-scale annotated image data that deep learning models require and the data readily available for training high-precision CAD systems. Its primary contributions are:

Dataset Construction: ChestX-ray8 collects 108,948 frontal-view chest X-ray images from 32,717 patients, labeled with 8 common thoracic diseases.
Weakly-Supervised Framework: The authors propose and validate a weakly-supervised learning framework for both multi-label image classification and disease localization.
NLP Techniques for Labeling: A robust pipeline employing NLP to mine disease labels from accompanying radiological reports.
Benchmarking: Extensive benchmarking of deep learning models against this dataset, showcasing the challenges and performance capabilities of the proposed framework.

Methodology

Dataset Construction

ChestX-ray8 was constructed using an NLP pipeline that identifies relevant radiological reports and their associated images. The reports were processed to extract disease mentions, and negated or uncertain mentions were handled explicitly so that they would not produce positive labels. Because the labels are mined from report text rather than drawn on the images, they are image-level and loosely labeled, which is why the paper treats the learning problem as weakly supervised.
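The idea of mining labels while skipping negated and uncertain mentions can be illustrated with a deliberately simplified sketch. It is not the paper's pipeline: the keyword lists, cue lists, and the function name `mine_labels` are hypothetical, and plain keyword matching with a "cue appears earlier in the same sentence" rule stands in for the more careful concept extraction and negation handling the authors describe.

```python
import re

# Hypothetical keyword and cue lists, chosen only for illustration.
DISEASE_KEYWORDS = {
    "Atelectasis": ["atelectasis"],
    "Cardiomegaly": ["cardiomegaly", "enlarged heart"],
    "Effusion": ["effusion"],
    "Pneumothorax": ["pneumothorax"],
}
NEGATION_CUES = ["no", "without", "negative for", "free of", "resolved"]
UNCERTAINTY_CUES = ["possible", "may represent", "cannot exclude", "suspicious for"]


def _has_cue(text, cues):
    """True if any cue occurs in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(c) + r"\b", text) for c in cues)


def mine_labels(report):
    """Return sorted disease labels mentioned affirmatively in a report."""
    labels = set()
    # Crude sentence split; cues only affect mentions in the same sentence.
    for sentence in re.split(r"[.;\n]+", report.lower()):
        for disease, keywords in DISEASE_KEYWORDS.items():
            for kw in keywords:
                match = re.search(r"\b" + re.escape(kw) + r"\b", sentence)
                if not match:
                    continue
                # Assumption: a cue anywhere before the mention negates or
                # hedges it, so the mention yields no positive label.
                prefix = sentence[:match.start()]
                if _has_cue(prefix, NEGATION_CUES) or _has_cue(prefix, UNCERTAINTY_CUES):
                    continue
                labels.add(disease)
    return sorted(labels)


print(mine_labels("No pneumothorax. Small left pleural effusion."))
```

The prefix rule is the weak point of this sketch: in a sentence such as "No effusion but cardiomegaly", the cue would wrongly suppress the second finding. Errors of that kind are what motivate more careful, syntax-aware negation handling.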

Weakly-Supervised Learning Framework

The core of the proposed methodology is a weakly-supervised learning framework designed to handle multi-label classification and disease localization:

Multi-Label Classification: The framework utilizes deep convolutional neural networks (DCNNs) such as AlexNet, GoogLeNet, VGGNet-16, and ResNet-50, repurposed for the multi-label classification tasks.
Global Pooling and Prediction Layers: These layers, added on top of the network, produce the per-disease scores and make it possible to generate heatmaps that indicate where in the image a disease is likely located.
Loss Functions: Various loss functions, including Cross Entropy Loss and weighted variants, were explored to counter the imbalance between positive and negative samples.
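To make the positive/negative balancing concrete, the sketch below implements one common form of weighted cross entropy for multi-label outputs, using inverse-frequency weights computed over a batch. The exact weighting used in the paper may differ; the function name `weighted_cross_entropy` and the `eps` clipping are assumptions added for illustration.

```python
import math


def weighted_cross_entropy(probs, labels, eps=1e-7):
    """Balanced cross entropy over a batch of multi-label predictions.

    probs and labels are lists of per-image lists with one entry per
    disease class; probs holds sigmoid outputs in [0, 1], labels 0 or 1.
    """
    flat = [(p, y) for ps, ys in zip(probs, labels) for p, y in zip(ps, ys)]
    total = len(flat)
    n_pos = sum(1 for _, y in flat if y == 1)
    n_neg = total - n_pos
    # Inverse-frequency weights: the rarer side gets the larger weight.
    beta_pos = total / n_pos if n_pos else 0.0
    beta_neg = total / n_neg if n_neg else 0.0
    loss = 0.0
    for p, y in flat:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            loss += beta_pos * -math.log(p)
        else:
            loss += beta_neg * -math.log(1.0 - p)
    return loss


# Two images, two classes, a single positive label in the whole batch.
batch_probs = [[0.5, 0.5], [0.5, 0.5]]
batch_labels = [[1, 0], [0, 0]]
print(weighted_cross_entropy(batch_probs, batch_labels))
```

With one positive among four labels, the positive term is weighted 4 and each negative term 4/3, so the lone positive contributes as much to the loss as the three negatives combined.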

Numerical Results

The paper reports the area under the receiver operating characteristic curve (AUC) per disease for several model architectures. ResNet-50 achieved the highest AUC in multiple disease categories, for instance "Cardiomegaly" (AUC = 0.8141) and "Pneumothorax" (AUC = 0.7891). Balancing the positive and negative terms of the loss function also mattered, improving performance especially for under-represented categories.
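For readers unfamiliar with the metric, here is a minimal sketch of how a per-class ROC AUC can be computed for multi-label predictions. It uses the pairwise-ranking definition of AUC (the probability that a random positive scores above a random negative, with ties counted as one half). The function names and the toy scores are illustrative and not taken from the paper.

```python
def roc_auc(scores, labels):
    """ROC AUC for one class via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def per_class_auc(score_rows, label_rows, class_names):
    """Compute one AUC per disease column of a multi-label prediction matrix."""
    return {
        name: roc_auc([r[k] for r in score_rows], [r[k] for r in label_rows])
        for k, name in enumerate(class_names)
    }


# Toy example: four images, one class.
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The quadratic pairwise loop is fine for a sketch; on a dataset of this size a sort-based implementation would be the practical choice.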

Localization Performance

The localization framework uses heatmaps to identify regions of interest within X-ray images, from which bounding boxes are generated. Accuracy was measured with the Intersection over the detected Bounding Box area ratio (IoBB), the fraction of a detected box that overlaps the ground-truth box. The results show promise for weakly-supervised localization, although more sophisticated bounding box generation methods could further refine them.
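The sketch below shows how a box can be derived from a heatmap and scored. It makes simplifying assumptions: a single box is drawn around all pixels at or above a threshold (rather than one box per connected region), boxes are half-open tuples `(x0, y0, x1, y1)`, and the function names and toy heatmap are hypothetical. IoU is included alongside IoBB for comparison.

```python
def heatmap_to_box(heatmap, threshold):
    """Box (x0, y0, x1, y1), x1/y1 exclusive, around pixels >= threshold."""
    coords = [(x, y) for y, row in enumerate(heatmap)
              for x, v in enumerate(row) if v >= threshold]
    if not coords:
        return None  # nothing in the heatmap reaches the threshold
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def area(box):
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def intersection(a, b):
    """Area of overlap between two boxes (0 if they do not overlap)."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def iou(detected, truth):
    """Intersection over Union of the two boxes."""
    inter = intersection(detected, truth)
    union = area(detected) + area(truth) - inter
    return inter / union if union else 0.0


def iobb(detected, truth):
    """Intersection over the detected bounding box area."""
    det_area = area(detected)
    return intersection(detected, truth) / det_area if det_area else 0.0


toy_heatmap = [
    [0, 0, 0, 0],
    [0, 5, 9, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
detected = heatmap_to_box(toy_heatmap, threshold=5)
ground_truth = (2, 1, 4, 3)  # hypothetical annotated box
print(detected, iobb(detected, ground_truth), iou(detected, ground_truth))
```

Because IoBB divides only by the detected box's area, a small detection that lies mostly inside a large ground-truth box scores well, whereas IoU penalizes it for the size mismatch.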

Implications and Future Directions

Practically, the dataset and methods proposed pave the way for more robust, automated CAD tools that can alleviate radiologists' workloads by providing actionable insights from X-ray images. Theoretically, it showcases the potential and current limitations of employing deep learning for medical image analysis, emphasizing the importance of high-quality, large-scale datasets.

The research opens multiple avenues for future work:

Expansion of Disease Classes: Extending ChestX-ray8 to include more disease labels could offer a broader diagnostic tool.
Integration with Patient Histories: Combining imaging data with longitudinal patient history could enhance diagnostic accuracy.
Improvement in Localization Techniques: Advanced methods such as selective search or region proposal networks could improve localization performance.

Overall, the ChestX-ray8 dataset and the associated framework mark a significant step toward high-precision CAD systems. The paper itself cautions that "reading chest X-rays" with models trained only on image-level labels remains a strenuous task for fully automated systems.

Authors (6)

Xiaosong Wang (42 papers)
Yifan Peng (147 papers)
Le Lu (148 papers)
Zhiyong Lu (113 papers)
Mohammadhadi Bagheri (9 papers)
Ronald M. Summers (111 papers)

Citations (2,263)
