Evaluation of Deep Learning Approaches for Multi-Label Chest X-Ray Classification
This paper presents a systematic evaluation of deep learning strategies for multi-label classification of chest X-ray images on the large-scale ChestX-ray14 dataset. The overarching aim is to assess how network architecture, weight initialization, and the integration of auxiliary features affect classification performance. The work centers on the ResNet-50 network, compared across transfer learning with off-the-shelf weights, fine-tuning, and training from scratch.
A focal point of the research is the ResNet-50 architecture and an extended variant, ResNet-50-large, whose enlarged input size accommodates the higher spatial resolution of the ChestX-ray14 images. The study also explores integrating non-image data, such as patient age, gender, and image acquisition type, into the network architecture. This mirrors the diagnostic process in clinical environments, where radiologists interpret images in the context of additional patient information.
Analysis and Results
The evaluation employs a 5-fold resampling scheme coupled with a multi-label loss function to ensure a comprehensive analysis of classification performance. Three primary areas are examined: weight initialization and transfer learning, the impact of network architecture, and the incorporation of non-image features.
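The multi-label setup treats each pathology as an independent binary decision, so the training loss reduces to a per-label binary cross-entropy averaged over labels and samples. A minimal numpy sketch of such a loss (the function name and toy values are illustrative, not the paper's code):

```python
import numpy as np

def multilabel_bce(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over all samples and labels.

    logits  : (N, L) raw network outputs, one column per pathology
    targets : (N, L) binary ground-truth labels
    """
    probs = 1.0 / (1.0 + np.exp(-logits))      # element-wise sigmoid
    probs = np.clip(probs, eps, 1.0 - eps)     # numerical safety
    loss = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
    return loss.mean()

# Toy example: 2 images, 3 labels
logits = np.array([[2.0, -1.0, 0.0], [0.5, 3.0, -2.0]])
targets = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(multilabel_bce(logits, targets))  # ≈ 0.2972
```

Averaging over all labels, rather than picking a single class, is what lets one network report all 14 findings simultaneously.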
- Weight Initialization and Transfer Learning: Transfer-learning models with fine-tuning significantly outperform models trained from scratch or used with off-the-shelf parameters. The ResNet-50 fine-tuned on the ChestX-ray14 dataset delivered an average AUC of 0.819, a marked improvement over the baseline performance of off-the-shelf networks (AUC 0.730).
- Network Architecture Variations: The extended ResNet-50-large achieved a marginal improvement in average AUC over its standard counterpart, indicating that higher input resolution helps distinguish small or intricate pathological features such as masses and nodules.
- Non-Image Features Integration: Incorporating non-image data yielded a slight increase in average AUC, with the ResNet-50-large-meta variant attaining the highest average AUC of 0.822. This suggests that while non-image features offer some benefit, the image features extracted by the network already encode much of the same information, as corroborated by the ability of image-only networks to predict these attributes with notable accuracy.
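The metadata integration described above amounts to a late fusion: patient attributes are encoded as a small vector and concatenated with the pooled CNN features before the final classification layer. A conceptual numpy sketch of such a fusion head (the shapes, encoding, and function name are illustrative assumptions, not the paper's exact design):

```python
import numpy as np

def fusion_head(img_features, age, is_male, view_onehot, W, b):
    """Concatenate pooled CNN features with encoded metadata,
    then apply one linear layer producing per-label logits."""
    meta = np.concatenate([[age / 100.0],        # scaled age
                           [float(is_male)],     # gender flag
                           view_onehot])         # acquisition type
    fused = np.concatenate([img_features, meta])  # (2048 + 4,)
    return W @ fused + b                          # (num_labels,)

rng = np.random.default_rng(0)
num_labels = 14
img_features = rng.standard_normal(2048)          # e.g. ResNet-50 global pool
W = rng.standard_normal((num_labels, 2048 + 4)) * 0.01
b = np.zeros(num_labels)

logits = fusion_head(img_features, age=63, is_male=True,
                     view_onehot=np.array([1.0, 0.0]), W=W, b=b)
print(logits.shape)  # (14,)
```

Because the metadata vector is tiny relative to the 2048-dimensional image features, it is plausible that its contribution is small unless the image features fail to capture the same information.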
A rank correlation analysis of model outputs shows high prediction consistency among models trained solely on X-ray data, opening avenues for further research into model robustness and consistency.
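Rank correlation between two models' per-image scores can be measured with Spearman's rho, which compares the orderings the models induce rather than the raw probabilities. A small sketch (the score vectors below are hypothetical, not the paper's outputs):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical probability outputs of two models on the same 6 test images
model_a = np.array([0.10, 0.80, 0.35, 0.90, 0.20, 0.55])
model_b = np.array([0.15, 0.70, 0.40, 0.95, 0.05, 0.60])

rho, pval = spearmanr(model_a, model_b)
print(rho)  # ≈ 0.943: the two models rank images very similarly
```

A rho close to 1 across model pairs is what the paper's consistency observation corresponds to: different training runs largely agree on which images look most pathological.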
Implications and Future Directions
The findings underscore the potential of deep learning to automate the interpretation of large-scale medical image datasets, addressing the scarcity of expert radiological review amid growing patient volumes. However, reliance on datasets with label noise, such as ChestX-ray14, exposes lingering challenges in reliably identifying clinically relevant pathology: the Grad-CAM analysis, for example, revealed misidentification of pneumothorax in already-treated cases.
Practically, integration into clinical workflows will require further refinement of network architectures and evaluation methodologies. Future work could develop architectures that exploit dependencies among inter-related labels or incorporate segmentation techniques to enhance spatial feature extraction. As large-scale annotated medical datasets become more widely available, continued advances in model adaptability and interpretability will help carry these findings from the research domain into clinical application.