- The paper demonstrates that shallower networks, such as VGG-16, can achieve competitive performance in chest radiograph classification while reducing computational demands.
- The study used the CheXpert dataset and evaluated fifteen deep learning models with AUROC and AUPRC metrics to robustly compare performance.
- The findings suggest that incorporating a 'human in the loop' for label refinement can enhance diagnostic accuracy in medical imaging.
Comparative Analysis of Deep Learning Architectures for Chest Radiograph Classification
Introduction
The study "Comparing Different Deep Learning Architectures for Classification of Chest Radiographs" (arXiv:2002.08991) evaluates a range of deep learning models, specifically convolutional neural networks (CNNs), for processing and categorizing grayscale chest radiographs. Unlike datasets such as ImageNet, which contain color images spanning a large number of classes, chest radiographs are grayscale and involve far fewer classes. The research posits that shallower models may therefore classify these images with reduced computational demand and accuracy comparable to deeper networks.
Methods
Data Preparation
The research employed the CheXpert dataset, which comprises 224,316 chest radiographs annotated for 14 findings. Training used a subset of the images and focused on five key labels: cardiomegaly, edema, consolidation, atelectasis, and pleural effusion. Images marked with uncertainty labels were excluded to streamline the training process (see the sketch below).
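As a rough illustration of this filtering step, the sketch below assumes the standard CheXpert train.csv layout, in which each finding is coded as 1.0 (positive), 0.0 (negative), -1.0 (uncertain), or left blank (unmentioned). The file path and the choice to treat unmentioned findings as negative are assumptions for illustration, not details taken from the paper.

```python
import pandas as pd

# The five target findings used in the study.
LABELS = ["Cardiomegaly", "Edema", "Consolidation",
          "Atelectasis", "Pleural Effusion"]

# Load the CheXpert training CSV (path is illustrative).
df = pd.read_csv("CheXpert-v1.0-small/train.csv")

# Drop every image that carries an uncertainty label (-1.0)
# for any of the five target findings.
uncertain = (df[LABELS] == -1.0).any(axis=1)
df = df[~uncertain].copy()

# Treat unmentioned (blank) findings as negative for multi-label
# training (an assumption; the paper does not spell out this mapping).
df[LABELS] = df[LABELS].fillna(0.0)
```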
Model Training
Fifteen neural network architectures from the ResNet, DenseNet, VGG, SqueezeNet, and AlexNet families were trained using the PyTorch and FastAI libraries on a workstation equipped with dual Nvidia GeForce RTX 2080 Ti GPUs. Models were trained with batch sizes of 16 and 32, with learning rates progressively adjusted over multiple epochs (a minimal version of this setup is sketched below).
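The following is a minimal PyTorch sketch of the multi-label training setup, shown for ResNet-34 as one of the fifteen architectures. The optimizer, learning-rate schedule, epoch count, and the train_loader (a DataLoader built from the filtered CheXpert data) are illustrative assumptions; the paper itself relied on FastAI's training loop rather than this hand-written one.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_LABELS = 5  # the five target findings

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# One of the fifteen architectures compared in the study, with the
# ImageNet classification head swapped for a five-way multi-label head.
model = models.resnet34(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_LABELS)
model = model.to(device)

# Multi-label classification: one sigmoid/BCE term per finding.
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Progressive learning-rate decay (schedule is illustrative).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

for epoch in range(5):  # epoch count is illustrative
    for images, targets in train_loader:  # DataLoader with batch size 16 or 32
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), targets.float())
        loss.backward()
        optimizer.step()
    scheduler.step()
```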
Evaluation Metrics
Performance was evaluated with AUROC and AUPRC, two metrics that are independent of any particular classification threshold and therefore allow a robust comparison across model architectures (both can be computed per finding, as sketched below).
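Both metrics can be computed per finding with scikit-learn, as in the sketch below; the helper name and array layout are assumptions, and average_precision_score serves as the standard estimator of AUPRC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def per_label_scores(y_true: np.ndarray, y_prob: np.ndarray, labels):
    """Compute AUROC and AUPRC for each finding independently.

    y_true: (n_samples, n_labels) binary ground truth
    y_prob: (n_samples, n_labels) predicted probabilities
    """
    return {
        name: {
            "AUROC": roc_auc_score(y_true[:, i], y_prob[:, i]),
            "AUPRC": average_precision_score(y_true[:, i], y_prob[:, i]),
        }
        for i, name in enumerate(labels)
    }
```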
Results
The investigation showed that deeper networks generally achieved superior AUROC values, with ResNet-152 and DenseNet-161 performing best. However, shallower networks such as VGG-16 and AlexNet produced competitive AUPRC values, countering the assumption that deeper models universally outperform their shallower counterparts.
Discussion
This analysis underscores the capability of shallower networks, such as VGG-16 and ResNet-34, to classify chest radiographs effectively, challenging the assumption that computationally intensive deep models are necessary. Smaller networks offer practical advantages, including shorter training times and lower hardware requirements, and they free up memory for the higher image resolutions needed to detect fine-grained features in radiographs. In addition, a 'human in the loop' approach to label refinement could improve training data quality and model performance, which matters particularly when labels are generated by NLP tools with inherent inaccuracies.
These findings align with Raghu et al., who likewise reported on the CheXpert dataset that smaller networks can match the performance of deep networks such as DenseNet-121, the architecture typically favored in prior studies.
Conclusion
This research demonstrates that smaller CNNs can rival or surpass deeper architectures in chest radiograph classification, offering a path to model deployment that demands fewer resources without sacrificing accuracy. This shift could broaden applications in medical imaging, making diagnostic deep learning tools more accessible.