
Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening (1903.08297v1)

Published 20 Mar 2019 in cs.LG, cs.CV, and stat.ML

Abstract: We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use a very high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and find our model to be as accurate as experienced radiologists when presented with the same data. Finally, we show that a hybrid model, averaging probability of malignancy predicted by a radiologist with a prediction of our neural network, is more accurate than either of the two separately. To better understand our results, we conduct a thorough analysis of our network's performance on different subpopulations of the screening population, model design, training procedure, errors, and properties of its internal representations.

Citations (459)

Summary

  • The paper shows that image-and-heatmaps models achieve an AUC of 0.895 for malignant classification, outperforming image-only models.
  • It employs ensemble techniques and a ResNet-based architecture to reduce variance and enhance diagnostic precision.
  • The study highlights that integrating AI with human radiologist insights improves overall breast cancer screening performance.

Analysis of the Paper: Deep Neural Networks Enhance Radiologists' Efficacy in Breast Cancer Screening

The paper by Wu et al. focuses on the integration of deep neural networks into breast cancer screening, explicitly aiming to enhance radiologists' diagnostic performance. The research demonstrates that utilizing image-and-heatmaps models significantly improves classification accuracy over image-only models, providing insights into the application of deep learning in medical imaging.

Model Comparison and Performance

The paper evaluates multiple model variants, including image-only, heatmaps-only, and combined models. The results indicate that the image-and-heatmaps model performs optimally, achieving an AUC of 0.895 for malignant classification and 0.756 for benign classification on the screening population. This performance surpasses other variants, underscoring the efficacy of combining local and global visual information.

Ensembling boosts performance further: a 5x ensemble of the image-and-heatmaps model yields the best results, reducing the variance introduced by individual training runs and providing stronger predictions across diverse breast imaging contexts.
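The averaging step behind such an ensemble is simple: each independently trained model emits a malignancy probability per exam, and the ensemble prediction is the mean. A minimal sketch (the probability values below are hypothetical, not from the paper):

```python
import numpy as np

def ensemble_predictions(prob_matrix: np.ndarray) -> np.ndarray:
    """Average per-model malignancy probabilities.

    prob_matrix has shape (n_models, n_exams); the ensemble
    prediction for each exam is the mean over models.
    """
    return prob_matrix.mean(axis=0)

# Hypothetical outputs of 5 independently trained models on 4 exams.
probs = np.array([
    [0.10, 0.80, 0.55, 0.30],
    [0.12, 0.75, 0.60, 0.28],
    [0.08, 0.82, 0.50, 0.35],
    [0.11, 0.78, 0.58, 0.32],
    [0.09, 0.85, 0.52, 0.30],
])
ensembled = ensemble_predictions(probs)  # one probability per exam
```

Averaging probabilities (rather than, say, majority-voting hard labels) preserves each model's calibration information and is why the variance reduction shows up directly in AUC.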

Heatmaps and Auxiliary Networks

The paper introduces heatmaps generated through a patch-level auxiliary network. These heatmaps facilitate the model's focus on critical regions, enhancing the classification of malignant findings. The paper highlights that these heatmaps contribute primarily to distinguishing between benign and malignant findings, offering a nuanced understanding of the network's decision-making process.
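Conceptually, the heatmap is produced by sliding the patch-level classifier over the full-resolution mammogram and recording its score at each location. The sketch below is a simplified illustration of that sliding-window idea, not the paper's exact pipeline; `score_fn` stands in for the patch-level network, and the placement of scores is deliberately minimal:

```python
import numpy as np

def patch_heatmap(image, patch_size, stride, score_fn):
    """Slide a patch scorer over a 2D image and collect one score
    per window position, producing a coarse heatmap.

    score_fn is a stand-in for the patch-level auxiliary network;
    here any callable mapping a 2D patch to a scalar will do.
    """
    h, w = image.shape
    out_h = (h - patch_size) // stride + 1
    out_w = (w - patch_size) // stride + 1
    heat = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + patch_size,
                          j * stride:j * stride + patch_size]
            heat[i, j] = score_fn(patch)
    return heat

# Toy usage: mean intensity as a placeholder "malignancy" score.
img = np.arange(64, dtype=float).reshape(8, 8)
heat = patch_heatmap(img, patch_size=4, stride=4, score_fn=np.mean)
```

In the paper, the resulting benign and malignant heatmaps are concatenated with the raw image as extra input channels to the breast-level network, which is what lets the global model attend to locally suspicious regions.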

Model Architecture

The architecture comprises four ResNet-based columns, one per mammographic view, with weights shared between the left and right breasts for each of the CC and MLO views, a design choice informed by radiological practice. This combination of view-specific branches allows the model to refine predictions based on standardized mammography protocols, blending convolutional neural networks with radiological expertise and improving its capacity to generalize across varied patient data.
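The wiring can be sketched as follows. This is a toy stand-in, not the paper's network: each `Column` here is a single random linear projection rather than a ResNet, and the final fusion is a placeholder sigmoid; what the sketch preserves is the weight sharing (one CC column serves both L-CC and R-CC, one MLO column serves both L-MLO and R-MLO) and the concatenation of per-view features:

```python
import numpy as np

rng = np.random.default_rng(0)

class Column:
    """Toy stand-in for a ResNet column: one linear projection."""
    def __init__(self, in_dim, out_dim):
        self.W = rng.standard_normal((in_dim, out_dim)) * 0.01

    def __call__(self, x):
        return x @ self.W

# One column per view type, shared across left and right breasts,
# mirroring the weight sharing between L-CC/R-CC and L-MLO/R-MLO.
cc_column = Column(256, 32)
mlo_column = Column(256, 32)

def forward(views):
    """views: dict of per-view feature vectors keyed by
    'L-CC', 'R-CC', 'L-MLO', 'R-MLO' (hypothetical 256-d inputs)."""
    feats = [cc_column(views["L-CC"]), cc_column(views["R-CC"]),
             mlo_column(views["L-MLO"]), mlo_column(views["R-MLO"])]
    joint = np.concatenate(feats)           # fuse the four views
    return 1 / (1 + np.exp(-joint.sum()))   # placeholder probability

views = {k: np.ones(256) for k in ("L-CC", "R-CC", "L-MLO", "R-MLO")}
p = forward(views)
```

Sharing weights across left and right breasts halves the view-column parameter count and encodes the symmetry assumption that the same features signal malignancy on either side.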

Evaluation Metrics and Transfer Learning

The paper employs AUC as a key metric, reflecting the model's ability to discriminate effectively between classes. Transfer learning from BI-RADS classification tasks is utilized to pretrain the ResNet columns, which significantly enhances model performance, especially in scenarios where annotated biopsy data is scarce. This strategic leveraging of related tasks to inform the primary model presents a robust approach to augmenting deep learning capabilities within medical datasets.
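AUC here is the standard ROC AUC: the probability that a randomly chosen positive exam receives a higher score than a randomly chosen negative one, with ties counting half. A minimal from-scratch implementation (the Mann-Whitney formulation, with illustrative labels and scores, not the paper's data):

```python
def auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs the positive outranks, ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative: 2 malignant (1) and 2 non-malignant (0) exams.
value = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # -> 0.75
```

Because AUC is rank-based, it is insensitive to the model's probability calibration and to the choice of operating threshold, which makes it a natural headline metric for a screening classifier evaluated across subpopulations.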

Reader Study and Human-Machine Collaboration

An intriguing component of the research is a reader study with 14 readers, each reading 720 screening exams. The results suggest that a hybrid ensemble, combining radiologists' predictions with the model's, enhances diagnostic accuracy beyond either the model or the readers alone. This underlines the potential benefit of integrating AI assistance into clinical settings to bolster human decision-making.
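The hybrid itself is a convex combination of the two probability estimates. A minimal sketch (the weight parameter `lam` and the example values are illustrative; the paper's reported hybrid averages the radiologist's and the network's malignancy probabilities):

```python
def hybrid_probability(radiologist_p: float, model_p: float,
                       lam: float = 0.5) -> float:
    """Convex combination of human and model malignancy estimates.

    lam = 0.5 corresponds to a simple average; lam = 0 or 1
    recovers the radiologist alone or the model alone.
    """
    return lam * model_p + (1 - lam) * radiologist_p

# Illustrative: radiologist says 0.4, model says 0.8.
p = hybrid_probability(0.4, 0.8)  # -> 0.6
```

The gain from such averaging suggests the model and the radiologists make partially uncorrelated errors, which is exactly the condition under which combining the two estimates helps.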

Implications and Future Directions

The findings suggest substantial practical implications for breast cancer screening practices. The integration of deep learning aids not only in improving diagnostic accuracy but also potentially in reducing workload and fatigue among radiologists. Future research directions may involve refining the interpretability of such models and exploring their application across other radiological imaging tasks.

This paper provides a comprehensive examination of how deep neural networks can bolster the diagnostic performance in breast cancer screening. Through detailed experiments and evaluations, it establishes a foundation for ongoing AI advancements in medical imaging, promising enhanced accuracy and efficiency in clinical diagnostics.