- The paper demonstrates a CNN-based approach that achieves 92.9% accuracy in classifying breast masses as benign or malignant.
- It mitigates the scarcity of labeled medical data by combining transfer learning from ImageNet-pretrained models with data augmentation on the DDSM dataset.
- Saliency map visualizations enhance model interpretability by highlighting diagnostic regions that align with expert radiological insights.
Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks
The paper "Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks" by Daniel Lévy and Arzav Jain presents a robust application of deep learning techniques to the classification of breast masses in mammograms, an area of significant clinical importance given the high incidence of breast cancer globally. The variability in mass appearance and low signal-to-noise ratio in mammograms often result in high rates of missed diagnoses, influenced heavily by the radiologist's expertise and workload.
The authors explore the use of Convolutional Neural Networks (CNNs) in delivering an end-to-end solution for the classification of breast masses as benign or malignant. This is a complex task given the scarcity of labeled data in medical imaging, which they effectively mitigate through innovative data augmentation strategies and transfer learning. Their work builds upon previous studies, which often relied on multi-stage approaches combining traditional feature engineering techniques and machine learning models.
Methodological Overview
The paper employs the Digital Database for Screening Mammography (DDSM), a comprehensive resource that provides a large quantity of pre-segmented mammogram images. Within this dataset, 1820 labeled images from 997 patients are used and divided into training, validation, and test splits. The authors evaluate three CNN architectures: a shallow baseline network, AlexNet, and GoogLeNet, adapting the final fully connected layers of each to the binary benign-versus-malignant classification task.
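Because multiple images can come from the same patient, grouping by patient when forming splits avoids leakage between train and test sets. The sketch below is a minimal pure-Python illustration of such a grouped split; the function name and the assumption that splitting is done at the patient level are illustrative, not the authors' released code:

```python
import random

def patient_level_split(image_ids, patient_of, seed=0, frac=(0.8, 0.1, 0.1)):
    """Split images into train/val/test by patient, so that no patient's
    images appear in more than one split (illustrative sketch)."""
    patients = sorted({patient_of[i] for i in image_ids})
    random.Random(seed).shuffle(patients)
    n = len(patients)
    n_train = int(frac[0] * n)
    n_val = int(frac[1] * n)
    groups = {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),
    }
    return {name: [i for i in image_ids if patient_of[i] in pats]
            for name, pats in groups.items()}

# Toy example: 20 images from 10 patients, two images per patient.
patient_of = {f"img{i}": f"p{i // 2}" for i in range(20)}
splits = patient_level_split(sorted(patient_of), patient_of)
```

Splitting by patient rather than by image is a standard safeguard in medical imaging, since near-duplicate views of one patient in both train and test sets would inflate measured accuracy.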
Significant emphasis is placed on transfer learning: models pre-trained on the ImageNet dataset are fine-tuned on the mammography data. This harnesses general image representations learned from an extensive corpus of natural images, providing a sensible initialization that eases the data-scarcity constraint. Contextual information around the detected mass is also considered, with network inputs designed to include varying padding sizes that capture the relevant diagnostic context. Finally, augmentation through geometric transformations mitigates overfitting by multiplying the effective number of training examples.
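The geometric augmentation idea can be sketched in a few lines. The snippet below generates the eight symmetries of a square image patch (four rotations, each with and without mirroring); it is a pure-Python stand-in for the paper's augmentation pipeline, not a reproduction of it:

```python
def rotate90(img):
    """Rotate a 2-D image (list of rows) 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]

def flip_h(img):
    """Mirror the image left-right."""
    return [row[::-1] for row in img]

def augment(img):
    """Return the eight symmetries of the patch: 4 rotations x 2 flips.

    Each symmetry preserves the benign/malignant label, so one labeled
    patch yields eight training examples."""
    out = []
    cur = img
    for _ in range(4):
        out.append(cur)
        out.append(flip_h(cur))
        cur = rotate90(cur)
    return out

views = augment([[1, 2], [3, 4]])  # 8 label-preserving variants
```

In practice frameworks apply such transforms randomly at training time, but the principle is the same: label-preserving transforms simulate a larger training set.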
Empirical Results
Empirical evaluation reveals that the GoogLeNet architecture performs best, achieving a test accuracy of 92.9% and thereby exceeding the human expert benchmarks reported in related clinical studies. The model also attains a recall of 0.934 at a precision of 0.924, underscoring its potential suitability for clinical deployment, where false negatives carry particularly grave consequences.
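For readers less familiar with these metrics, they follow directly from the confusion-matrix counts. The sketch below uses illustrative counts chosen to land near the reported operating point; they are not the paper's actual confusion matrix:

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts,
    treating 'malignant' as the positive class."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,       # all correct / all cases
        "precision": tp / (tp + fp),         # predicted malignant that are
        "recall": tp / (tp + fn),            # true malignant that are found
    }

# Hypothetical counts for illustration (not from the paper):
m = binary_metrics(tp=85, fp=7, fn=6, tn=82)
```

High recall matters most in screening: `fn` counts malignant masses the model misses, which is the costliest error type in this setting.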
Interpretability and Practical Implications
A noteworthy aspect of the research is the attention given to model interpretability, a critical component for clinical adoption. Utilizing saliency map visualization techniques, the authors provide insights into the decision-making process of the CNN, identifying influential areas of the mammogram, such as mass boundaries, which align with radiological practice. This transparency fosters trust and facilitates integration into diagnostic workflows.
The implications of this research are manifold. From a practical standpoint, the demonstrated ability to surpass human accuracy benchmarks suggests a supportive role for AI in breast cancer screening, potentially reducing workload and inter-operator variability in diagnoses. Theoretically, the findings underscore the importance of transfer learning and context consideration in medical imaging applications.
Future Directions
The paper opens several avenues for further investigation. More advanced architectures or ensemble models may push performance further, while the integration of attention mechanisms promises improved interpretability and localization, which could further strengthen AI applications in medical diagnostics.
In conclusion, Lévy and Jain deliver a compelling investigation into the application of deep learning for mammogram classification. Their use of CNNs for direct mass classification, supported by methodological rigor and attention to model applicability in clinical settings, marks a significant contribution to the field.