- The paper introduces an innovative ensemble method combining pre-trained DCNNs with gradient boosted trees, achieving 93.8% accuracy in a 2-class classification task.
- It employs a multi-stage approach with extensive data augmentation to mitigate overfitting on the limited ICIAR 2018 dataset.
- The study paves the way for improved digital pathology diagnostics and encourages collaboration through openly shared source code on GitHub.
Deep Convolutional Neural Networks for Breast Cancer Histology Image Analysis
The paper under discussion presents a computational approach for analyzing histology images of breast cancer using Deep Convolutional Neural Networks (DCNNs). The research aims to address the challenges inherent in breast cancer diagnostics, such as subjectivity in visual assessments, by developing an automated system that enhances diagnostic accuracy and inter-observer agreement.
Key Contributions
- Dataset and Classifications: The paper utilizes the ICIAR 2018 Grand Challenge dataset comprising 400 hematoxylin and eosin (H&E) stained images. The classification task is divided into two parts: a 4-class task (normal, benign, in situ carcinoma, and invasive carcinoma) and a 2-class task differentiating non-carcinomas from carcinomas.
- Methodology: The method extracts deep features with several pre-trained DCNN architectures (ResNet-50, InceptionV3, and VGG-16) and classifies them with gradient boosted trees implemented in LightGBM. This multi-stage design, combined with strong data augmentation, is intended to mitigate overfitting, a common problem when training on small datasets (a minimal sketch of the pipeline appears after this list).
- Performance Metrics: The proposed model achieves 93.8% accuracy on the 2-class task, with an AUC of 97.3% and sensitivity/specificity of 96.5%/88.0% at a high-sensitivity operating point (a sketch of how such an operating point can be chosen follows the list). For the 4-class task, it reports 87.2% accuracy under 10-fold cross-validation. These results indicate performance superior to the other methods cited in the literature.
- Approach Innovation: Unlike earlier methods that train deep networks from scratch on the limited data, the paper ensembles pre-trained networks as fixed feature extractors, a pragmatic trade-off in which the pre-trained backbones supply rich features while the lightweight boosted-tree classifier limits overfitting on the small dataset.
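The two-stage pipeline described above can be illustrated with a short sketch. This is not the authors' released implementation: it assumes a single ResNet-50 backbone from Keras and illustrative hyperparameters, and it omits the paper's heavy augmentation and multi-network ensembling, but it captures the idea of feeding frozen deep features into a LightGBM classifier.

```python
# Minimal sketch of the deep-features + gradient-boosting pipeline.
# Assumptions: TensorFlow/Keras and LightGBM are installed; images arrive as an
# (n, 224, 224, 3) array with pixel values in [0, 255]; labels are 0/1
# (non-carcinoma / carcinoma). Names and hyperparameters are illustrative.
import numpy as np
import lightgbm as lgb
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

# Pre-trained backbone without its classification head; global average pooling
# turns every image into a fixed-length feature vector.
backbone = ResNet50(weights="imagenet", include_top=False, pooling="avg")

def extract_features(images: np.ndarray) -> np.ndarray:
    """Return one deep-feature descriptor per input image."""
    return backbone.predict(preprocess_input(images.astype("float32")), verbose=0)

def train_classifier(train_images: np.ndarray, train_labels: np.ndarray) -> lgb.LGBMClassifier:
    """Fit gradient boosted trees (LightGBM) on the frozen deep features."""
    features = extract_features(train_images)
    clf = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
    clf.fit(features, train_labels)
    return clf

def predict_carcinoma_proba(clf: lgb.LGBMClassifier, images: np.ndarray) -> np.ndarray:
    """Probability of the carcinoma class for each image."""
    return clf.predict_proba(extract_features(images))[:, 1]
```

In the paper itself, descriptors from multiple networks and multiple augmented crops of each image are combined before boosting, which is what makes the approach robust on a dataset of only 400 images.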
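The reported sensitivity/specificity pair corresponds to a particular threshold on the model's carcinoma score. How such a high-sensitivity operating point can be selected is sketched below with scikit-learn; the target value and function names are illustrative assumptions, not taken from the paper.

```python
# Illustrative selection of a high-sensitivity operating point on the ROC curve.
# y_true holds the binary labels, y_score the predicted carcinoma probabilities.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def metrics_at_high_sensitivity(y_true, y_score, min_sensitivity=0.965):
    """Report AUC plus the specificity obtained at the first threshold whose
    sensitivity (true positive rate) reaches the requested minimum."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    idx = int(np.argmax(tpr >= min_sensitivity))  # first index meeting the target
    return {
        "auc": roc_auc_score(y_true, y_score),
        "threshold": float(thresholds[idx]),
        "sensitivity": float(tpr[idx]),
        "specificity": float(1.0 - fpr[idx]),
    }
```

Moving the threshold trades specificity for sensitivity, which is why reporting both the AUC and a specific high-sensitivity operating point is informative for screening settings, where missed carcinomas are costlier than false alarms.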
Implications and Future Directions
The implications of this paper are significant for digital pathology and computer-aided diagnosis (CAD) systems. By demonstrating high accuracy through heavy data augmentation and deep feature extraction, the paper paves the way for more reliable automated breast cancer diagnostics. The gradient boosting approach also suggests potential for handling similar diagnostic challenges in other medical imaging contexts.
For future developments, exploring unsupervised or semi-supervised learning methods could further enhance the robustness of CAD systems in scenarios where annotated datasets are limited. Another fruitful direction may involve integrating multimodal data, such as genetic information and imaging data, to provide a holistic view of the patient's health status, potentially refining diagnostic and prognostic capabilities.
The paper’s emphasis on openly sharing the source code (available at the specified GitHub repository) fosters collaboration and further research, allowing the community to build on its findings and enhance the efficacy of AI-driven diagnostic tools in clinical settings.