Overview of GAN-based Synthetic Medical Image Augmentation for Increased CNN Performance in Liver Lesion Classification
In "GAN-based Synthetic Medical Image Augmentation for Increased CNN Performance in Liver Lesion Classification," Frid-Adar et al. address a key bottleneck in medical deep learning, the scarcity of annotated datasets, by using generative adversarial networks (GANs) to synthesize high-quality lesion images for data augmentation. The goal is to improve the performance of convolutional neural networks (CNNs) in classifying liver lesions into three categories: cysts, metastases, and hemangiomas.
Methodology
The paper employs deep learning methodologies and addresses several core challenges:
- Data Augmentation Techniques:
- Classic Data Augmentation: Initial experiments utilize classical image manipulation techniques including translations, rotations, flips, and scaling to artificially enlarge the training dataset.
- Synthetic Data Augmentation: The researchers then introduce synthetic data generated using GANs, specifically focusing on two architectures: Deep Convolutional GAN (DCGAN) and Auxiliary Classifier GAN (ACGAN).
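To make the classic techniques above concrete, here is a minimal NumPy sketch of random flips, 90-degree rotations, and small translations applied to a lesion patch. The patch size, shift range, and function name are illustrative choices, not taken from the paper (which also uses finer-grained rotations and scaling):

```python
import numpy as np

def classic_augment(patch, rng):
    """Apply a random flip, 90-degree rotation, and small translation
    to a 2-D lesion patch (illustrative sketch only)."""
    out = patch
    if rng.random() < 0.5:
        out = np.fliplr(out)                       # horizontal flip
    out = np.rot90(out, k=rng.integers(0, 4))      # rotate by 0/90/180/270 degrees
    dy, dx = rng.integers(-4, 5, size=2)           # small pixel translation
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
patch = rng.random((64, 64))                       # stand-in for a lesion ROI
augmented = [classic_augment(patch, rng) for _ in range(8)]
```

Each call produces a new variant of the same patch, so a small set of real lesions can be expanded many-fold before training.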
- Generative Models:
- DCGAN: This model is trained separately for each liver lesion category. The generator network creates realistic lesion images from random noise vectors, aiming to fool a discriminator network tasked with distinguishing real images from generated ones.
- ACGAN: This variant generates labeled lesions for all classes in a single training process by conditioning both the generator and the discriminator on class labels.
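The difference between the two objectives can be sketched numerically. Below, hypothetical discriminator outputs are plugged into the standard GAN losses, and the ACGAN's auxiliary class-prediction term is added on top. All numbers are made up for illustration; this is not the paper's training code:

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy for a scalar probability."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p))

# Hypothetical discriminator outputs on one real and one generated patch.
d_real, d_fake = 0.9, 0.2                      # P("real") from the discriminator

# DCGAN objective (the paper trains one such GAN per lesion class):
d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)   # D: push real->1, fake->0
g_loss = bce(d_fake, 1.0)                      # G: fool D into fake->1

# ACGAN adds an auxiliary class head: the discriminator also predicts the
# lesion class (cyst / metastasis / hemangioma), so one conditional GAN
# covers all three classes.  Cross-entropy on hypothetical class scores:
probs = np.array([0.7, 0.2, 0.1])              # predicted class distribution
true_class = 0                                 # e.g. "cyst"
aux_loss = -np.log(probs[true_class])
d_loss_acgan = d_loss + aux_loss
```

The auxiliary term is what lets a single ACGAN produce labeled samples, whereas the DCGAN setup needs a separate model per class.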
- Training and Evaluation:
- A three-fold cross-validation approach ensures robustness in evaluating the improvement offered by synthetic data augmentation. The augmented datasets are incrementally expanded and compared against a baseline CNN trained with only classic data augmentation.
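The evaluation protocol above can be sketched as follows. Here `train_eval_fn` is a hypothetical callback standing in for CNN training, and the dataset size and accuracy curve are illustrative only (held-out folds contain only real images, as in the paper):

```python
import numpy as np

def cross_val_with_synthetic(n_samples, synth_sizes, train_eval_fn, n_folds=3):
    """Sketch of the protocol: k-fold cross-validation where training folds
    are augmented with a growing number of synthetic images, while each
    held-out test fold contains only real images."""
    rng = np.random.default_rng(0)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    results = {}
    for n_synth in synth_sizes:
        accs = []
        for k in range(n_folds):
            test_idx = folds[k]
            train_idx = np.concatenate(
                [folds[j] for j in range(n_folds) if j != k])
            accs.append(train_eval_fn(train_idx, test_idx, n_synth))
        results[n_synth] = float(np.mean(accs))  # mean accuracy over folds
    return results

# Toy stand-in for CNN training: accuracy rises with synthetic-set size,
# then saturates (numbers are illustrative, not the paper's results).
demo = lambda train_idx, test_idx, n_synth: 0.786 + 0.0002 * min(n_synth, 400)
curve = cross_val_with_synthetic(180, [0, 100, 200, 400], demo)
```

Comparing the resulting accuracy curve against the zero-synthetic baseline is exactly the incremental comparison the paper reports.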
Results
The addition of GAN-generated synthetic data notably improves the lesion classification performance of the CNN:
- Baseline Performance:
- Classic data augmentation alone, without any synthesized examples, achieves a total classification accuracy of 78.6%. Sensitivity and specificity for metastases and hemangiomas remain suboptimal because these two classes overlap visually.
- Enhanced Performance:
- Integrating GAN-generated synthetic data enhances total accuracy to 85.7%. Significant improvements are observed in both sensitivity and specificity for the more confounding classes of metastases and hemangiomas.
- The DCGAN-based synthetic lesions outperform those generated by the ACGAN, suggesting that synthesizing each class separately yields more diverse and representative samples.
Expert Assessment and t-SNE Visualization
To further validate the synthetic lesions, expert radiologists assess their visual quality and classify mixed sets of real and generated samples. The radiologists perform comparably on both, indicating that the generated lesions are difficult to distinguish visually from real ones.
Using t-SNE visualization, the paper corroborates that the synthesized lesions make the feature space more discriminative, improving the separation between categories where classic augmentation falls short.
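This kind of feature-space inspection can be roughly reproduced as sketched below. As an assumption for illustration, randomly generated vectors stand in for CNN features, and a plain-NumPy PCA projection is used as a lightweight stand-in for t-SNE (the class means and the separation metric are likewise illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in "CNN feature vectors" for three lesion classes, 50 samples each,
# drawn around different means so the classes are separable.
feats = np.vstack([rng.normal(c, 1.0, size=(50, 16)) for c in (0.0, 3.0, 6.0)])
labels = np.repeat([0, 1, 2], 50)

# 2-D projection via PCA (a simple linear stand-in for the paper's t-SNE):
centered = feats - feats.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
proj = centered @ vt[:2].T             # each row: one point to scatter-plot

# Quantify separation: between-class vs. within-class spread in the 2-D map.
centroids = np.array([proj[labels == k].mean(axis=0) for k in range(3)])
between = np.var(centroids, axis=0).sum()
within = np.mean([proj[labels == k].var(axis=0).sum() for k in range(3)])
```

A well-separated embedding has `between` much larger than `within`; plotting `proj` colored by `labels` gives the qualitative picture the paper shows with t-SNE.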
Implications and Future Work
The implications of this research are two-fold:
- Practical Implications:
- The demonstrated improvement in classification performance highlights the potential of GANs in medical imaging applications where annotated datasets are scarce.
- Augmentation with synthetic data can effectively support the development of robust automated diagnostic tools, alleviating the demand on radiologist time for extensive manual labeling.
- Theoretical Implications:
- The success of DCGAN over ACGAN indicates that models synthesizing each class separately might yield better results than multi-class synthesis architectures in specific applications.
- Future research could optimize generative model architectures or explore semi-supervised approaches to further improve the fidelity and clinical utility of synthetic medical images.
In summary, Frid-Adar et al.'s paper demonstrates that synthetic data augmentation using GANs can substantially benefit CNN performance in the challenging domain of medical image classification. Future developments in this area could generalize these benefits to other medical imaging tasks, fostering advancements in radiological diagnostic systems.