Synthetic Data Augmentation Using GAN for Improved Liver Lesion Classification
This paper investigates the use of Generative Adversarial Networks (GANs) to produce synthetic medical images that augment small datasets, focusing on liver lesion classification from CT images. It addresses the pervasive challenge of limited data availability in medical imaging, where acquiring and annotating data is costly, time-consuming, and labor-intensive.
Methodology Overview
The authors propose a two-tiered data augmentation strategy: classical techniques followed by GAN-based synthesis. First, simple image transformations such as rotation, flipping, and scaling are applied to enlarge the dataset. Subsequently, GANs are used to generate synthetic lesion images, introducing greater variability and richness than geometric transforms alone can provide.
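As a rough illustration of the classical stage, the sketch below generates rotated, flipped, and scaled variants of a single 2D lesion patch. It assumes NumPy/SciPy and a cropped CT region of interest; the specific angles and scale factors are illustrative choices, not parameters taken from the paper.

```python
import numpy as np
from scipy import ndimage

def classic_augment(roi, angles=(0, 90, 180, 270), scales=(0.9, 1.0, 1.1)):
    """Generate augmented copies of a 2D lesion ROI via rotation, flipping, and scaling.

    roi: 2D NumPy array holding a cropped CT lesion patch.
    Returns a list of augmented patches (scaled copies change the patch size).
    """
    augmented = []
    for angle in angles:
        rotated = ndimage.rotate(roi, angle, reshape=False, mode="nearest")
        for flipped in (rotated, np.fliplr(rotated)):
            for s in scales:
                scaled = ndimage.zoom(flipped, s, mode="nearest")
                augmented.append(scaled)
    return augmented

# Example: one 64x64 patch -> 4 rotations x 2 flips x 3 scales = 24 variants
patch = np.random.rand(64, 64).astype(np.float32)
variants = classic_augment(patch)
print(len(variants))  # 24
```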
GAN Framework
The approach employs a Deep Convolutional GAN (DCGAN) architecture, trained to generate high-quality liver lesion images for the three lesion types: cysts, metastases, and hemangiomas. The GAN comprises a generator and a discriminator trained adversarially: the generator is iteratively refined to produce realistic lesion images, while the discriminator learns to distinguish real from synthesized samples.
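The paper itself does not include an implementation; the following is a minimal sketch of what a DCGAN of this kind looks like, assuming PyTorch and 64x64 single-channel lesion patches. Layer widths, the latent dimension, and the one-GAN-per-class note are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a latent vector z (batch, 100, 1, 1) to a 64x64 single-channel patch."""
    def __init__(self, z_dim=100, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, feat * 8, 4, 1, 0, bias=False),   # 1x1 -> 4x4
            nn.BatchNorm2d(feat * 8), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),  # 8x8
            nn.BatchNorm2d(feat * 4), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),  # 16x16
            nn.BatchNorm2d(feat * 2), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),      # 32x32
            nn.BatchNorm2d(feat), nn.ReLU(True),
            nn.ConvTranspose2d(feat, 1, 4, 2, 1, bias=False),             # 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Scores a 64x64 single-channel patch as real (1) or synthetic (0)."""
    def __init__(self, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, feat, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, True),   # 32x32
            nn.Conv2d(feat, feat * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 2), nn.LeakyReLU(0.2, True),                  # 16x16
            nn.Conv2d(feat * 2, feat * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 4), nn.LeakyReLU(0.2, True),                  # 8x8
            nn.Conv2d(feat * 4, feat * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(feat * 8), nn.LeakyReLU(0.2, True),                  # 4x4
            nn.Conv2d(feat * 8, 1, 4, 1, 0, bias=False), nn.Sigmoid(),          # 1x1
        )

    def forward(self, x):
        return self.net(x).view(-1)

# Assumed usage: one generator per lesion class (cyst, metastasis, hemangioma),
# with synthetic samples drawn as G(torch.randn(batch_size, 100, 1, 1)).
```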
Experimental Results
The experiments use a dataset of 182 CT liver lesions composed of cysts, metastases, and hemangiomas. With classic augmentation alone, the classifier reaches 78.6% sensitivity and 88.4% specificity. Incorporating GAN-generated synthetic data raises these to 85.7% sensitivity and 92.4% specificity.
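For reference, sensitivity and specificity in a multi-class setting like this one are typically computed per class (one-vs-rest) and then averaged. The sketch below derives them from a confusion matrix; the example matrix is made up for illustration and does not reproduce the paper's numbers.

```python
import numpy as np

def sensitivity_specificity(conf):
    """Per-class sensitivity (recall) and specificity from a KxK confusion matrix.

    conf[i, j] = number of samples of true class i predicted as class j.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    fn = conf.sum(axis=1) - tp          # samples of each class that were missed
    fp = conf.sum(axis=0) - tp          # samples wrongly assigned to each class
    tn = conf.sum() - (tp + fn + fp)
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative 3-class matrix (cyst, metastasis, hemangioma); values are invented.
example = [[50, 2, 1],
           [4, 55, 5],
           [2, 6, 57]]
sens, spec = sensitivity_specificity(example)
print(sens.mean(), spec.mean())  # averaged one-vs-rest sensitivity / specificity
```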
Results demonstrate that the optimal augmentation occurs when a combination of classical and synthetic methods is applied, suggesting that the synthetic data contributes crucial variance missing from traditional techniques. The classifier's improved performance highlights the effectiveness of GANs in data-scarce fields like medical imaging.
Implications and Future Directions
These results have substantial implications for medical diagnostics, potentially reducing the amount of annotated data required while maintaining classifier robustness and accuracy. Synthetic data could also broaden access to sophisticated diagnostic tools, particularly in under-resourced settings with limited radiological expertise.
Looking forward, the methodology could be broadened beyond liver lesions to include other imaging domains, provided that the relevant data characteristics can be effectively modeled by GANs. Future research might explore integration with more sophisticated GAN architectures or unsupervised models, potentially enhancing synthesis realism and variability even further.
Conclusion
The paper successfully demonstrates the applicability of GAN-generated synthetic images in augmenting medical imaging datasets. The approach achieves quantifiable performance enhancements in liver lesion classification, underscoring the potential of GANs as a tool for resolving data limitations in medical AI applications.