Medical Image Synthesis for Data Augmentation and Anonymization using Generative Adversarial Networks (1807.10225v2)

Published 26 Jul 2018 in cs.CV, cs.LG, and stat.ML

Abstract: Data diversity is critical to success when training deep learning models. Medical imaging data sets are often imbalanced as pathologic findings are generally rare, which introduces significant challenges when training deep learning models. In this work, we propose a method to generate synthetic abnormal MRI images with brain tumors by training a generative adversarial network using two publicly available data sets of brain MRI. We demonstrate two unique benefits that the synthetic images provide. First, we illustrate improved performance on tumor segmentation by leveraging the synthetic images as a form of data augmentation. Second, we demonstrate the value of generative models as an anonymization tool, achieving comparable tumor segmentation results when trained on the synthetic data versus when trained on real subject data. Together, these results offer a potential solution to two of the largest challenges facing machine learning in medical imaging, namely the small incidence of pathological findings, and the restrictions around sharing of patient data.


Medical Image Synthesis for Data Augmentation and Anonymization using GANs

The paper "Medical Image Synthesis for Data Augmentation and Anonymization using Generative Adversarial Networks" applies generative adversarial networks (GANs) to medical imaging, synthesizing brain MRI to improve the performance of deep learning models. The authors propose using GANs to create synthetic abnormal MRI images containing brain tumors, aiming to address two primary challenges: data imbalance and data privacy.

Methodology and Data

The researchers used two publicly available datasets, the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS), to demonstrate the efficacy of their approach. They used an image-to-image translation conditional GAN (pix2pix) for two main tasks: MRI-to-label for brain segmentation and label-to-MRI for synthetic image generation.
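For reference, the objective of the original pix2pix formulation can be written as below. This is the standard conditional GAN loss with an L1 reconstruction term from the pix2pix literature; the summary above does not state which loss terms or weights the authors used, so treat it as background rather than as their exact training objective. Here x is the conditioning input (for label-to-MRI, the label map), y is the target image, z is the noise source, and lambda is a weighting hyperparameter.

```latex
\begin{align}
\mathcal{L}_{\mathrm{cGAN}}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_{1}\big] \\
G^{*} &= \arg\min_{G}\max_{D}\; \mathcal{L}_{\mathrm{cGAN}}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
\end{align}
```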

Inputs and outputs are multi-parametric MRI, covering T1, T2, contrast-enhanced T1, and FLAIR sequences, processed in three-dimensional (3D) and four-dimensional (4D) forms. This reflects the complexity of medical imaging data better than traditional two-dimensional analysis does.

Experimental Procedures

To evaluate the approach, the authors ran a series of experiments on data augmentation and anonymized training. Pre-processing included skull-stripping and dimensional adjustments to fit computational constraints. Models were trained on both real and synthetic data, and tumor characteristics were varied in the synthetic images to assess model effectiveness.
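One way to picture varying tumor characteristics before label-to-MRI generation is to move a tumor mask around inside a brain label map. The sketch below is illustrative only: it works on small 2D grids in plain Python, and the function names `shift_mask` and `merge_labels` and the label value 9 are hypothetical, not taken from the paper.

```python
def shift_mask(mask, dy, dx):
    """Translate a 2D binary mask by (dy, dx); pixels leaving the grid are dropped."""
    height, width = len(mask), len(mask[0])
    shifted = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width:
                    shifted[ny][nx] = 1
    return shifted


def merge_labels(brain_labels, tumor_mask, tumor_label=9):
    """Overlay the tumor onto a brain label map.

    Tumor pixels are written only where the brain label is nonzero, so the
    tumor never spills into the background. tumor_label is a hypothetical id.
    """
    return [
        [tumor_label if t and b else b for b, t in zip(brain_row, tumor_row)]
        for brain_row, tumor_row in zip(brain_labels, tumor_mask)
    ]


# Toy 4x4 example: labels 1 and 2 stand in for two brain structures.
brain = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [0, 0, 0, 0],
]
tumor = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
moved = merge_labels(brain, shift_mask(tumor, 1, 1))
```

Each shifted label map could then be fed to a label-to-MRI generator to obtain a new synthetic case. Real pipelines would work on 3D volumes and might also rescale or deform the tumor.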

The experiments involved evaluating models under various training conditions: real data alone, real data supplemented with synthetic data, and synthetic data alone with subsequent fine-tuning on a fraction of real data.
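The three training conditions can be laid out as a small configuration helper. This is a minimal sketch under stated assumptions: the function name `build_training_plans`, the dictionary keys, and the default `fine_tune_fraction` of 0.1 are all hypothetical choices for illustration, since the summary only says "a fraction of real data".

```python
import random


def build_training_plans(real_cases, synthetic_cases, fine_tune_fraction=0.1, seed=0):
    """Return the case lists for each of the three training conditions.

    fine_tune_fraction is an assumed value, not one reported in the summary.
    """
    rng = random.Random(seed)
    n_fine_tune = max(1, round(fine_tune_fraction * len(real_cases)))
    fine_tune_subset = rng.sample(list(real_cases), n_fine_tune)
    return {
        # Baseline: real data alone.
        "real_only": {"train": list(real_cases), "fine_tune": []},
        # Augmentation: real data supplemented with synthetic data.
        "real_plus_synthetic": {
            "train": list(real_cases) + list(synthetic_cases),
            "fine_tune": [],
        },
        # Anonymization: synthetic data alone, then a small real subset.
        "synthetic_then_fine_tune": {
            "train": list(synthetic_cases),
            "fine_tune": fine_tune_subset,
        },
    }


real = [f"real_{i}" for i in range(20)]
synthetic = [f"synthetic_{i}" for i in range(30)]
plans = build_training_plans(real, synthetic)
```

Keeping the fine-tuning subset separate from the main training list makes explicit how little real data the third condition touches.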

Results

Including synthetic images improved segmentation performance: tumor segmentation accuracy rose when GAN-generated images were added on top of traditional augmentation methods. Models trained exclusively on synthetic images and then fine-tuned on a small subset of real data also achieved satisfactory results.
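The summary does not name the metric behind these comparisons. Tumor segmentation is commonly scored with the Dice coefficient, so the sketch below assumes that metric for illustration; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are choices made here, not details from the paper.

```python
def dice_score(predicted, truth):
    """Dice coefficient between two binary masks given as nested lists.

    Dice = 2 * |P intersect T| / (|P| + |T|). Two empty masks are treated
    as a perfect match (1.0), which is one common convention.
    """
    pred_flat = [bool(v) for row in predicted for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred_flat, truth_flat) if p and t)
    total = sum(pred_flat) + sum(truth_flat)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


predicted_mask = [[1, 1], [0, 0]]
truth_mask = [[1, 0], [0, 0]]
score = dice_score(predicted_mask, truth_mask)  # 2 * 1 / (2 + 1)
```

Comparing this score across the training conditions, on the same held-out real cases, is one way to check whether adding synthetic data helps.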

Implications and Future Work

The findings suggest that the proposed GAN framework can help with the prevalent problem of data scarcity in medical imaging. By generating a diverse set of realistic synthetic images, the framework augments existing datasets and improves model training. The approach also supports anonymization of medical data, which could enable data sharing without compromising patient privacy.

The capacity to generate realistic synthetic data presents a potential pathway for smaller institutions to develop and train robust models with limited real-world data access. Future research could explore enhancing the quality of non-T1-weighted generated images and expanding the application across various imaging modalities and anatomical sites. A focus on further optimizing GAN architectures and incorporating advanced machine learning techniques could lead to improvements in both image realism and computational efficiency.

This research marks a step toward using generative models to ease current limits on data availability and privacy in medical imaging, and it opens the way to broader applications in the field.


Authors (8)

Hoo-Chang Shin (17 papers)
Jameson K Rogers (2 papers)
Christopher G Schwarz (1 paper)
Matthew L Senjem (1 paper)
Jeffrey L Gunter (1 paper)
Katherine Andriole (5 papers)
Mark Michalski (4 papers)
Neil A Tenenholtz (1 paper)

Citations (514)
