Synthesis of Realistic ECG using Generative Adversarial Networks (1909.09150v1)

Published 19 Sep 2019 in eess.SP, cs.LG, and stat.ML

Abstract: Access to medical data is highly restricted due to its sensitive nature, preventing communities from using this data for research or clinical training. Common methods of de-identification implemented to enable the sharing of data are sometimes inadequate to protect the individuals contained in the data. For our research, we investigate the ability of generative adversarial networks (GANs) to produce realistic medical time series data which can be used without concerns over privacy. The aim is to generate synthetic ECG signals representative of normal ECG waveforms. GANs have been used successfully to generate good quality synthetic time series and have been shown to prevent re-identification of individual records. In this work, a range of GAN architectures are developed to generate synthetic sine waves and synthetic ECG. Two evaluation metrics are then used to quantitatively assess how suitable the synthetic data is for real world applications such as clinical training and data analysis. Finally, we discuss the privacy concerns associated with sharing synthetic data produced by GANs and test their ability to withstand a simple membership inference attack. For the first time we both quantitatively and qualitatively demonstrate that GAN architecture can successfully generate time series signals that are not only structurally similar to the training sets but also diverse in nature across generated samples. We also report on their ability to withstand a simple membership inference attack, protecting the privacy of the training set.

Synthesis of Realistic ECG using Generative Adversarial Networks

The paper, "Synthesis of Realistic ECG using Generative Adversarial Networks," investigates whether Generative Adversarial Networks (GANs) can generate synthetic medical time series data, specifically electrocardiograms (ECGs). The objective is to produce high-quality synthetic ECG data that mirrors real-world signals while addressing the privacy concerns that come with using and sharing sensitive medical information.

Sensitive medical data, like ECGs, is heavily protected under privacy regulations, making access and utilization for research purposes challenging. Traditional methods of anonymizing such data often fall short due to potential risks of re-identification. Synthetic data generation provides a promising alternative, and GANs have excelled in generating high-quality images. Their use for medical time series, however, is less explored.

The authors focus on synthesizing normal lead II ECG signals from the MIT-BIH Arrhythmia Database. They propose several GAN architectures that combine LSTM and BiLSTM generators with CNN-based discriminators. They also explore adding minibatch discrimination to mitigate mode collapse, a common failure in GAN training where the generator produces only a narrow range of outputs.
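To make the idea of minibatch discrimination concrete, here is a minimal pure-Python sketch of the similarity statistic in its standard formulation: each sample's feature vector is projected through a tensor, and the sample is scored by how close its projection is to every other projection in the batch. The function name `minibatch_features`, the fixed toy tensor, and the choice to include each sample's similarity to itself are assumptions made for illustration. In a real GAN the tensor is a learned parameter inside the discriminator, and this is not taken from the authors' code.

```python
import math

def minibatch_features(features, tensor):
    """Per-sample similarity statistics for a batch.

    features: n x A nested list (one feature vector per sample).
    tensor:   A x B x C nested list (learned in a real model, fixed here).
    Returns an n x B nested list; entry [i][b] is large when sample i
    resembles many other samples in the batch along kernel b.
    """
    n = len(features)
    a_dim, b_dim, c_dim = len(tensor), len(tensor[0]), len(tensor[0][0])

    # Project each feature vector: M[i][b][c] = sum_a features[i][a] * tensor[a][b][c]
    projected = [
        [[sum(features[i][a] * tensor[a][b][c] for a in range(a_dim))
          for c in range(c_dim)]
         for b in range(b_dim)]
        for i in range(n)
    ]

    out = []
    for i in range(n):
        row = []
        for b in range(b_dim):
            total = 0.0
            for j in range(n):  # includes j == i, which contributes exp(0) = 1
                l1 = sum(abs(projected[i][b][c] - projected[j][b][c])
                         for c in range(c_dim))
                total += math.exp(-l1)
            row.append(total)
        out.append(row)
    return out

# Hypothetical toy tensor with A = 2, B = 2, C = 2.
toy_tensor = [[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]]

collapsed_batch = [[1.0, 2.0]] * 4                      # every sample identical
diverse_batch = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]]

collapsed_stats = minibatch_features(collapsed_batch, toy_tensor)
diverse_stats = minibatch_features(diverse_batch, toy_tensor)
```

A collapsed batch yields statistics equal to the batch size, while a diverse batch yields values near 1. Appending these statistics to the discriminator's features gives it a direct signal for detecting a generator that keeps producing the same waveform.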

The paper uses two evaluation metrics, Maximum Mean Discrepancy (MMD) and Dynamic Time Warping (DTW), to measure how closely the synthetic data resembles the real data. MMD compares the two sets as distributions and is therefore sensitive to the diversity of the generated samples, while DTW compares individual sequences and is sensitive to how well they align in time, at a higher computational cost. In the experiments, a 1CNN BiLSTM GAN yielded the most reasonable results for synthetic sine wave generation, suggesting that BiLSTM generators may better capture the non-linear temporal patterns characteristic of ECG signals.
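The two metrics can be sketched in a few lines of pure Python. The version below assumes a Gaussian (RBF) kernel with a fixed bandwidth `sigma` and the biased estimator of squared MMD, and uses absolute difference as the local cost in DTW. These choices and the function names are illustrative assumptions; the paper's exact kernel, bandwidth, and DTW variant may differ.

```python
import math

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel between two equal-length sequences."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between two sets of sequences."""
    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

def dtw_distance(s, t):
    """Classic dynamic-programming DTW with absolute-difference cost."""
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def sine_wave(phase, length=20):
    """Toy stand-in for a signal: one period of a sine wave."""
    return [math.sin(2.0 * math.pi * k / length + phase) for k in range(length)]

# Hypothetical "real" and "synthetic" sets made of phase-shifted sine waves.
real_set = [sine_wave(0.0), sine_wave(0.1), sine_wave(0.2)]
synthetic_set = [sine_wave(1.0), sine_wave(1.1), sine_wave(1.2)]

mmd_same = mmd_squared(real_set, real_set)
mmd_diff = mmd_squared(real_set, synthetic_set)
dtw_stretched = dtw_distance([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0])
```

MMD is zero when both sets are identical and grows as the distributions move apart. DTW returns zero for the stretched pair because warping absorbs the repeated sample, which is the property that makes it tolerant of small timing differences between beats.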

The privacy assessment consisted of a simple membership inference attack, which tests whether the synthetic data reveals which records were in the training set. The results showed promising levels of privacy preservation, especially at lower distance thresholds, suggesting that GANs may be able to protect the underlying sensitive data effectively.
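One simple way to frame a distance-threshold membership inference attack is sketched below: the attacker flags a candidate record as a training member if some synthetic sample lies within a chosen distance of it, and the attack is scored by how often those guesses are right. This is an assumed formulation for illustration, with hypothetical function names, Euclidean distance, and toy data. It is not necessarily the exact procedure used in the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def infer_membership(candidates, synthetic, threshold):
    """Guess True (training member) when the nearest synthetic sample
    is within `threshold` of the candidate record."""
    return [
        min(euclidean(c, s) for s in synthetic) <= threshold
        for c in candidates
    ]

def attack_accuracy(train, holdout, synthetic, threshold):
    """Fraction of records whose membership the attacker guesses correctly."""
    guesses_train = infer_membership(train, synthetic, threshold)
    guesses_holdout = infer_membership(holdout, synthetic, threshold)
    correct = sum(guesses_train) + sum(not g for g in guesses_holdout)
    return correct / (len(train) + len(holdout))

# Toy data (assumed): a "leaky" generator that nearly copies its training set.
train_records = [[0.0, 0.0], [1.0, 1.0]]
holdout_records = [[5.0, 5.0], [6.0, 6.0]]
leaky_synthetic = [[0.0, 0.1], [1.0, 1.1]]

accuracy_loose = attack_accuracy(train_records, holdout_records, leaky_synthetic, 0.5)
accuracy_strict = attack_accuracy(train_records, holdout_records, leaky_synthetic, 0.0)
```

With equal numbers of training and held-out records, an accuracy near 0.5 means the attacker does no better than guessing, while an accuracy near 1.0 indicates that the synthetic samples sit close enough to the training records to expose them.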

Despite these results, the authors identify instability in GAN training as a significant challenge and point to techniques such as alternative loss functions or normalization layers as ways to stabilize the training process further.

In conclusion, this paper is a substantive investigation into GANs' ability to synthesize realistic ECG data, covering both data quality evaluation and privacy preservation. Its findings bear on data-sharing practice in medical AI and suggest avenues for future research toward making GAN-based data generation a reliable tool for clinical training and experimentation while limiting privacy risks.

Authors (3)

Anne Marie Delaney (1 paper)
Eoin Brophy (11 papers)
Tomas E. Ward (15 papers)

Citations (76)

GitHub

GitHub - Brophy-E/ECG_GAN_MBD: This repository is for the paper "Synthesis of Realistic ECG using Generative Adversarial Networks". (55 stars)
