Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems (1907.10456v2)

Published 24 Jul 2019 in cs.CV, cs.LG, and eess.IV

Abstract: Deep neural networks (DNNs) have become popular for medical image analysis tasks like cancer diagnosis and lesion detection. However, a recent study demonstrates that medical deep learning systems can be compromised by carefully-engineered adversarial examples/attacks with small imperceptible perturbations. This raises safety concerns about the deployment of these systems in clinical settings. In this paper, we provide a deeper understanding of adversarial examples in the context of medical images. We find that medical DNN models can be more vulnerable to adversarial attacks compared to models for natural images, according to two different viewpoints. Surprisingly, we also find that medical adversarial attacks can be easily detected, i.e., simple detectors can achieve over 98% detection AUC against state-of-the-art attacks, due to fundamental feature differences compared to normal examples. We believe these findings may be a useful basis to approach the design of more explainable and secure medical deep learning systems.

Citations (400)

Summary

  • The paper reveals that deep learning models for medical images are highly vulnerable to small adversarial perturbations, yet these attacks are surprisingly easy to detect.
  • Attacks on medical images require a much smaller perturbation magnitude than attacks on natural images, while simple detectors can achieve over 98% AUC against them.
  • The findings highlight the need for tailored defenses and potentially different DNN architectures for reliable AI deployment in clinical settings.

Evaluating Adversarial Vulnerabilities and Detectability in Medical Image DNN Models

This paper presents an extensive examination of adversarial attacks on deep learning models used for medical image analysis. The research investigates how susceptible deep neural networks (DNNs) in medical applications are to adversarial perturbations, and how detectable those adversarial inputs are in comparison with natural images. The focus is on three clinically important tasks: diabetic retinopathy classification on fundoscopy images, thorax disease detection on chest X-rays, and melanoma classification on dermoscopy images.

Key Findings

  1. Increased Vulnerability: Models trained on medical image datasets are shown to be significantly more vulnerable to adversarial attacks than models trained on natural image datasets such as CIFAR-10 and ImageNet. Adversarial attacks on medical images succeed with a much smaller perturbation magnitude (less than $1.0/255$) than attacks on natural images.
  2. Detection of Adversarial Examples: Despite this vulnerability, adversarial examples in medical imaging are distinctly detectable. Simple detection methods achieve over 98% area under the receiver operating characteristic curve (AUC) against state-of-the-art attacks, even with straightforward detectors built on deep features (a minimal detector sketch follows this list). This detectability arises primarily because adversarial perturbations spread broadly outside the lesion regions, making the examples distinguishable at the feature level.
  3. In-depth Investigation: The paper performs a comprehensive analysis using modern attacks such as FGSM, BIM, PGD, and CW (a hedged PGD sketch appears after this list), and evaluates two explanations for the heightened vulnerability: (a) medical images often contain complex biological textures that create high-gradient regions to which the models are sensitive, and (b) DNNs adapted from natural image analysis are over-parameterized for medical tasks, inducing sharp loss landscapes.
  4. Layer Representation Analysis: Using visualization techniques such as t-SNE and Grad-CAM (a Grad-CAM sketch is included below), the research shows how adversarial perturbations alter the activations and feature maps within DNNs, revealing that medical adversarial examples exhibit far more extensive changes than those seen in natural images.
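
To make the perturbation budget in findings 1 and 3 concrete, below is a minimal L-infinity PGD sketch in PyTorch against a generic image classifier. The model interface, step size, and the eps = 1/255 budget are illustrative assumptions, not the paper's exact experimental configuration.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, eps=1.0 / 255, alpha=0.25 / 255, steps=10):
    """L-infinity PGD: maximize the loss while staying inside an eps-ball.

    eps=1/255 mirrors the paper's observation that medical DNNs can be fooled
    by perturbations below 1.0/255; the exact hyperparameters are illustrative.
    """
    images = images.clone().detach()
    adv = images + torch.empty_like(images).uniform_(-eps, eps)  # random start
    adv = adv.clamp(0.0, 1.0).detach()

    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()                   # ascend the loss
            adv = images + (adv - images).clamp(-eps, eps)    # project into the eps-ball
            adv = adv.clamp(0.0, 1.0)                         # keep a valid pixel range
        adv = adv.detach()
    return adv
```

With a single step and no random start this reduces to an FGSM-style attack, while more steps correspond to BIM/PGD; at budgets this small, the perturbed images are visually indistinguishable from the originals.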
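
The detection result in finding 2 can be illustrated with an equally simple sketch: fit a linear probe on deep features extracted from clean and adversarial inputs. The feature-extraction interface (model.features) and the scikit-learn classifier are assumptions for illustration; the paper's finding is that similarly simple deep-feature detectors suffice, though its exact detectors may differ.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

@torch.no_grad()
def deep_features(model, x):
    # Penultimate-layer features; assumes a model with a .features backbone (illustrative).
    return model.features(x).flatten(1).cpu().numpy()

def fit_feature_detector(model, clean_batches, adv_batches):
    """Train a linear probe that separates clean from adversarial deep features."""
    feats, labels = [], []
    for x_clean, x_adv in zip(clean_batches, adv_batches):
        feats.append(deep_features(model, x_clean)); labels.append(np.zeros(len(x_clean)))
        feats.append(deep_features(model, x_adv));   labels.append(np.ones(len(x_adv)))
    X, y = np.concatenate(feats), np.concatenate(labels)
    detector = LogisticRegression(max_iter=1000).fit(X, y)
    # Report AUC on held-out clean/adversarial pairs in practice; training AUC is only a sanity check.
    print("training AUC:", roc_auc_score(y, detector.predict_proba(X)[:, 1]))
    return detector
```

The point of the finding is that medical adversarial examples push representations so far from the clean-feature manifold that even a probe of this kind reaches over 98% detection AUC.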
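
Finding 4 can likewise be reproduced with a short Grad-CAM sketch: comparing the heatmaps of a clean image and its adversarial counterpart shows how broadly the perturbation shifts the model's attention. The hook-based implementation below assumes a standard PyTorch CNN and a user-chosen convolutional target_layer.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer, class_idx=None):
    """Grad-CAM heatmaps for a batch x; target_layer is a conv module of the model (assumed)."""
    acts, grads = [], []
    fwd = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    bwd = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        logits = model(x)
        if class_idx is None:                          # default: explain the predicted class
            class_idx = logits.argmax(dim=1)
        score = logits.gather(1, class_idx.view(-1, 1)).sum()
        model.zero_grad()
        score.backward()
        weights = grads[0].mean(dim=(2, 3), keepdim=True)   # channel importance
        cam = F.relu((weights * acts[0]).sum(dim=1))        # weighted activation map
        cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)
    finally:
        fwd.remove()
        bwd.remove()
    return cam  # (N, H, W) heatmaps; upsample to the image size for overlay
```

Running this on a clean/adversarial pair of medical images illustrates the paper's observation that adversarial activations spill well beyond the lesion region.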

Theoretical and Practical Implications

The implications of this research underscore the complexity and fragility of deploying DNN models in clinical settings, where erroneous predictions due to adversarially perturbed inputs can have severe consequences. These findings stress the need for tailored adversarial defenses specifically designed for medical imaging applications, factoring in the distinct characteristics that separate medical from natural images. Additionally, they prompt a reconsideration of using conventional DNN architectures designed for extensive datasets like ImageNet for relatively simpler tasks in the medical domain.

Future Directions

Given the critical importance and safety concerns in the healthcare domain, there is a pressing need to refine DNN architectures to balance performance and robustness against such attacks. Future research could delve into optimized network designs for medical imaging, potentially integrating domain-specific knowledge to guide network training. Enhancing the explainability of adversarial findings through transparent AI and developing techniques for robust detection and prevention in clinical deployments remain crucial.

Research in the field of adversarial machine learning brings forth critical insight into ensuring the reliability of AI systems in high-stakes environments like healthcare. This paper offers valuable guidance to envision future advances that might offer more resilient and interpretable models, ultimately fostering trust in AI applications pivotal to medical diagnostics.