Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality
Adversarial attacks pose significant challenges to Deep Neural Networks (DNNs), spurring research into the characteristics of adversarial subspaces and methods to defend against these attacks. This paper, authored by Xingjun Ma et al., investigates the use of Local Intrinsic Dimensionality (LID) to describe these adversarial regions and demonstrates the efficacy of LID-based methods for distinguishing adversarial examples created by various state-of-the-art attack methods.
Overview of Contributions
The primary contribution of this paper is the application of LID to the characterization and detection of adversarial examples. LID measures the local geometrical complexity around a data point by evaluating the rate at which the number of neighbors grows as the distance from the point expands. The authors leverage this measure to distinguish adversarial regions from regions occupied by normal data. The empirical results presented in the paper show that LID-based detection outperforms several established measures, such as Kernel Density (KD) estimation and Bayesian Uncertainty (BU).
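As a concrete illustration, the following is a minimal NumPy sketch of the maximum-likelihood LID estimator commonly used in this line of work, computed from a point's k nearest-neighbor distances within a reference minibatch. The function and variable names are illustrative, and k is a tunable hyperparameter; this is a sketch of the estimator, not the authors' exact implementation.

```python
import numpy as np

def lid_mle(x, batch, k=20):
    """Maximum-likelihood LID estimate for a single point x, using its
    k nearest neighbors within a reference batch.

    x     : 1-D feature vector (e.g., activations at some DNN layer)
    batch : 2-D array of reference points, one per row
    k     : neighborhood size (tunable hyperparameter, assumed here)
    """
    # Euclidean distances from x to every point in the reference batch.
    dists = np.linalg.norm(batch - x, axis=1)
    # Keep the k smallest non-zero distances (excludes x itself if present).
    dists = np.sort(dists[dists > 0])[:k]
    r_max = dists[-1]
    # MLE of LID: negative inverse of the mean log distance ratio.
    return -1.0 / np.mean(np.log(dists / r_max))
```

Intuitively, the faster the neighbor distances grow toward the k-th neighbor, the lower the estimate; adversarial examples, sitting in locally more complex regions, tend to yield higher values.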
The core findings and contributions of the paper are:
- Proposal of LID for Adversarial Region Characterization: The paper introduces LID as a metric for characterizing the intrinsic dimensionality of adversarial regions in deep networks. The authors explain how adversarial perturbations can significantly increase the LID of an example compared to that of normal examples.
- Empirical Validation of LID Characteristics: Empirical results affirm the hypothesis that adversarial examples exhibit higher LID scores. The experiments demonstrate that these increased LID values can be used to effectively distinguish adversarial examples across different attack methods and datasets.
- Comparison with Other Detection Measures: The paper shows that LID-based detection generally outperforms the KD and BU measures. For instance, against the Optimization-based (Opt) attack, the LID-based detector achieved AUC scores of 99.24%, 98.94%, and 97.60% on the MNIST, CIFAR-10, and SVHN datasets, respectively, surpassing the combined KD+BU detector (a simplified sketch of such a detector appears after this list).
- Generalizability Analysis: The paper examines whether detectors trained on adversarial examples from a simpler attack, FGM (Fast Gradient Method), generalize to more complex attacks such as BIM (Basic Iterative Method) and Opt, finding that LID-based detection transfers well across attack types.
- Robustness against Adaptive Attacks: The paper evaluates an adaptive white-box attack in which the adversary's objective is augmented with a term that minimizes LID scores during example generation. The findings show that LID-based detection remains effective even under this adaptive setting.
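The detector itself can be sketched simply: per-layer LID estimates are stacked into a feature vector for each example, and a standard classifier is trained to separate normal from adversarial inputs. The snippet below is a simplified sketch under that assumption, reusing the hypothetical lid_mle helper from the earlier block and logistic regression as the classifier; it is not the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lid_features(activations_per_layer, batch_per_layer, k=20):
    """Stack one LID estimate per DNN layer into a feature vector for a
    single example. `activations_per_layer` is a list of 1-D activation
    vectors (one per layer); `batch_per_layer` holds matching reference
    minibatches of activations from the same layers."""
    return np.array([lid_mle(a, b, k)
                     for a, b in zip(activations_per_layer, batch_per_layer)])

def train_lid_detector(X_normal, X_adv):
    """Train a detector on per-layer LID feature vectors.
    X_normal, X_adv: 2-D arrays of lid_features() outputs, one row per example
    (label 0 = normal, label 1 = adversarial)."""
    X = np.vstack([X_normal, X_adv])
    y = np.concatenate([np.zeros(len(X_normal)), np.ones(len(X_adv))])
    detector = LogisticRegression(max_iter=1000).fit(X, y)
    # detector.predict_proba(X_new)[:, 1] then scores new inputs as adversarial.
    return detector
```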
Practical and Theoretical Implications
Adopting LID for characterizing adversarial regions has significant theoretical and practical implications. Theoretically, LID provides a principled measure of local geometric complexity that connects adversarial perturbation to the expansion properties of high-dimensional data. Practically, the ability of LID-based detectors to outperform existing measures underscores their potential for hardening DNN applications against adversarial threats.
Future Directions
The research opens several avenues for further investigation. Future work could delve into refining LID estimation techniques, improving computational efficiency, and exploring the interplay between DNN transformations and LID scores. Additionally, more extensive studies on detection generalizability and robustness against sophisticated adaptive attacks will be critical in developing more resilient adversarial defense mechanisms.
In summary, this paper makes substantial progress in characterizing and detecting adversarial examples using LID, offering a promising direction for future advancements in securing DNNs against adversarial attacks.