AED-PADA: Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation (2404.12635v2)
Abstract: Adversarial example detection, which can be conveniently deployed in many scenarios, is an important component of adversarial defense. Unfortunately, existing detection methods generalize poorly, because they are typically trained on examples generated by a single known attack, and a large discrepancy exists between the training adversarial examples and the unseen testing ones. To address this issue, we propose a novel method, named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA). Specifically, our approach identifies the Principal Adversarial Domains (PADs), i.e., a combination of the features of adversarial examples generated by different attacks, which covers a large portion of the entire adversarial feature space. We then pioneer the use of Multi-source Unsupervised Domain Adaptation for adversarial example detection, with the PADs serving as the source domains. Experimental results demonstrate the superior generalization ability of the proposed AED-PADA. Notably, this superiority holds even in the challenging scenario where the perturbations are constrained to a minimal magnitude.
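The abstract does not spell out how PADs are identified, but the idea of picking a small set of attacks whose features span the adversarial feature space can be illustrated with a clustering sketch. The snippet below is a minimal, hypothetical interpretation (not the paper's exact algorithm): each attack is summarized by the mean of its adversarial feature vectors, the attack summaries are grouped with a simple k-means (deterministic farthest-point initialization), and the attack nearest each centroid is kept as one source domain. All function and attack names here are illustrative assumptions.

```python
import numpy as np

def select_principal_domains(attack_features, k=3, iters=20):
    """Illustrative PAD selection: cluster per-attack mean feature vectors
    with k-means and keep, per cluster, the attack closest to the centroid
    as one source domain. `attack_features` maps an attack name to an
    (n_samples, feature_dim) array of adversarial features."""
    names = list(attack_features)
    X = np.stack([np.asarray(attack_features[n]).mean(axis=0) for n in names])

    # Deterministic farthest-point initialization: start from the first
    # attack, then repeatedly add the attack farthest from chosen centers.
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min(
            np.linalg.norm(X[:, None] - np.asarray(centers)[None], axis=-1),
            axis=1,
        )
        centers.append(X[dist.argmax()])
    centers = np.asarray(centers, dtype=float)

    # Standard Lloyd iterations: assign attacks to nearest centroid,
    # then recompute centroids (empty clusters keep their old centroid).
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)

    # Representative attack per cluster: smallest distance to its centroid.
    d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
    reps = []
    for c in range(k):
        members = np.where(labels == c)[0]
        if members.size:
            reps.append(names[members[d[members, c].argmin()]])
    return reps
```

Under this reading, the selected attacks define the multiple source domains that the subsequent multi-source unsupervised domain adaptation step would train the detector on, with the unseen attack acting as the unlabeled target domain.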