Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks (2404.03340v1)

Published 4 Apr 2024 in cs.CV, cs.CR, and cs.LG

Abstract: Despite providing high-performance solutions for computer vision tasks, deep neural network (DNN) models have proven extremely vulnerable to adversarial attacks. Current defenses mainly focus on known attacks, while adversarial robustness to unknown attacks is seriously overlooked. Moreover, the commonly used adaptive learning and fine-tuning techniques are unsuitable for adversarial defense, since defending against unseen attacks is essentially a zero-shot problem at deployment time. To tackle this challenge, we propose an attack-agnostic defense method named Meta Invariance Defense (MID). Specifically, various combinations of adversarial attacks are randomly sampled from a manually constructed Attacker Pool to constitute different defense tasks against unknown attacks, in which a student encoder is supervised by multi-consistency distillation to learn attack-invariant features via a meta principle. The proposed MID has two merits: 1) full distillation at the pixel, feature, and prediction levels between benign and adversarial samples facilitates the discovery of attack invariance; 2) the model simultaneously achieves robustness to imperceptible adversarial perturbations in high-level image classification and attack suppression in low-level robust image regeneration. Theoretical and empirical studies on numerous benchmarks such as ImageNet verify the generalizable robustness and superiority of MID under various attacks.
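
For intuition, below is a minimal PyTorch-style sketch of the kind of training step the abstract describes: an attack is sampled from a small attacker pool to form a defense task, and a student encoder is distilled against a benign teacher with pixel-, feature-, and prediction-level consistency losses. All helper names (fgsm, ATTACKER_POOL, mid_step), the pool contents, and the equal loss weights are hypothetical illustrations under these assumptions, not the authors' implementation.

```python
# Minimal, self-contained sketch (PyTorch) of an MID-style defense task.
# Assumptions (not from the paper's code): a student/teacher encoder pair,
# a decoder for low-level image regeneration, a classifier head, and an
# "attacker pool" built from one FGSM attack at several strengths.
import random
import torch
import torch.nn.functional as F


def fgsm(model, x, y, eps):
    """One-step L_inf attack; a stand-in for an attacker-pool member."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()


# Hypothetical attacker pool: the same attack at different perturbation budgets.
ATTACKER_POOL = [lambda m, x, y, e=e: fgsm(m, x, y, e) for e in (2 / 255, 4 / 255, 8 / 255)]


def mid_step(student, teacher, decoder, classifier, x, y, opt):
    """One defense task: sample an attack, then distill multi-level consistency
    between benign and adversarial views (pixel / feature / prediction)."""
    attack = random.choice(ATTACKER_POOL)                   # sample a "task"
    x_adv = attack(lambda t: classifier(student(t)), x, y)

    f_clean, f_adv = student(x), student(x_adv)             # student features
    with torch.no_grad():
        f_teacher = teacher(x)                              # benign teacher features

    p_clean, p_adv = classifier(f_clean), classifier(f_adv)
    x_rec = decoder(f_adv)                                  # robust image regeneration

    loss = (
        F.cross_entropy(p_adv, y)                           # robust classification
        + F.mse_loss(x_rec, x)                              # pixel-level consistency
        + F.mse_loss(f_adv, f_teacher)                      # feature-level distillation
        + F.kl_div(F.log_softmax(p_adv, dim=-1),
                   F.softmax(p_clean, dim=-1).detach(),
                   reduction="batchmean")                   # prediction-level consistency
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In practice, a step like mid_step would be called inside a meta-learning loop that re-samples attack combinations per task; here a single FGSM attack at varying strengths stands in for each pool member.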

Authors (4)
  1. Lei Zhang (1691 papers)
  2. Yuhang Zhou (52 papers)
  3. Yi Yang (856 papers)
  4. Xinbo Gao (194 papers)
Citations (1)
