On the Duality Between Sharpness-Aware Minimization and Adversarial Training (2402.15152v2)

Published 23 Feb 2024 in cs.LG, cs.AI, cs.CR, and math.OC

Abstract: Adversarial Training (AT), which adversarially perturbs the input samples during training, is acknowledged as one of the most effective defenses against adversarial attacks, yet it inevitably suffers from decreased clean accuracy. Instead of perturbing the samples, Sharpness-Aware Minimization (SAM) perturbs the model weights during training to find a flatter loss landscape and improve generalization. However, as SAM is designed for better clean accuracy, its effectiveness in enhancing adversarial robustness remains unexplored. In this work, considering the duality between SAM and AT, we investigate the adversarial robustness derived from SAM. Intriguingly, we find that using SAM alone can improve adversarial robustness. To understand this unexpected property of SAM, we first provide empirical and theoretical insights into how SAM can implicitly learn more robust features, and then conduct comprehensive experiments to show that SAM can notably improve adversarial robustness without sacrificing any clean accuracy, shedding light on the potential of SAM to serve as a substitute for AT when clean accuracy is the higher priority. Code is available at https://github.com/weizeming/SAM_AT.
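
The duality the abstract describes is about where the worst-case perturbation is applied: AT maximizes the loss over a small perturbation of the inputs, while SAM maximizes the loss over a small perturbation of the weights, and both then take a descent step from that worst-case point. The sketch below contrasts one training step of each in PyTorch-style code. It is a minimal illustration of the standard formulations (PGD-based AT and vanilla SAM), not the authors' implementation (see the GitHub link above); names such as at_step, sam_step, eps, and rho are illustrative assumptions.

```python
import torch

def at_step(model, loss_fn, x, y, optimizer, eps=8/255, alpha=2/255, steps=10):
    """Adversarial Training (AT): maximize the loss over an eps-ball around the
    inputs (PGD-style inner loop), then update the weights on the perturbed inputs."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball
        x_adv = x_adv.clamp(0, 1).detach()         # keep a valid pixel range
    optimizer.zero_grad()
    loss_fn(model(x_adv), y).backward()            # outer minimization on perturbed inputs
    optimizer.step()

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """Sharpness-Aware Minimization (SAM): maximize the loss over a rho-ball around
    the weights, then update the original weights with the gradient taken there."""
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()                # first pass: gradient at current weights
    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad.detach().clone() for p in params]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
    scale = rho / grad_norm
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g * scale)                      # ascend: perturb weights toward higher loss
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()                # second pass: gradient at perturbed weights
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(g * scale)                      # restore the original weights
    optimizer.step()                               # descend with the sharpness-aware gradient
```

Written this way, the two procedures are structurally identical min-max loops; they differ only in whether the inner maximization acts on the inputs x or on the model parameters, which is the duality the paper exploits.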
