PubDef: Defending Against Transfer Attacks From Public Models (2310.17645v2)

Published 26 Oct 2023 in cs.LG, cs.AI, cs.CR, and cs.CV

Abstract: Adversarial attacks have been a looming and unaddressed threat in the industry. However, over the decade-long history of the robustness-evaluation literature, we have learned that mounting a strong or optimal attack is challenging. It requires both machine learning and domain expertise. In other words, the white-box threat model, religiously assumed by a large majority of the past literature, is unrealistic. In this paper, we propose a new practical threat model where the adversary relies on transfer attacks through publicly available surrogate models. We argue that this setting will become the most prevalent for security-sensitive applications in the future. We evaluate the transfer attacks in this setting and propose a specialized defense method based on a game-theoretic perspective. The defenses are evaluated under 24 public models and 11 attack algorithms across three datasets (CIFAR-10, CIFAR-100, and ImageNet). Under this threat model, our defense, PubDef, outperforms state-of-the-art white-box adversarial training by a large margin with almost no loss in normal accuracy. For instance, on ImageNet, our defense achieves 62% accuracy under the strongest transfer attack vs. only 36% for the best adversarially trained model. Its accuracy when not under attack is only 2% lower than that of an undefended model (78% vs. 80%). We release our code at https://github.com/wagner-group/pubdef.
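To make the threat model concrete, here is a minimal sketch of a transfer attack: the adversary runs PGD against a publicly available surrogate model and only then submits the resulting examples to the deployed target, never querying the target's gradients. This is not the paper's implementation (see the linked repository for that); the surrogate choice and the commented-out `load_pubdef_model` loader are illustrative assumptions.

```python
# Sketch of a transfer attack under the PubDef threat model (illustrative,
# not the paper's code). Input normalization is omitted for brevity; in
# practice, wrap each model so it normalizes internally and images stay
# in [0, 1].
import torch
import torch.nn.functional as F
import torchvision.models as models

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD computed on the surrogate; the target is never queried."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()  # gradient-sign step
        x_adv = x + (x_adv - x).clamp(-eps, eps)      # project into eps-ball
        x_adv = x_adv.clamp(0, 1)                     # keep valid pixel values
    return x_adv.detach()

# Surrogate: any public model, e.g. a torchvision ResNet-50.
surrogate = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

# target = load_pubdef_model()         # hypothetical loader for the defended model
# x_adv = pgd_attack(surrogate, x, y)  # x, y: images in [0, 1] and their labels
# acc = (target(x_adv).argmax(1) == y).float().mean()  # robustness to transfer
```

One way to read the game-theoretic perspective, roughly: treat the interaction as a zero-sum matrix game whose entry (i, j) is the defender's accuracy when defense i faces attack j; von Neumann's minimax theorem then guarantees optimal mixed strategies for both players, which motivates training against a curated set of attacks from public models rather than a single worst case.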
