Elevating Defenses: Bridging Adversarial Training and Watermarking for Model Resilience (2312.14260v2)

Published 21 Dec 2023 in cs.LG and cs.CR

Abstract: Machine learning models are being used in an increasing number of critical applications; thus, securing their integrity and ownership is critical. Recent studies observed that adversarial training and watermarking have a conflicting interaction. This work introduces a novel framework that integrates adversarial training with watermarking techniques to fortify models against evasion attacks and to provide confident model verification in case of intellectual property theft. We use adversarial training together with adversarial watermarks to train a robust watermarked model. The key intuition is to generate the adversarial watermarks with a higher perturbation budget than the budget used for adversarial training, thus avoiding conflict between the two. We evaluate the proposed technique on the MNIST and Fashion-MNIST datasets against various model stealing attacks. The results consistently outperform the existing baseline in robustness and further demonstrate the resilience of this defense against pruning and fine-tuning removal attacks.
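
The core mechanism described in the abstract — adversarial training at a small perturbation budget combined with watermark triggers crafted at a strictly larger budget — can be sketched in code. The following is a minimal PyTorch sketch, not the authors' implementation: the PGD attack routine, the frontier-stitching-style choice of keeping the triggers' original labels, the concrete budgets (eps_train=0.1 vs eps_wm=0.3), the trigger-crafting schedule, and the verification threshold are all illustrative assumptions.

```python
# Sketch: adversarial training at a small budget (eps_train) while embedding
# adversarial watermarks crafted at a larger budget (eps_wm > eps_train).
# Hyperparameters and scheduling are assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F


def pgd_perturb(model, x, y, eps, alpha, steps):
    """Craft L-infinity PGD adversarial examples within an eps-ball around x."""
    was_training = model.training
    model.eval()  # freeze batch-norm statistics while crafting
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the eps-ball
        x_adv = x_adv.clamp(0.0, 1.0).detach()    # stay in the valid pixel range
    if was_training:
        model.train()
    return x_adv


def train_robust_watermarked(model, train_loader, wm_x, wm_y,
                             eps_train=0.1, eps_wm=0.3, alpha=0.01,
                             steps=40, epochs=10, wm_weight=1.0, device="cpu"):
    """Adversarial training at eps_train, plus a watermark trigger set crafted
    at the larger budget eps_wm so the two objectives do not conflict."""
    model.to(device)
    wm_x, wm_y = wm_x.to(device), wm_y.to(device)
    # Craft the trigger set once: adversarial examples at the larger budget,
    # kept with their original labels (frontier-stitching style). Crafting
    # them up front from the current model is a simplification of this sketch.
    wm_trigger = pgd_perturb(model, wm_x, wm_y, eps_wm, alpha, steps)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_perturb(model, x, y, eps_train, alpha, steps)
            opt.zero_grad()
            loss = (F.cross_entropy(model(x_adv), y)
                    + wm_weight * F.cross_entropy(model(wm_trigger), wm_y))
            loss.backward()
            opt.step()
    return model, wm_trigger


def verify_watermark(suspect_model, wm_trigger, wm_y, threshold=0.9):
    """Ownership check: a stolen model should still label the trigger set as
    the owner intended. The 0.9 threshold is an illustrative choice."""
    suspect_model.eval()
    with torch.no_grad():
        preds = suspect_model(wm_trigger).argmax(dim=1)
    return (preds == wm_y).float().mean().item() >= threshold
```

Verification then needs only black-box access: if a suspect model classifies the high-budget trigger set as intended above the threshold, the owner can claim it. Because the triggers lie outside the eps_train ball used for robustness training, the watermarking and adversarial-training objectives no longer pull on the same inputs, which is the conflict-avoidance intuition stated in the abstract.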

Authors (3)
  1. Janvi Thakkar (6 papers)
  2. Giulio Zizzo (25 papers)
  3. Sergio Maffeis (14 papers)
Citations (1)
