Safe and Robust Watermark Injection with a Single OoD Image (2309.01786v2)
Abstract: Training a high-performance deep neural network requires large amounts of data and computational resources. Protecting the intellectual property (IP) and commercial ownership of a deep model is challenging yet increasingly crucial. A major stream of watermarking strategies implants verifiable backdoor triggers by poisoning training samples, but such approaches are often impractical due to data privacy and safety concerns, and the resulting watermarks are vulnerable to minor model changes such as fine-tuning. To overcome these challenges, we propose a safe and robust backdoor-based watermark injection technique that leverages the diverse knowledge in a single out-of-distribution (OoD) image, which serves as a secret key for IP verification. Because injection does not depend on the training data, the method is agnostic to third-party promises of IP security. We induce robustness via random perturbation of model parameters during watermark injection to defend against common watermark removal attacks, including fine-tuning, pruning, and model extraction. Our experimental results demonstrate that the proposed watermarking approach is not only time- and sample-efficient without requiring training data, but also robust against these watermark removal attacks.
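As a rough illustration of the idea in the abstract, the sketch below (PyTorch) builds a surrogate dataset from random augmented crops of a single OoD image, stamps a subset of those crops with a secret trigger patch mapped to a target label, and briefly fine-tunes the model while adding small random perturbations to its parameters before each gradient computation. Every function name, hyperparameter, and the specific trigger/perturbation scheme here is an illustrative assumption, not the paper's exact procedure.

```python
# Minimal sketch of backdoor-style watermark injection from one OoD image.
# All design choices below are assumptions for illustration only.
import torch
import torch.nn.functional as F
from torchvision import transforms
from PIL import Image


def make_surrogate_batch(ood_image, n, size=32):
    """Random augmented crops of a single OoD image act as surrogate training data."""
    aug = transforms.Compose([
        transforms.RandomResizedCrop(size, scale=(0.1, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(0.4, 0.4, 0.4),
        transforms.ToTensor(),
    ])
    return torch.stack([aug(ood_image) for _ in range(n)])


def stamp_trigger(x, trigger):
    """Paste a small secret trigger patch into the bottom-right corner of each image."""
    x = x.clone()
    th, tw = trigger.shape[-2:]
    x[:, :, -th:, -tw:] = trigger
    return x


def inject_watermark(model, ood_image, target_label, steps=200, batch=64,
                     poison_ratio=0.5, sigma=1e-3, lr=1e-4, device="cpu"):
    model = model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    trigger = torch.rand(3, 4, 4, device=device)  # secret trigger pattern

    for _ in range(steps):
        x = make_surrogate_batch(ood_image, batch).to(device)
        with torch.no_grad():
            # Clean surrogate crops keep the labels the current model assigns,
            # so normal behaviour is preserved while the trigger is implanted.
            y = model(x).argmax(dim=1)
        k = int(poison_ratio * batch)
        x[:k] = stamp_trigger(x[:k], trigger)
        y[:k] = target_label

        # Random weight perturbation during injection (robustness heuristic):
        # add Gaussian noise, compute gradients at the perturbed point,
        # then restore the original weights before the optimizer step.
        backup = [p.detach().clone() for p in model.parameters()]
        with torch.no_grad():
            for p in model.parameters():
                p.add_(sigma * torch.randn_like(p))
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p, b in zip(model.parameters(), backup):
                p.copy_(b)
        opt.step()

    return trigger  # keep (ood_image, trigger, target_label) as the secret key


# Hypothetical usage:
# trigger = inject_watermark(model, Image.open("ood.jpg").convert("RGB"), target_label=0)
```

Under this sketch, verification would amount to querying a suspect (possibly fine-tuned or extracted) model on trigger-stamped crops of the secret OoD image and checking whether the target label is predicted significantly above chance.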