On the Robustness of Dataset Inference (2210.13631v3)
Abstract: Machine learning (ML) models are costly to train, as they can require significant amounts of data, computational resources, and technical expertise. Thus, they constitute valuable intellectual property that needs protection from adversaries wanting to steal them. Ownership verification techniques allow the victims of model stealing attacks to demonstrate that a suspect model was in fact stolen from theirs. Although a number of ownership verification techniques based on watermarking or fingerprinting have been proposed, most of them fall short either in terms of security guarantees (well-equipped adversaries can evade verification) or computational cost. A fingerprinting technique, Dataset Inference (DI), has been shown to offer better robustness and efficiency than prior methods. The authors of DI provided a correctness proof for linear (suspect) models. However, in a subspace of the same setting, we prove that DI suffers from high false positives (FPs): it can incorrectly flag as stolen an independent model trained on non-overlapping data from the same distribution. We further prove that DI also triggers FPs for realistic, non-linear suspect models, and we confirm empirically that, in the black-box setting, DI leads to FPs with high confidence. Next, we show that DI also suffers from false negatives (FNs): an adversary can fool DI, at the cost of some accuracy loss, by regularising a stolen model's decision boundaries using adversarial training. To demonstrate this, we show that black-box DI fails to identify a model adversarially trained from a stolen dataset -- the setting in which DI is hardest to evade. Finally, we discuss the implications of our findings, the viability of fingerprinting-based ownership verification in general, and suggest directions for future work.
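The evasion described in the abstract rests on standard adversarial training in the style of Madry et al. (2018): the adversary retrains the stolen model on worst-case perturbed inputs, which pushes training points away from the decision boundary and weakens the margin signal that black-box DI relies on. The sketch below illustrates this kind of decision-boundary regularisation only; it is not the paper's exact experimental setup. It assumes PyTorch, a generic classifier `model`, a `loader` of (image, label) batches with pixels in [0, 1], and typical CIFAR-10-style hyperparameters (`eps`, `alpha`, `steps`) chosen for illustration.

```python
# Minimal PGD adversarial-training sketch (Madry et al., 2018 style).
# Illustrative only: model, loader, and hyperparameters are assumptions,
# not the configuration used in the paper.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-infinity-bounded adversarial examples via projected gradient descent."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer, device="cuda"):
    """One epoch of training on PGD examples instead of clean inputs."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)          # perturb each batch on the fly
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)  # fit the perturbed inputs
        loss.backward()
        optimizer.step()
```

Training on such perturbed batches enlarges the margin around training points, so the suspect model no longer exhibits the distinctive distance-to-boundary signature that DI's black-box test looks for; the trade-off, as the abstract notes, is some loss in clean accuracy.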
Authors: Sebastian Szyller, Rui Zhang, Jian Liu, N. Asokan