SSL-Auth: An Authentication Framework by Fragile Watermarking for Pre-trained Encoders in Self-supervised Learning (2308.04673v3)
Abstract: Self-supervised learning (SSL), a paradigm that harnesses unlabeled datasets to train robust encoders, has recently witnessed substantial success. These encoders serve as pivotal feature extractors for downstream tasks, and pre-training them demands significant computational resources. Nevertheless, recent studies have exposed vulnerabilities in pre-trained encoders, including backdoor and adversarial threats. Safeguarding the intellectual property of encoder trainers and ensuring the trustworthiness of deployed encoders are therefore notable challenges in SSL. To bridge these gaps, we introduce SSL-Auth, the first authentication framework designed explicitly for pre-trained encoders. SSL-Auth leverages selected key samples and employs a well-trained generative network to reconstruct watermark information, thereby verifying the integrity of the encoder without compromising its performance. By comparing the reconstruction results on the key samples, we can identify any malicious alteration to the encoder. Comprehensive evaluations on a range of encoders and diverse downstream tasks demonstrate the effectiveness of the proposed SSL-Auth.
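To make the verification step concrete, below is a minimal sketch of how the key-sample comparison could be carried out, assuming hypothetical names (`verify_encoder`, `threshold`) and PyTorch-style `encoder` and `generator` modules; the paper's actual similarity measure and decision rule may differ.

```python
import torch
import torch.nn.functional as F

# Hypothetical verification routine sketching the idea in the abstract:
# reconstruct watermark information from selected key samples and compare
# it against reference reconstructions recorded for the clean encoder.
def verify_encoder(encoder, generator, key_samples, reference, threshold=0.99):
    """Return True if the suspect encoder is judged unmodified.

    encoder     : suspect pre-trained encoder (maps images -> embeddings)
    generator   : well-trained generative network (maps embeddings -> watermark images)
    key_samples : tensor of selected key samples, shape (N, C, H, W)
    reference   : watermark reconstructions recorded at release time, shape (N, C', H', W')
    threshold   : similarity below this value is treated as a malicious alteration
    """
    encoder.eval()
    generator.eval()
    with torch.no_grad():
        embeddings = encoder(key_samples)      # feature extraction on the key samples
        reconstructed = generator(embeddings)  # reconstruct the watermark information

    # Compare each reconstruction with its stored reference
    # (cosine similarity used here as a stand-in similarity measure).
    sim = F.cosine_similarity(
        reconstructed.flatten(start_dim=1),
        reference.flatten(start_dim=1),
        dim=1,
    )
    return bool((sim >= threshold).all())
```

Because the watermark is intended to be fragile, even small modifications to the encoder (e.g., fine-tuning, pruning, or backdoor injection) should perturb the key-sample reconstructions enough to fall below the threshold, while a clean encoder reproduces the reference watermark and passes verification.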