SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models (2404.12699v1)
Abstract: Instead of building deep learning models from scratch, developers increasingly rely on adapting pre-trained models to their customized tasks. However, powerful pre-trained models may be misused for unethical or illegal tasks, e.g., privacy inference and unsafe content generation. In this paper, we introduce a pioneering learning paradigm, non-fine-tunable learning, which prevents a pre-trained model from being fine-tuned to indecent tasks while preserving its performance on the original task. To this end, we propose SOPHON, a protection framework that reinforces a given pre-trained model so that it resists fine-tuning in pre-defined restricted domains. This is challenging because adversaries may adopt a wide variety of complicated fine-tuning strategies. Inspired by model-agnostic meta-learning, we overcome this difficulty by designing sophisticated fine-tuning simulation and fine-tuning evaluation algorithms. In addition, we carefully design the optimization process to entrap the pre-trained model within a hard-to-escape local optimum with respect to the restricted domains. We have conducted extensive experiments on two deep learning modes (classification and generation), seven restricted domains, and six model architectures to verify the effectiveness of SOPHON. Experimental results show that fine-tuning SOPHON-protected models incurs an overhead comparable to or even greater than training from scratch. Furthermore, we confirm the robustness of SOPHON to three fine-tuning methods, five optimizers, and various learning rates and batch sizes. SOPHON may help spur further investigation into safe and responsible AI.
- Jiangyi Deng
- Shengyuan Pang
- Yanjiao Chen
- Liangming Xia
- Yijie Bai
- Haiqin Weng
- Wenyuan Xu