
SOPHON: Non-Fine-Tunable Learning to Restrain Task Transferability For Pre-trained Models (2404.12699v1)

Published 19 Apr 2024 in cs.LG

Abstract: Instead of building deep learning models from scratch, developers increasingly rely on adapting pre-trained models to their customized tasks. However, powerful pre-trained models may be misused for unethical or illegal tasks, e.g., privacy inference and unsafe content generation. In this paper, we introduce a pioneering learning paradigm, non-fine-tunable learning, which prevents the pre-trained model from being fine-tuned to indecent tasks while preserving its performance on the original task. To fulfill this goal, we propose SOPHON, a protection framework that reinforces a given pre-trained model to be resistant to being fine-tuned in pre-defined restricted domains. This is challenging due to the diversity of complicated fine-tuning strategies that adversaries may adopt. Inspired by model-agnostic meta-learning, we overcome this difficulty by designing sophisticated fine-tuning simulation and fine-tuning evaluation algorithms. In addition, we carefully design the optimization process to entrap the pre-trained model within a hard-to-escape local optimum with respect to restricted domains. We have conducted extensive experiments on two deep learning modes (classification and generation), seven restricted domains, and six model architectures to verify the effectiveness of SOPHON. Experimental results verify that fine-tuning SOPHON-protected models incurs an overhead comparable to, or even greater than, training from scratch. Furthermore, we confirm the robustness of SOPHON to three fine-tuning methods, five optimizers, and various learning rates and batch sizes. SOPHON may help boost further investigations into safe and responsible AI.

Authors (7)
  1. Jiangyi Deng (7 papers)
  2. Shengyuan Pang (4 papers)
  3. Yanjiao Chen (16 papers)
  4. Liangming Xia (1 paper)
  5. Yijie Bai (3 papers)
  6. Haiqin Weng (10 papers)
  7. Wenyuan Xu (35 papers)
Citations (4)

Summary

Non-Fine-Tunable Learning to Restrain Task Transferability for Pre-trained Models

Introduction

Pre-trained models, widely adopted because they adapt efficiently to new tasks via fine-tuning, carry a risk of misuse in unethical or harmful applications. Existing protection mechanisms, such as non-transferable learning (NTL), only impair a model's direct transferability; they do not stop an adversary from restoring performance on a restricted task through fine-tuning. This paper introduces non-fine-tunable learning, a stronger framework designed to prevent the fine-tuning of pre-trained models toward predefined undesirable tasks while maintaining their performance on intended tasks.

Methodology

The proposed learning paradigm comprises two primary objectives:

  1. Intactness: Preserving model efficiency on original tasks.
  2. Non-fine-tunability: Ensuring that fine-tuning the model for any restricted task is at least as difficult as training a new model from scratch (formalized below).
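
Read together, these objectives form a bi-level optimization problem. The following is a hedged formalization in our own notation (the paper's exact objective may differ): $\theta$ denotes the protected model's weights, $D_o$ and $D_r$ the original-task and restricted-domain data, and $\mathrm{FT}_k$ a simulated $k$-step fine-tuning procedure.

```latex
\min_{\theta}\;
\underbrace{\mathcal{L}_{\mathrm{orig}}(\theta;\, D_o)}_{\text{intactness}}
\;+\;
\alpha\,
\underbrace{\mathbb{E}_{\mathrm{FT}\sim\mathcal{F}}
  \Big[\mathcal{L}_{\mathrm{restrict}}\big(\mathrm{FT}_k(\theta;\, D_r);\, D_r\big)\Big]}_{\text{non-fine-tunability}}
```

Here $\mathcal{F}$ ranges over the simulated fine-tuning strategies, $\mathcal{L}_{\mathrm{restrict}}$ is a suppression loss that is small when the adapted model remains uninformative on $D_r$, and $\alpha$ trades the two goals off.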

To achieve these goals, the authors propose a framework involving:

  • Fine-tuning Simulation: Simulating the fine-tuning procedures an adversary might apply, so that the optimization can anticipate their effect (a minimal sketch follows this list).
  • Multi-objective Optimization Framework: Balancing model performance on the original task against resistance to fine-tuning on restricted tasks.
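
Below is a minimal PyTorch sketch of the fine-tuning-simulation idea. The model, batches, hyperparameters, and the single simulated strategy (plain SGD) are our placeholders; the actual SOPHON algorithm covers multiple strategies and a more careful optimization schedule.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call  # requires PyTorch >= 2.0

def simulate_finetune(model, params, x_r, y_r, inner_lr=0.01, steps=2):
    """Differentiably simulate an adversary running a few SGD steps of
    ordinary fine-tuning on restricted-domain data (x_r, y_r)."""
    for _ in range(steps):
        logits = functional_call(model, params, (x_r,))
        loss = F.cross_entropy(logits, y_r)  # the adversary's objective
        grads = torch.autograd.grad(loss, tuple(params.values()),
                                    create_graph=True)  # keep graph for outer step
        params = {name: p - inner_lr * g
                  for (name, p), g in zip(params.items(), grads)}
    return params

def protection_step(model, optimizer, x_o, y_o, x_r, y_r, alpha=1.0):
    """One outer update: preserve the original task, and degrade the
    restricted task as it would look *after* simulated fine-tuning."""
    params = dict(model.named_parameters())

    # 1) Intactness: standard loss on original-task data.
    loss_orig = F.cross_entropy(functional_call(model, params, (x_o,)), y_o)

    # 2) Non-fine-tunability: suppression loss on the adapted weights,
    #    pushing the fine-tuned model toward chance-level (uniform) output.
    adapted = simulate_finetune(model, params, x_r, y_r)
    log_probs = F.log_softmax(functional_call(model, adapted, (x_r,)), dim=-1)
    uniform = torch.full_like(log_probs, 1.0 / log_probs.shape[-1])
    loss_restrict = F.kl_div(log_probs, uniform, reduction="batchmean")

    optimizer.zero_grad()
    (loss_orig + alpha * loss_restrict).backward()
    optimizer.step()
```

The outer gradient flows through the simulated fine-tuning steps (`create_graph=True`), which is the MAML-style ingredient: the protected weights are shaped by how they would behave after an adversary's updates, not only by how they behave now.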

The implementation relies on purpose-built loss functions: inverse cross-entropy and KL divergence from a uniform distribution for classification tasks, and a denial-of-service loss for generation tasks (sketched below). These losses drive the model toward poor performance on restricted-domain samples without compromising its capabilities on original tasks.
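
A hedged sketch of the classification-side losses follows; the paper's exact formulations (normalization, clamping, weighting) may differ, and `eps` is our numerical-stability addition. The generation-side denial-of-service loss is not reproduced here, since its exact form depends on the generative model being protected.

```python
import math
import torch
import torch.nn.functional as F

def inverse_cross_entropy(logits, targets, eps=1e-6):
    """Small when the true class gets LOW probability: -log(1 - p_y)."""
    p_true = F.softmax(logits, dim=-1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return -torch.log(1.0 - p_true + eps).mean()

def kl_from_uniform(logits):
    """KL(p || u): zero exactly when predictions are chance-level."""
    log_p = F.log_softmax(logits, dim=-1)
    k = logits.shape[-1]
    # KL(p || u) = sum_i p_i log p_i + log k  (negative entropy plus log k)
    return (log_p.exp() * log_p).sum(dim=-1).mean() + math.log(k)
```

Both losses reach their minimum when the model is uninformative on restricted-domain inputs, which is exactly the state the optimization tries to trap the model in.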

Experimental Results

The framework was evaluated across two deep learning paradigms (classification and generation), seven restricted domains, and six model architectures. Results indicate:

  • Models protected by the framework exhibited significant resistance to fine-tuning on restricted tasks across various model architectures and fine-tuning strategies.
  • Fine-tuning such models requires effort comparable to, or greater than, training from scratch, removing the cost advantage that makes misusing pre-trained models attractive.

The evaluation also confirmed that protected models retain their efficacy on the original tasks, satisfying the intactness objective.

Implications and Future Work

This framework introduces a novel approach to safe AI deployment, emphasizing the responsible use of AI technologies. While promising, its efficacy against a broader array of domain adaptation techniques remains to be fully tested. Future research could explore its applicability to other modalities, such as audio or text, and develop more computationally efficient algorithms to improve its practical feasibility.

Further investigations might also focus on extending the robustness of non-fine-tunable learning against evolving fine-tuning strategies and on examining how different initialization and optimization choices affect non-fine-tunability.

Conclusion

This paper presents an innovative method to enhance the ethical use of pre-trained models by limiting their adaptability to undesirable tasks, without undermining their utility for legitimate applications. The proposed method fosters further exploration into creating AI models that are not only powerful but also aligned with ethical standards and societal norms.
