Non-Fine-Tunable Learning to Restrain Task Transferability for Pre-trained Models
Introduction
Pre-trained models are widely used because they adapt efficiently to new tasks via fine-tuning, but this same flexibility creates a risk of misuse in unethical or harmful applications. Existing protection mechanisms, such as non-transferable learning (NTL), only impair model transferability prior to fine-tuning, so their protection does not extend to an adversary who fine-tunes the model. This paper introduces non-fine-tunable learning, a framework designed to inhibit the fine-tuning of pre-trained models for predefined undesirable tasks while maintaining their performance on the intended tasks.
Methodology
The proposed learning paradigm comprises two primary objectives:
- Intactness: Preserving the model's performance on its original tasks.
- Non-fine-tunability: Ensuring that fine-tuning the model for any restricted task is at least as difficult as training a new model from scratch (see the objective sketch after this list).
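To make the interplay between the two objectives concrete, here is a minimal sketch (not the paper's actual implementation) of how they might be combined into one training loss; `training_objective`, `suppression_loss`, and the trade-off weight `lam` are hypothetical names introduced for illustration:

```python
import torch.nn.functional as F

def training_objective(model, original_batch, restricted_batch,
                       suppression_loss, lam=1.0):
    """Combine the intactness and non-fine-tunability terms (illustrative only).

    `suppression_loss` stands in for whichever restriction loss is applied on
    the restricted domain; `lam` is a hypothetical trade-off weight.
    """
    x_orig, y_orig = original_batch       # samples from the intended task
    x_restr, _ = restricted_batch         # samples from the restricted task

    # Intactness: ordinary task loss keeps performance on the original domain.
    intact = F.cross_entropy(model(x_orig), y_orig)

    # Non-fine-tunability: suppress useful signal on the restricted domain.
    suppress = suppression_loss(model, x_restr)

    return intact + lam * suppress
```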
To achieve these goals, the authors propose a framework involving:
- Fine-tuning Simulation: Simulating the fine-tuning procedures an adversary might apply, which informs the optimization framework used to train the protected model.
- Multi-objective Optimization Framework: Balancing model performance on legitimate tasks against resistance to fine-tuning on restricted tasks; a simplified sketch of the simulation step follows this list.
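The simulation step can be pictured as a meta-learning-style inner loop. The sketch below is a first-order approximation under assumed details (SGD as the adversary's optimizer, a plain cross-entropy fine-tuning loss, hypothetical names such as `simulated_finetune_grads` and `restricted_loader`), not the paper's exact algorithm:

```python
import copy
import torch
import torch.nn.functional as F

def simulated_finetune_grads(model, restricted_loader,
                             inner_steps=3, inner_lr=1e-2):
    """First-order sketch of fine-tuning simulation (assumed setup).

    A clone of the protected model plays the role of an adversary who
    fine-tunes on restricted data; the returned gradients, when applied to
    the protected model, make that simulated fine-tuning less effective.
    """
    sim = copy.deepcopy(model)                  # simulated adversary's copy
    opt = torch.optim.SGD(sim.parameters(), lr=inner_lr)

    batches = iter(restricted_loader)           # assumes enough batches exist
    for _ in range(inner_steps):                # adversary's fine-tuning steps
        x, y = next(batches)
        opt.zero_grad()
        F.cross_entropy(sim(x), y).backward()
        opt.step()

    # Measure how well the fine-tuned clone still fits the restricted task;
    # minimizing the negative loss degrades post-fine-tuning performance.
    # (Bounded alternatives, such as the losses sketched below, are safer.)
    x, y = next(batches)
    suppression = -F.cross_entropy(sim(x), y)
    return torch.autograd.grad(suppression, sim.parameters())
```

A full treatment would back-propagate through the inner fine-tuning steps themselves; the first-order shortcut above simply reuses the fine-tuned clone's gradients for the protected model, trading fidelity for a much cheaper outer update.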
The implementation employs specialized loss functions: an inverse cross-entropy loss and a KL divergence to the uniform distribution for classification, and a denial-of-service loss for generation tasks. These losses drive the model toward poor performance on restricted-domain samples without compromising its capabilities on the original tasks; illustrative sketches of these losses follow.
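As a rough illustration of what such losses could look like in PyTorch, the sketch below encodes one plausible reading of each; the exact formulations in the paper may differ, and the all-zero target in the denial-of-service loss is an assumption made purely for illustration:

```python
import torch
import torch.nn.functional as F

def inverse_cross_entropy(logits, labels, eps=1e-8):
    """Bounded suppression loss for classification: -log(1 - p_true) is small
    when the true class receives low probability (assumes logits of shape
    (batch, classes) and integer class labels)."""
    probs = F.softmax(logits, dim=-1)
    p_true = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    return -torch.log(1.0 - p_true + eps).mean()

def kl_to_uniform(logits):
    """Pull predictions toward the uniform distribution so the model carries
    no usable signal about restricted-domain labels (the KL direction here is
    a choice made for this sketch)."""
    log_probs = F.log_softmax(logits, dim=-1)
    uniform = torch.full_like(log_probs, 1.0 / logits.size(-1))
    return F.kl_div(log_probs, uniform, reduction="batchmean")

def denial_of_service_loss(generated, blank_target=None):
    """For generation tasks: push outputs toward an uninformative target
    (an all-zero tensor here, purely as a stand-in)."""
    if blank_target is None:
        blank_target = torch.zeros_like(generated)
    return F.mse_loss(generated, blank_target)
```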
Experimental Results
Extensive experiments tested the framework across two deep learning modalities (classification and generation), using six distinct model architectures. Results indicate:
- Models protected by the framework exhibited significant resistance to fine-tuning on restricted tasks across various model architectures and fine-tuning strategies.
- Fine-tuning such models requires effort comparable to, or greater than, training a model from scratch, making misuse economically and technically infeasible.
The evaluation also confirmed that protected models retain their efficacy on the original tasks, affirming the intactness objective.
Implications and Future Work
This framework introduces a novel approach to safe AI deployment, emphasizing the responsible use of AI technologies. While promising, the approach's effectiveness against a broader array of domain adaptation techniques remains to be fully tested. Future research could explore its applicability to other modalities, such as audio or text, and develop more computationally efficient algorithms to improve its practical feasibility.
Further investigations might also extend the robustness of non-fine-tunable learning against evolving fine-tuning strategies and examine how different initialization and optimization choices affect non-fine-tunability.
Conclusion
This paper presents an innovative method to enhance the ethical use of pre-trained models by limiting their adaptability to undesirable tasks, without undermining their utility for legitimate applications. The proposed method fosters further exploration into creating AI models that are not only powerful but also aligned with ethical standards and societal norms.