LMEraser: Large Model Unlearning through Adaptive Prompt Tuning (2404.11056v1)
Abstract: To address the growing demand for privacy protection in machine learning, we propose a novel and efficient machine unlearning approach for **L**arge **M**odels, called **LM**Eraser. Existing unlearning methods struggle with entangled training data and complex model architectures, incurring extremely high computational costs for large models. LMEraser takes a divide-and-conquer strategy with a prompt tuning architecture to isolate data influence. The training dataset is partitioned into public and private subsets: public data are used to train the model's backbone, while private data are adaptively clustered based on their diversity, and each cluster is used to optimize a prompt separately. This adaptive prompt tuning mechanism reduces unlearning costs and maintains model performance. Experiments demonstrate that LMEraser achieves a 100-fold reduction in unlearning costs compared to prior work, without compromising accuracy. Our code is available at: https://github.com/lmeraser/lmeraser
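To make the described workflow concrete, below is a minimal, self-contained sketch of the pipeline the abstract outlines: train the backbone on public data only, adaptively cluster the private data, tune one prompt per cluster against the frozen backbone, and serve an unlearning request by retraining only the affected cluster's prompt. Everything here is a hypothetical stand-in, not the authors' implementation: `train_backbone` and `tune_prompt` are placeholder functions, and plain k-means substitutes for the paper's diversity-based clustering.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def train_backbone(public_data):
    # Hypothetical stand-in: a fixed random projection acts as the frozen backbone.
    dim = public_data.shape[1]
    return rng.normal(size=(dim, 16))

def tune_prompt(backbone, samples):
    # Hypothetical stand-in: a "prompt" summarizing one private cluster in feature space.
    return (samples @ backbone).mean(axis=0)

# Public data train the backbone; private data only ever touch the prompts.
public = rng.normal(size=(1000, 32))
private = rng.normal(size=(200, 32))

backbone = train_backbone(public)

# Adaptive diversity-based clustering of private data (plain k-means on
# backbone features is used here as a stand-in for the paper's mechanism).
features = private @ backbone
labels = KMeans(n_clusters=5, random_state=0).fit_predict(features)
clusters = {c: list(np.where(labels == c)[0]) for c in range(5)}

# One prompt is tuned per cluster against the frozen backbone.
prompts = {c: tune_prompt(backbone, private[idx]) for c, idx in clusters.items()}

def unlearn(sample_idx):
    # Unlearning a private sample retrains only the prompt of its cluster;
    # the backbone and all other prompts stay untouched.
    c = int(labels[sample_idx])
    clusters[c].remove(sample_idx)
    prompts[c] = tune_prompt(backbone, private[clusters[c]])

unlearn(7)
```

The point of the sketch is the `unlearn` step: because each private sample influences only the prompt of its own cluster, a deletion request triggers retraining of that single prompt rather than of the backbone, which is what makes the unlearning cost small and largely independent of model size.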