LMEraser: Large Model Unlearning through Adaptive Prompt Tuning (2404.11056v1)

Published 17 Apr 2024 in cs.LG, cs.AI, and cs.CR

Abstract: To address the growing demand for privacy protection in machine learning, we propose a novel and efficient machine unlearning approach for Large Models, called LMEraser. Existing unlearning research suffers from entangled training data and complex model architectures, incurring extremely high computational costs for large models. LMEraser takes a divide-and-conquer strategy with a prompt tuning architecture to isolate data influence. The training dataset is partitioned into public and private datasets. Public data are used to train the backbone of the model. Private data are adaptively clustered based on their diversity, and each cluster is used to optimize a prompt separately. This adaptive prompt tuning mechanism reduces unlearning costs and maintains model performance. Experiments demonstrate that LMEraser achieves a 100-fold reduction in unlearning costs without compromising accuracy compared to prior work. Our code is available at: https://github.com/lmeraser/lmeraser.


Summary

  • The paper introduces adaptive prompt tuning to efficiently unlearn sensitive data from large models, reducing unlearning costs and preserving accuracy.
  • It partitions training data into public and private sets and applies targeted clustering and prompt tuning to manage sensitive information.
  • Experiments with large pre-trained vision transformer backbones (e.g., ViT and Swin Transformer) on image classification benchmarks demonstrate an approximately 100-fold decrease in unlearning cost while maintaining high accuracy.

LMEraser: Adaptive Prompt Tuning for Efficient Large Model Unlearning

Introduction to LMEraser

LMEraser is presented as a machine unlearning method for large models that takes a divide-and-conquer approach built on adaptive prompt tuning. It sidesteps the heavy computational cost of traditional unlearning methods by partitioning the training data into public and private datasets. The more sensitive private data are adaptively clustered and prompt-tuned separately from the public-data backbone, allowing precise and efficient removal of individual data influence.

Overview of Machine Unlearning Challenges

Large models, integral to many applications due to their accuracy and adaptability, pose daunting challenges for data privacy, especially under regulations such as the GDPR and CCPA, which mandate a 'right to be forgotten'. These challenges include:

  • Identifying specific data influence within large models
  • Executing computationally expensive unlearning processes
  • Maintaining model stability and performance after unlearning

Architecture of LMEraser

Data Partitioning and Pre-training: LMEraser segregates the training data into public and private datasets. The public dataset is used to train the backbone of the model, while the private data, which are subject to unlearning requests, are used only for prompt tuning.
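
A minimal sketch of this step, assuming PyTorch and supervised pre-training on labeled public data; in practice a publicly pre-trained checkpoint (e.g., an ImageNet-trained vision transformer) can play the role of the backbone. The helper name pretrain_backbone is illustrative, not the repository's API.

```python
import torch
from torch.utils.data import DataLoader, Dataset

def pretrain_backbone(backbone: torch.nn.Module,
                      public_data: Dataset,
                      epochs: int = 10,
                      lr: float = 1e-4) -> torch.nn.Module:
    """Train the backbone on the public partition only (illustrative helper)."""
    loader = DataLoader(public_data, batch_size=128, shuffle=True)
    opt = torch.optim.AdamW(backbone.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    backbone.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(backbone(x), y).backward()
            opt.step()
    # Freeze the backbone: private data will only ever influence prompts and
    # classifier heads, so unlearning never has to touch these weights.
    for p in backbone.parameters():
        p.requires_grad_(False)
    return backbone
```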

Adaptive Prompt Tuning Strategy:

  • Private Data Clustering: Private data are grouped into clusters according to their diversity, using features extracted by the pre-trained backbone. This clustering allows for targeted learning and precise unlearning.
  • Prompt and Classifier Head Tuning: For each data cluster, a dedicated prompt and classifier head are optimized on that cluster's specific features, boosting the model's performance; a minimal sketch follows this list.
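
A minimal sketch of these two steps, assuming a frozen PyTorch backbone that maps inputs to feature vectors, a visual prompt applied directly in input space, and scikit-learn's agglomerative clustering as a stand-in for the paper's adaptive clustering scheme. Function names, the cluster count, and hyperparameters are assumptions for illustration, not the repository's API.

```python
import torch
from sklearn.cluster import AgglomerativeClustering

@torch.no_grad()
def extract_features(backbone, private_x):
    backbone.eval()
    return backbone(private_x)  # (N, feature_dim); backbone assumed frozen

def tune_prompts(backbone, private_x, private_y, n_clusters=8, steps=200):
    # 1) Cluster private examples by their backbone features.
    feats = extract_features(backbone, private_x)
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(feats.numpy())

    # 2) Optimize one prompt and one classifier head per cluster.
    prompts, heads = {}, {}
    num_classes = int(private_y.max()) + 1
    loss_fn = torch.nn.CrossEntropyLoss()
    for c in range(n_clusters):
        mask = torch.tensor(labels == c)
        xc, yc = private_x[mask], private_y[mask]

        prompt = torch.zeros_like(xc[0], requires_grad=True)  # input-space prompt
        head = torch.nn.Linear(feats.shape[1], num_classes)
        opt = torch.optim.Adam([prompt, *head.parameters()], lr=1e-3)
        for _ in range(steps):
            opt.zero_grad()
            logits = head(backbone(xc + prompt))  # gradients reach only prompt/head
            loss_fn(logits, yc).backward()
            opt.step()
        prompts[c], heads[c] = prompt.detach(), head
    return labels, prompts, heads
```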

Key Features and Contributions

LMEraser is characterized by several innovative features:

  1. Utilizes a prompt tuning architecture that separates the influence of sensitive private data from the backbone trained on public data.
  2. Introduces an adaptive mechanism for private data clustering and tailored prompt creation, effectively balancing unlearning costs against model performance.
  3. Significantly reduces unlearning costs, showing a 100-fold decrease in computational resources needed compared to previous methodologies, while simultaneously preserving high model accuracy.

Experimental Evaluation

LMEraser's approach was evaluated with large pre-trained vision transformer backbones (e.g., ViT and Swin Transformer) on image classification tasks. The evaluation focused on the model's ability to efficiently remove data while maintaining its utility:

  • The experiments confirmed that adaptive prompt tuning enables the model to handle unlearning requests efficiently by retraining only the relevant prompts and classifier heads (see the sketch after this list).
  • Performance metrics indicated that LMEraser effectively manages to maintain high accuracy levels, demonstrating only minor reductions even when large portions of private data are unlearned.
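
Continuing the illustrative per-cluster representation from the previous sketch, here is a hedged sketch of how an unlearning request might be served under this design: only the prompts and heads of clusters containing forgotten examples are re-optimized from scratch on the retained data, while the public backbone and every other cluster stay untouched. Names and settings are assumptions, not the repository's API.

```python
import torch

def unlearn(backbone, private_x, private_y, labels, prompts, heads,
            forget_idx, steps=200):
    """Drop the forgotten examples and re-tune only the affected clusters."""
    forget = set(forget_idx)
    affected = {int(labels[i]) for i in forget}
    loss_fn = torch.nn.CrossEntropyLoss()
    for c in affected:
        keep = [i for i in range(len(private_y))
                if int(labels[i]) == c and i not in forget]
        xc, yc = private_x[keep], private_y[keep]

        # Re-initialize this cluster's prompt and head so the forgotten
        # examples leave no trace in the retained parameters.
        prompt = torch.zeros_like(xc[0], requires_grad=True)
        head = torch.nn.Linear(heads[c].in_features, heads[c].out_features)
        opt = torch.optim.Adam([prompt, *head.parameters()], lr=1e-3)
        for _ in range(steps):
            opt.zero_grad()
            loss_fn(head(backbone(xc + prompt)), yc).backward()
            opt.step()
        prompts[c], heads[c] = prompt.detach(), head
    return prompts, heads
```

Because each prompt and classifier head is tiny relative to the backbone, re-tuning a single cluster rather than retraining the whole model is what yields the reported orders-of-magnitude reduction in unlearning cost.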

Potential Implications

Theoretical: LMEraser advances the field of machine unlearning by providing a scalable, efficient framework adaptable to various large models, potentially influencing future research and methodologies in data privacy and model adaptability.

Practical: In practical applications, the ability to efficiently unlearn data without retraining entire models offers significant computational savings and aligns with privacy regulations, making large models more feasible and ethical in data-sensitive areas.

Future Directions

Speculating on future enhancements, LMEraser's architecture could be refined to automate the clustering process further and optimize the thresholds for data diversity. Advanced versions might incorporate real-time learning and unlearning capabilities, further reducing the turnaround time for unlearning requests. Additionally, exploring the extension of this architecture to other types of large models or different data modalities could broaden its applicability and impact in the AI domain.

In conclusion, LMEraser sets a new standard for machine unlearning in large models by integrating adaptive prompt tuning into its architecture, offering a practical solution to the challenges posed by data privacy regulations and the computational demands of traditional unlearning methods.
