
Dissecting Language Models: Machine Unlearning via Selective Pruning (2403.01267v2)

Published 2 Mar 2024 in cs.LG and cs.CL

Abstract: Understanding and shaping the behaviour of LLMs is increasingly important as applications become more powerful and more widely adopted. This paper introduces a machine unlearning method designed specifically for LLMs: a selective pruning method that removes neurons based on their relative importance to a targeted capability compared with overall network performance. This approach is a compute- and data-efficient way to identify and remove the neurons that enable specific behaviours. Our findings reveal that both feed-forward and attention neurons in LLMs are specialized; that is, for specific tasks, certain neurons are more crucial than others. Code from all experiments is available at https://github.com/nickypro/selective-pruning
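
The abstract only sketches the pruning criterion; the paper and its repository define it precisely. As a rough illustration, the Python sketch below shows how a relative-importance score, computed over activations from a "forget" dataset versus a "retain" dataset, could be used to select feed-forward neurons for masking. Every name here is hypothetical, and mean absolute activation is just one plausible importance proxy, not necessarily the statistic used in the paper.

```python
import numpy as np

def selective_prune_mask(acts_forget, acts_retain, frac=0.02, eps=1e-8):
    """Return a binary mask over neurons, zeroing those whose activity is
    disproportionately high on the 'forget' data relative to the 'retain' data.

    acts_forget, acts_retain: arrays of shape (num_tokens, num_neurons)
    holding post-nonlinearity activations collected from one layer.
    frac: fraction of neurons to prune in this layer.
    """
    # Per-neuron importance proxy: mean absolute activation on each dataset.
    imp_forget = np.abs(acts_forget).mean(axis=0)
    imp_retain = np.abs(acts_retain).mean(axis=0)

    # Relative-importance score: high when a neuron matters far more for the
    # targeted capability than for general performance.
    score = imp_forget / (imp_retain + eps)

    # Mask out the top-scoring fraction of neurons.
    n_prune = int(frac * score.size)
    mask = np.ones_like(score)
    if n_prune > 0:
        mask[np.argsort(score)[-n_prune:]] = 0.0
    return mask

if __name__ == "__main__":
    # Toy usage with random activations in place of real model statistics.
    rng = np.random.default_rng(0)
    acts_f = rng.standard_normal((1000, 512))
    acts_r = rng.standard_normal((1000, 512))
    mask = selective_prune_mask(acts_f, acts_r, frac=0.05)
    print(f"pruned {int((mask == 0).sum())} of {mask.size} neurons")
```

In practice the mask would be applied to the corresponding rows of the layer's output projection (e.g. `W_out * mask[:, None]`), so that pruned neurons no longer contribute to the residual stream.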
