Knowledge Unlearning for LLMs: Tasks, Methods, and Challenges (2311.15766v2)
Abstract: In recent years, large language models (LLMs) have spurred a new research paradigm in natural language processing. Despite their excellent capabilities in knowledge-based question answering and reasoning, their potential to retain faulty or even harmful knowledge poses risks of malicious application. Mitigating this issue and turning these models into safer, more trustworthy assistants is crucial for their widespread adoption. Unfortunately, repeatedly retraining LLMs to eliminate undesirable knowledge is impractical due to their immense number of parameters. Knowledge unlearning, which derives from analogous studies of machine unlearning, offers a promising avenue to address this concern and is particularly well suited to LLMs: it removes harmful knowledge efficiently, without affecting unrelated knowledge in the model. To this end, we provide a survey of knowledge unlearning in the era of LLMs. First, we formally define the knowledge unlearning problem and distinguish it from related work. We then categorize existing knowledge unlearning methods into three classes, those based on parameter optimization, parameter merging, and in-context learning, and describe each class in detail. We further present the evaluation datasets used by existing methods, and we conclude by discussing ongoing challenges and future directions.
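To make the first method class concrete, below is a minimal sketch of parameter-optimization unlearning via gradient ascent on a forget set, the basic idea behind several of the approaches surveyed here. Everything in it is illustrative: the `gpt2` checkpoint, the learning rate, and the one-item forget set are placeholder assumptions, not the configuration of any specific paper.

```python
# Minimal sketch: parameter-optimization unlearning via gradient ascent.
# Assumes a Hugging Face causal LM; checkpoint, lr, and forget set are
# illustrative placeholders only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical forget set: sequences whose content should be unlearned.
forget_texts = ["<private or harmful sequence to forget>"]

for text in forget_texts:
    batch = tokenizer(text, return_tensors="pt")
    # Standard next-token prediction loss on the forget sequence...
    outputs = model(**batch, labels=batch["input_ids"])
    # ...negated, so a descent step *raises* the loss on this sequence,
    # pushing the model away from reproducing the memorized content.
    (-outputs.loss).backward()
    optimizer.step()
    optimizer.zero_grad()
```

The second class, parameter merging, can be sketched in the spirit of task-arithmetic editing: fine-tune a copy of the model on the forget data, then subtract the resulting weight delta from the base weights. The function name and the `alpha` scaling factor below are illustrative assumptions.

```python
# Minimal sketch: parameter-merging unlearning via task-vector negation,
# i.e. theta_unlearned = theta_base - alpha * (theta_forget - theta_base).
import copy
import torch

def negate_task_vector(base_model, forget_model, alpha=1.0):
    """Return a copy of base_model with the forget-set task vector subtracted."""
    unlearned = copy.deepcopy(base_model)
    base_params = dict(base_model.named_parameters())
    forget_params = dict(forget_model.named_parameters())
    with torch.no_grad():
        for name, param in unlearned.named_parameters():
            task_vec = forget_params[name] - base_params[name]
            param.copy_(base_params[name] - alpha * task_vec)
    return unlearned
```

In practice, parameter-optimization methods usually pair the ascent objective with a retain-set or KL regularizer toward the original model so that unrelated knowledge is preserved, which is the property the abstract highlights. The third class, in-context unlearning, leaves the weights untouched entirely and instead steers the model at inference time with counterfactual demonstrations in the prompt.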
Authors: Nianwen Si, Hao Zhang, Heyu Chang, Wenlin Zhang, Dan Qu, Weiqiang Zhang