
MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA (2312.11795v1)

Published 19 Dec 2023 in cs.CL

Abstract: LLMs have shown great success in various NLP tasks, yet they still need updates after deployment to fix errors or keep pace with changing knowledge in the world. Researchers formulate this problem as Model Editing and have developed various editors focusing on different axes of editing properties. However, current editors can hardly support all properties and rely on heavy computational resources. In this paper, we propose a plug-in Model Editing method based on neuron-indexed dynamic LoRA (MELO), which alters the behavior of LLMs by dynamically activating certain LoRA blocks according to the index built in an inner vector database. Our method satisfies various editing properties with high efficiency and can be easily integrated into multiple LLM backbones. Experimental results show that our proposed MELO achieves state-of-the-art editing performance on three sequential editing tasks (document classification, question answering, and hallucination correction), while requiring the fewest trainable parameters and the lowest computational cost.
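
The abstract sketches the core mechanism: a frozen backbone is augmented with a bank of LoRA blocks, and an inner vector database maps edit keys to blocks so that only the matching block is activated for a given input, leaving unrelated inputs to the unedited model. The following is a minimal PyTorch sketch of that routing idea; the class names (LoRABlock, VectorIndex, DynamicLoRALinear), the cosine-similarity radius, and the mean-pooled routing query are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of neuron-indexed dynamic LoRA routing, assuming a PyTorch
# backbone. All names and hyperparameters here are illustrative placeholders.
from typing import List, Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRABlock(nn.Module):
    """One low-rank adapter: x -> (alpha / r) * B(A(x))."""

    def __init__(self, in_dim: int, out_dim: int, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, rank))  # zero-init: no effect until trained
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(F.linear(x, self.A), self.B) * self.scale


class VectorIndex:
    """Toy inner vector database mapping edit keys to LoRA block ids."""

    def __init__(self, radius: float = 0.8):
        self.keys: List[torch.Tensor] = []  # stored (normalized) key embeddings
        self.block_ids: List[int] = []      # LoRA block assigned to each key
        self.radius = radius                # minimum cosine similarity to trigger an edit

    def add(self, key: torch.Tensor, block_id: int) -> None:
        self.keys.append(F.normalize(key, dim=-1))
        self.block_ids.append(block_id)

    def lookup(self, query: torch.Tensor) -> Optional[int]:
        if not self.keys:
            return None
        sims = torch.stack(self.keys) @ F.normalize(query, dim=-1)
        best = int(sims.argmax())
        return self.block_ids[best] if sims[best] >= self.radius else None


class DynamicLoRALinear(nn.Module):
    """Frozen base linear layer plus a bank of LoRA blocks activated per input."""

    def __init__(self, base: nn.Linear, num_blocks: int, rank: int = 4):
        super().__init__()
        self.base = base.requires_grad_(False)  # backbone weights stay frozen
        self.blocks = nn.ModuleList(
            LoRABlock(base.in_features, base.out_features, rank) for _ in range(num_blocks)
        )
        self.index = VectorIndex()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        # Route on a pooled input representation; fall back to the unedited
        # base output when no stored edit key is close enough.
        block_id = self.index.lookup(x.mean(dim=tuple(range(x.dim() - 1))))
        if block_id is not None:
            out = out + self.blocks[block_id](x)
        return out


if __name__ == "__main__":
    layer = DynamicLoRALinear(nn.Linear(16, 16), num_blocks=2)
    x = torch.randn(3, 16)
    layer.index.add(x.mean(dim=0), block_id=0)  # register an edit key for inputs like x
    print(layer(x).shape)  # torch.Size([3, 16])
```

Keeping each edit in its own low-rank block rather than overwriting base weights is what allows such a module to behave as a plug-in: the backbone stays frozen, and only the small adapter bank and its index are added, which is consistent with the abstract's claims of easy integration into multiple backbones and a small trainable-parameter footprint.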
