Advancing Parameter Efficiency in Fine-tuning via Representation Editing (2402.15179v3)

Published 23 Feb 2024 in cs.LG and cs.CL

Abstract: Parameter Efficient Fine-Tuning (PEFT) techniques have drawn significant attention due to their ability to yield competitive results while updating only a small portion of the adjustable parameters. However, existing PEFT methods pose challenges in hyperparameter selection, such as choosing the rank for LoRA or Adapter, or specifying the length of soft prompts. To address these challenges, we propose a novel fine-tuning approach for neural models, named Representation EDiting (RED), which modifies the representations generated at some layers through the application of scaling and biasing operations. While existing PEFT methods still demonstrate over-parameterization that could potentially undermine the generalization ability acquired from pre-training, RED can substantially reduce the number of trainable parameters by a factor of 25,700 compared to full parameter fine-tuning and by a factor of 32 relative to LoRA. Remarkably, RED achieves results comparable or superior to both full parameter fine-tuning and other PEFT methods. Extensive experiments across various model architectures and scales, including RoBERTa, GPT-2, T5, and LLaMA-2, have demonstrated the effectiveness and efficiency of RED, thereby positioning it as a promising PEFT strategy for large-scale neural models.
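
The core mechanism described in the abstract lends itself to a short sketch. Below is a minimal, hypothetical PyTorch illustration of representation editing: the output of a frozen backbone layer is element-wise rescaled and shifted by two small learnable vectors, which are the only trainable parameters. The class and variable names (`REDLayer`, `hidden_dim`) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class REDLayer(nn.Module):
    """Hypothetical sketch: wrap a frozen sub-layer and edit its output
    representation with a learnable element-wise scale and bias."""

    def __init__(self, base_layer: nn.Module, hidden_dim: int):
        super().__init__()
        self.base_layer = base_layer
        for p in self.base_layer.parameters():
            p.requires_grad = False              # backbone stays frozen
        # The only trainable parameters: 2 * hidden_dim per edited layer.
        self.scale = nn.Parameter(torch.ones(hidden_dim))
        self.bias = nn.Parameter(torch.zeros(hidden_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.base_layer(x)                   # (batch, seq_len, hidden_dim)
        return h * self.scale + self.bias        # scaling and biasing edit


# Usage sketch: edit the representation of a frozen feed-forward block.
hidden_dim = 768
ffn = nn.Sequential(
    nn.Linear(hidden_dim, 4 * hidden_dim),
    nn.GELU(),
    nn.Linear(4 * hidden_dim, hidden_dim),
)
edited_ffn = REDLayer(ffn, hidden_dim)

x = torch.randn(2, 16, hidden_dim)               # (batch, seq_len, hidden_dim)
out = edited_ffn(x)                              # same shape as the input
trainable = sum(p.numel() for p in edited_ffn.parameters() if p.requires_grad)
print(out.shape, trainable)                      # torch.Size([2, 16, 768]) 1536
```

At initialization the scale is one and the bias is zero, so the edit starts as an identity map; fine-tuning then updates only 2 × hidden_dim parameters per edited layer while the backbone weights remain unchanged.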

Authors (10)
  1. Muling Wu (13 papers)
  2. Wenhao Liu (83 papers)
  3. Xiaohua Wang (26 papers)
  4. Tianlong Li (13 papers)
  5. Changze Lv (22 papers)
  6. Zixuan Ling (8 papers)
  7. Jianhao Zhu (4 papers)
  8. Cenyuan Zhang (10 papers)
  9. Xiaoqing Zheng (44 papers)
  10. Xuanjing Huang (287 papers)
Citations (15)