A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Software Engineering Tasks (2312.15614v1)

Published 25 Dec 2023 in cs.SE, cs.AI, and cs.CL

Abstract: Pre-trained models (PTMs) have achieved great success in various Software Engineering (SE) downstream tasks following the "pre-train then fine-tune" paradigm. Because fully fine-tuning all parameters of a PTM can be computationally expensive, a widely used alternative is parameter-efficient fine-tuning (PEFT), which freezes the PTM and trains only a small number of extra parameters. Although some work has tested PEFT methods in the SE field, a comprehensive evaluation is still lacking. This paper aims to fill this gap by evaluating the effectiveness of five PEFT methods on eight PTMs across four SE downstream tasks. For different tasks and PEFT methods, we seek answers to the following research questions: 1) Is it more effective to use PTMs trained specifically on source code, or is it sufficient to use PTMs trained on natural language text? 2) What is the impact of varying model sizes? 3) How does the model architecture affect performance? Beyond effectiveness, we also discuss the efficiency of PEFT methods in terms of training time and GPU resource consumption. We hope our findings provide a deeper understanding of PEFT methods across various PTMs and SE downstream tasks. All code and data are available at https://github.com/zwtnju/PEFT.git.
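To make the PEFT idea in the abstract concrete, the sketch below applies one representative PEFT method (LoRA) to a code PTM: the pre-trained weights are frozen and only a small set of injected low-rank matrices is trained. This is a minimal illustration assuming the Hugging Face transformers and peft libraries and the Salesforce/codet5-base checkpoint; the method, model, and hyperparameters are illustrative and are not taken from the paper's actual experimental setup.

```python
# Illustrative LoRA sketch: freeze the base PTM and train only injected
# low-rank adapter parameters (not the paper's exact configuration).
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

# One possible code PTM, chosen here only for illustration.
base_model_name = "Salesforce/codet5-base"
model = AutoModelForSeq2SeqLM.from_pretrained(base_model_name)

# LoRA adds trainable low-rank matrices to attention projections while the
# original weights stay frozen; r, lora_alpha, and lora_dropout are example values.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
)
model = get_peft_model(model, lora_config)

# Prints the number of trainable parameters relative to the full model,
# showing how little is actually updated compared with full fine-tuning.
model.print_trainable_parameters()
```

The trainable-parameter count reported by print_trainable_parameters() is typically a small fraction of the full model and serves as a rough proxy for the training-time and GPU-memory savings the paper discusses alongside task effectiveness.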

Authors (7)
  1. Wentao Zou (1 paper)
  2. Qi Li (354 papers)
  3. Jidong Ge (17 papers)
  4. Chuanyi Li (16 papers)
  5. Xiaoyu Shen (73 papers)
  6. Liguo Huang (6 papers)
  7. Bin Luo (209 papers)
Citations (5)

GitHub: https://github.com/zwtnju/PEFT.git