A Prompt Learning Framework for Source Code Summarization (2312.16066v2)

Published 26 Dec 2023 in cs.SE and cs.AI

Abstract: (Source) code summarization is the task of automatically generating natural language summaries (also called comments) for given code snippets. Recently, with the successful application of LLMs in numerous fields, software engineering researchers have also attempted to adapt LLMs to code summarization tasks. The main adaptation schemes include instruction prompting, task-oriented (full-parameter) fine-tuning, and parameter-efficient fine-tuning (PEFT). However, instruction prompting requires carefully crafted prompts and professional domain knowledge from users, task-oriented fine-tuning incurs high training costs, and effective PEFT methods tailored to code summarization are still lacking. This paper proposes PromptCS, an effective prompt learning framework for code summarization. It no longer requires users to design effective prompts by hand. Instead, PromptCS trains a prompt agent that generates continuous prompts to unleash the potential of LLMs for code summarization. Compared to human-written discrete prompts, continuous prompts are produced under the guidance of the LLM and are therefore easier for the LLM to understand. PromptCS is non-invasive and freezes the LLM's parameters while training the prompt agent, which greatly reduces the training-resource requirements. Comprehensive experimental results show that PromptCS significantly outperforms instruction prompting schemes (including zero-shot and few-shot learning) on all four widely used metrics and is comparable to the task-oriented fine-tuning scheme. On some base LLMs, e.g., StarCoderBase-1B and -3B, PromptCS even outperforms task-oriented fine-tuning. More importantly, PromptCS trains faster than task-oriented fine-tuning, with a more pronounced advantage on larger LLMs.
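
The core mechanism described above (a small trainable prompt agent that prepends learned continuous prompt embeddings to the code's token embeddings while the backbone LLM stays frozen) follows the general soft-prompt-tuning recipe. The sketch below illustrates that recipe with PyTorch and Hugging Face Transformers; it is a minimal illustration under assumed details (the prompt length, learning rate, PromptAgent class, and training_step helper are hypothetical), not the authors' PromptCS implementation.

# Minimal soft-prompt sketch (assumptions noted above); not the PromptCS code.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "bigcode/starcoderbase-1b"   # one of the base LLMs named in the abstract
PROMPT_LENGTH = 10                        # number of continuous prompt vectors (assumed)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
llm = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Freeze the backbone LLM: only the prompt agent is trained (non-invasive).
for p in llm.parameters():
    p.requires_grad = False

class PromptAgent(nn.Module):
    """Holds PROMPT_LENGTH trainable continuous prompt embeddings (hypothetical design)."""
    def __init__(self, embed_dim: int, prompt_length: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_length, embed_dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the learned prompt to the token embeddings of the code snippet.
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, token_embeds], dim=1)

embed_dim = llm.get_input_embeddings().embedding_dim
agent = PromptAgent(embed_dim, PROMPT_LENGTH)
optimizer = torch.optim.AdamW(agent.parameters(), lr=5e-4)  # learning rate assumed

def training_step(code: str, summary: str) -> float:
    # Standard causal-LM objective over "code + summary", with the loss masked
    # out on the prompt positions via the -100 ignore index.
    enc = tokenizer(code + "\n" + summary, return_tensors="pt", truncation=True)
    token_embeds = llm.get_input_embeddings()(enc["input_ids"])
    inputs_embeds = agent(token_embeds)
    labels = torch.cat(
        [torch.full((1, PROMPT_LENGTH), -100, dtype=torch.long), enc["input_ids"]],
        dim=1,
    )
    loss = llm(inputs_embeds=inputs_embeds, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

At inference time the same learned prompt would be prepended to the code's token embeddings before decoding the summary; only the prompt agent's parameters are ever updated, which is what keeps the training cost low relative to full-parameter fine-tuning.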

Authors (11)
  1. Weisong Sun (45 papers)
  2. Chunrong Fang (71 papers)
  3. Chong Wang (308 papers)
  4. Jian Zhang (542 papers)
  5. Hanwei Qian (5 papers)
  6. Yang Liu (2253 papers)
  7. Zhenyu Chen (91 papers)
  8. Tingting Xu (7 papers)
  9. Yun Miao (10 papers)
  10. Xia Feng (9 papers)
  11. Zhenpeng Chen (39 papers)
Citations (10)