MESIA: Understanding and Leveraging Supplementary Nature of Method-level Comments for Automatic Comment Generation (2403.17357v1)

Published 26 Mar 2024 in cs.SE and cs.AI

Abstract: Code comments are important for developers in program comprehension. When comprehending and reusing a method, developers expect its comments to provide supplementary information beyond the method signature. However, the extent of such supplementary information varies greatly across code comments. In this paper, we raise awareness of the supplementary nature of method-level comments and propose a new metric named MESIA (Mean Supplementary Information Amount) to assess the extent of supplementary information that a code comment provides. With the MESIA metric, we conduct experiments on a popular code-comment dataset and on three common types of neural approaches for generating method-level comments. Our experimental results demonstrate the value of the proposed work with a number of findings. (1) Small-MESIA comments occupy around 20% of the dataset and mostly fall into only the WHAT comment category. (2) Large-MESIA comments, which provide various kinds of essential information, are difficult for existing neural approaches to generate. (3) The capability of existing neural approaches to generate large-MESIA comments can be improved by reducing the proportion of small-MESIA comments in the training set. (4) The retrained model can generate large-MESIA comments that convey essential, meaningful supplementary information for methods in the small-MESIA test set, but receives a lower BLEU score in evaluation. These findings indicate that, with good training data, auto-generated comments can sometimes even surpass human-written reference comments, and that the lack of an appropriate ground truth for evaluation is an issue that future work on automatic comment generation needs to address.
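The paper's exact MESIA formulation is not given in the abstract, but the core idea, measuring how much a comment says beyond what the method signature already reveals, can be illustrated with a deliberately crude proxy. The sketch below is a hypothetical toy metric (not the authors' definition): it splits identifiers on camelCase, then computes the fraction of comment tokens absent from the signature's token set. A WHAT-style comment that merely restates the signature scores low; a comment adding rationale or edge-case behavior scores high.

```python
import re


def tokens(text):
    # Pull out alphabetic runs, then split camelCase identifiers
    # (e.g. "getMaxValue" -> "get", "Max", "Value"); lowercase everything.
    words = []
    for part in re.findall(r"[A-Za-z]+", text):
        words += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", part)
    return [w.lower() for w in words]


def supplementary_ratio(signature, comment):
    """Toy proxy for 'supplementary information amount': the fraction of
    comment tokens that do not appear in the method signature."""
    sig = set(tokens(signature))
    com = tokens(comment)
    if not com:
        return 0.0
    return sum(1 for t in com if t not in sig) / len(com)


# A comment that restates the signature vs. one adding edge-case behavior.
low = supplementary_ratio("int getMaxValue(int[] values)",
                          "Gets the max value of the values.")
high = supplementary_ratio("int getMaxValue(int[] values)",
                           "Returns Integer.MIN_VALUE when the array is empty.")
print(low < high)  # True
```

A real metric in this spirit would also need stop-word handling and stemming, and the paper's information-theoretic framing (it cites Shannon's 1948 work) suggests weighting tokens by rarity rather than counting them uniformly; this sketch only conveys the contrast between small- and large-supplementary comments.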

Authors (5)
  1. Xinglu Pan
  2. Chenxiao Liu
  3. Yanzhen Zou
  4. Tao Xie
  5. Bing Xie
Citations (1)