Exploring Large Language Models for Code Explanation (2310.16673v1)

Published 25 Oct 2023 in cs.SE, cs.AI, and cs.IR

Abstract: Automating code documentation through explanatory text can prove highly beneficial in code understanding. Large Language Models (LLMs) have made remarkable strides in Natural Language Processing, especially within software engineering tasks such as code generation and code summarization. This study specifically delves into the task of generating natural-language summaries for code snippets, using various LLMs. The findings indicate that Code LLMs outperform their generic counterparts, and zero-shot methods yield superior results when dealing with datasets with dissimilar distributions between training and testing sets.

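The zero-shot setup described in the abstract amounts to prompting a pretrained Code LLM with a snippet and asking for a natural-language explanation, with no task-specific fine-tuning. The following is a minimal sketch of that idea, not the authors' exact pipeline: it assumes the Hugging Face transformers library and an illustrative Code LLM checkpoint (codellama/CodeLlama-7b-Instruct-hf); the prompt wording, model choice, and decoding settings are assumptions for demonstration only.

```python
# Minimal sketch of zero-shot code summarization with a Code LLM.
# Assumes the Hugging Face "transformers" library; the checkpoint and
# prompt template below are illustrative, not taken from the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "codellama/CodeLlama-7b-Instruct-hf"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def summarize(code_snippet: str, max_new_tokens: int = 64) -> str:
    """Ask the model, zero-shot, for a one-sentence summary of the snippet."""
    prompt = (
        "Explain what the following code does in one sentence.\n\n"
        f"```python\n{code_snippet}\n```\n\nExplanation:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the generated continuation, dropping the prompt tokens.
    generated = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

print(summarize("def add(a, b):\n    return a + b"))
```

Generated summaries of this kind are typically compared against reference comments with text-overlap or embedding-based metrics; the exact evaluation protocol used in the paper is not reproduced here.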
Authors (7)
  1. Paheli Bhattacharya (12 papers)
  2. Manojit Chakraborty (5 papers)
  3. Kartheek N S N Palepu (1 paper)
  4. Vikas Pandey (7 papers)
  5. Ishan Dindorkar (1 paper)
  6. Rakesh Rajpurohit (1 paper)
  7. Rishabh Gupta (44 papers)
Citations (6)