Explaining Code Examples in Introductory Programming Courses: LLM vs Humans (2403.05538v2)

Published 9 Dec 2023 in cs.CY, cs.HC, and cs.SE

Abstract: Worked examples, which present explained code for solving typical programming problems, are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide explanations for the many examples typically used in a programming class. In this paper, we assess the feasibility of using LLMs to generate code explanations for passive and active example exploration systems. To achieve this goal, we compare the code explanations generated by ChatGPT with the explanations generated by both experts and students.
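The setup the abstract describes, asking an LLM for a line-by-line explanation of example code, can be sketched as a prompt-construction step. This is a minimal illustration, not the authors' actual prompt: the function name, the sample code, and the prompt wording are all assumptions.

```python
# Hypothetical sketch of building a line-by-line explanation prompt for an
# LLM (e.g., ChatGPT), in the spirit of the study's setup. The prompt wording
# and example code are illustrative assumptions, not the paper's materials.

EXAMPLE_CODE = """\
def average(numbers):
    total = sum(numbers)
    return total / len(numbers)
"""

def build_explanation_prompt(code: str) -> str:
    """Number each line of the code and request a per-line explanation."""
    numbered = "\n".join(
        f"{i}: {line}"
        for i, line in enumerate(code.rstrip().splitlines(), start=1)
    )
    return (
        "Explain the following Python code to an introductory programming "
        "student, one numbered line at a time:\n\n" + numbered
    )

prompt = build_explanation_prompt(EXAMPLE_CODE)
print(prompt)
```

The numbered-line framing mirrors how passive and active example exploration systems attach explanations to individual code lines; the resulting prompt would then be sent to the model, and the returned explanations compared against expert- and student-authored ones.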

Authors (7)
  1. Arun-Balajiee Lekshmi-Narayanan (3 papers)
  2. Priti Oli (6 papers)
  3. Jeevan Chapagain (4 papers)
  4. Mohammad Hassany (5 papers)
  5. Rabin Banjade (6 papers)
  6. Peter Brusilovsky (15 papers)
  7. Vasile Rus (6 papers)
Citations (3)