
Next-Step Hint Generation for Introductory Programming Using Large Language Models (2312.10055v1)

Published 3 Dec 2023 in cs.CY, cs.AI, and cs.HC

Abstract: LLMs possess skills such as answering questions, writing essays, and solving programming exercises. Since these models are easily accessible, researchers have investigated their capabilities and risks for programming education. This work explores how LLMs can contribute to programming education by supporting students with automated next-step hints. We investigate prompt practices that lead to effective next-step hints and use these insights to build our StAP-tutor. We evaluate this tutor by conducting an experiment with students and by performing expert assessments. Our findings show that most LLM-generated feedback messages describe one specific next step and are personalised to the student's code and approach. However, the hints may contain misleading information and lack sufficient detail when students approach the end of the assignment. This work demonstrates the potential of LLM-generated feedback, but further research is required to explore its practical implementation.
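
The abstract describes the core mechanism only at a high level: prompt a chat LLM with the assignment and the student's current code, and ask for a single personalised next step. As a rough illustration (the paper's actual prompts, model choice, and tutor code are not reproduced here; the function name and prompt wording below are assumptions), such a request might look like this:

```python
# Minimal sketch of requesting a next-step hint from a chat LLM.
# All names, the prompt text, and the model choice are illustrative
# assumptions; this is not the StAP-tutor's actual implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def next_step_hint(assignment: str, student_code: str) -> str:
    """Ask the model for one concrete, personalised next step."""
    system = (
        "You are a tutor for an introductory programming course. "
        "Given an assignment and a student's partial solution, describe "
        "ONE concrete next step the student should take. Do not write "
        "the code for them and do not reveal the full solution."
    )
    user = f"Assignment:\n{assignment}\n\nStudent code so far:\n{student_code}"
    response = client.chat.completions.create(
        model="gpt-4o",  # any chat-capable model would do here
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        temperature=0.2,  # keep hints focused rather than creative
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(next_step_hint(
        "Write a function mean(xs) that returns the average of a list.",
        "def mean(xs):\n    total = 0\n",
    ))
```

A setup like this also makes the paper's caveat concrete: nothing in the prompt guarantees that the hint is correct or sufficiently detailed, which is why the authors call for further research before practical deployment.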

Authors (3)
  1. Lianne Roest
  2. Hieke Keuning
  3. Johan Jeuring