CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs (2401.11314v2)

Published 20 Jan 2024 in cs.HC and cs.AI

Abstract: Timely, personalized feedback is essential for students learning programming. LLM-powered tools like ChatGPT offer instant support, but reveal direct answers with code, which may hinder deep conceptual engagement. We developed CodeAid, an LLM-powered programming assistant delivering helpful, technically correct responses, without revealing code solutions. CodeAid answers conceptual questions, generates pseudo-code with line-by-line explanations, and annotates students' incorrect code with fix suggestions. We deployed CodeAid in a programming class of 700 students for a 12-week semester. A thematic analysis of 8,000 usages of CodeAid was performed, further enriched by weekly surveys and 22 student interviews. We then interviewed eight programming educators to gain further insights. Our findings reveal four design considerations for future educational AI assistants: D1) exploiting AI's unique benefits; D2) simplifying query formulation while promoting cognitive engagement; D3) avoiding direct responses while encouraging motivated learning; and D4) maintaining transparency and control for students to assess and steer AI responses.

Evaluation and Deployment of CodeAid: An LLM-Based Programming Assistant

The paper "CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs" presents a detailed examination of CodeAid—a programming assistant powered by LLMs. The research outlines both the pedagogical approach and the semester-long deployment of this AI tool in a large university-level programming course, targeting its efficacy in providing technical support while mitigating academic integrity concerns.

CodeAid was introduced to address the growing demand for scalable educational solutions that provide timely and personalized feedback to students learning programming, particularly in C. While LLMs, including prominent tools like ChatGPT, have demonstrated potential in educational domains, directly providing code solutions raises concerns about academic integrity and cognitive engagement. CodeAid differentiates itself by structuring its interactions to offer conceptual insights, pseudo-code explanations, and fix suggestions without revealing direct code solutions. The tool serves as a supplementary resource that promotes motivated learning and cognitive engagement among students and that educators can integrate into broader teaching frameworks.

Core Features and Deployment Insights

CodeAid was deployed in a second-year Systems Programming course at the University of Toronto involving approximately 700 students. Throughout a 12-week semester, students could interact with five primary features: asking general programming questions, asking questions about specific code, requesting code explanations, getting help fixing incorrect code, and seeking guidance on writing code. These features were designed through iterative prompt engineering to ensure responses were both technically accurate and educationally valuable, without bypassing essential learning processes.
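
The paper's actual prompts are not reproduced here, so the following is only a minimal sketch of the kind of guardrailed prompting described above, assuming the OpenAI chat completions API; the system prompt wording, the `help_write_code` helper, and the model name are illustrative assumptions rather than CodeAid's implementation.

```python
# Minimal sketch of a guardrailed "help write code" request.
# Assumptions: OpenAI chat completions API, illustrative prompt text and model name.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a teaching assistant for a C programming course. "
    "Never provide compilable C code. Respond with numbered pseudo-code steps, "
    "each followed by a one-sentence explanation of the underlying concept."
)

def help_write_code(task_description: str) -> str:
    """Ask the model for pseudo-code guidance on a student's task."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": task_description},
        ],
        temperature=0.2,  # keep answers focused and reproducible
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(help_write_code("Read integers from a file and print their average."))
```

Constraining the system prompt to pseudo-code plus explanations mirrors the paper's stated goal: responses that remain helpful and technically correct without handing students a complete solution.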

The thematic analysis of over 8,000 student interactions provided insights into usage patterns and helped refine functionality mid-course. Notably, the General Question feature was the most utilized, highlighting students' frequent need for conceptual clarifications. Feedback-driven improvements included real-time streaming of answers for faster response times, the introduction of interactive pseudo-code to visualize concepts without coding directly, and integrating static function documentation to supplement learning contextually.
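
The streaming improvement is reported only qualitatively in the paper; as a rough illustration, a minimal sketch of token-by-token streaming (again assuming the OpenAI chat completions API and an illustrative model name) might look like this:

```python
# Minimal sketch of streaming an answer as it is generated.
# Assumptions: OpenAI chat completions API, illustrative model name and question.
from openai import OpenAI

client = OpenAI()

def stream_answer(question: str) -> None:
    """Print the model's answer incrementally instead of waiting for the full reply."""
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
        stream=True,  # yields incremental chunks instead of one final message
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
    print()

if __name__ == "__main__":
    stream_answer("What does the 'static' keyword mean in C?")
```

Printing chunks as they arrive reduces perceived latency for students even though the total generation time is unchanged.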

Implications and Future Directions

The insights derived from CodeAid's deployment pinpoint critical considerations for the design and development of future educational AI tools. Among them is the necessary balance between providing direct solutions and promoting cognitive engagement—a challenge that requires careful handling through innovative interaction designs and pedagogical alignment. Furthermore, giving educators control over the assistant’s design, scope, and included functionalities would ensure these tools better align with their curricula and help gauge student engagement.

This work also opens pathways for future research to assess the longitudinal impacts of deploying such AI-assisted learning technologies across varied educational contexts, focusing on how they influence student competency, autonomy in learning, and critical thinking skills.

Conclusion

This paper advances the understanding of LLM-powered tools in educational settings, offering a robust methodology for evaluating their deployment and reception in large-scale educational environments. CodeAid signifies an important step toward integrating AI tools in programming education, ensuring both student and educator needs are met. This balance is crucial to harness the potential of AI while maintaining educational integrity and fostering genuine learning experiences. As AI continues to evolve, such research will play a pivotal role in shaping its effective and ethical use in education.

Authors (7)
  1. Majeed Kazemitabaar
  2. Runlong Ye
  3. Xiaoning Wang
  4. Austin Z. Henley
  5. Paul Denny
  6. Michelle Craig
  7. Tovi Grossman