
CFlow: Supporting Semantic Flow Analysis of Students' Code in Programming Problems at Scale (2404.10089v1)

Published 15 Apr 2024 in cs.HC

Abstract: The high demand for computer science education has led to high enrollments, with thousands of students in many introductory courses. In such large courses, it can be overwhelmingly difficult for instructors to understand class-wide problem-solving patterns or issues, which is crucial for improving instruction and addressing important pedagogical challenges. In this paper, we propose a technique and system, CFlow, for creating understandable and navigable representations of code at scale. CFlow is able to represent thousands of code samples in a visualization that resembles a single code sample. CFlow creates scalable code representations by (1) clustering individual statements with similar semantic purposes, (2) presenting clustered statements in a way that maintains semantic relationships between statements, (3) representing the correctness of different variations as a histogram, and (4) allowing users to navigate through solutions interactively using semantic filters. With a multi-level view design, users can navigate both high-level patterns and low-level implementations. This is in contrast to prior tools that either limit their focus to isolated statements (and thus discard the surrounding context of those statements) or cluster entire code samples (which can lead to large numbers of clusters; for example, if there are n code features and m implementations of each, there can be m^n clusters). We evaluated the effectiveness of CFlow in a comparison study and found that participants using CFlow spent only half the time identifying mistakes and recalled twice as many desired patterns from over 6,000 submissions.
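The cluster-count argument in the abstract can be checked with a little arithmetic. The sketch below is only an illustration, not CFlow's implementation: it reads the abstract's count as m^n (each of n code features independently takes one of m implementations), and it assumes, as a simplification, that grouping statements feature by feature yields at most n * m groups. The function names are hypothetical.

```python
from itertools import product


def whole_program_clusters(n_features: int, m_impls: int) -> int:
    """Upper bound on clusters when entire code samples are clustered:
    every combination of per-feature implementations is its own cluster."""
    return m_impls ** n_features


def per_statement_clusters(n_features: int, m_impls: int) -> int:
    """Upper bound on groups when statements are clustered feature by feature
    (an assumption made for this illustration)."""
    return n_features * m_impls


n, m = 3, 4

# Enumerate every combination explicitly to confirm the closed-form count.
combinations = list(product(range(m), repeat=n))
assert len(combinations) == whole_program_clusters(n, m)

print(whole_program_clusters(n, m))  # 64
print(per_statement_clusters(n, m))  # 12
```

Even with three features and four implementations each, whole-sample clustering can produce 64 clusters, while the per-feature grouping assumed here stays at 12.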

