Validating AI-Generated Code with Live Programming (2306.09541v3)
Abstract: AI-powered programming assistants are increasingly gaining popularity, with GitHub Copilot alone used by over a million developers worldwide. These tools are far from perfect, however, producing code suggestions that may be incorrect in subtle ways. As a result, developers face a new challenge: validating AI's suggestions. This paper explores whether Live Programming (LP), a continuous display of a program's runtime values, can help address this challenge. To answer this question, we built a Python editor that combines an AI-powered programming assistant with an existing LP environment. Using this environment in a between-subjects study (N=17), we found that by lowering the cost of validation by execution, LP can mitigate over- and under-reliance on AI-generated programs and reduce the cognitive load of validation for certain types of tasks.
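To make "a continuous display of a program's runtime values" concrete, the following is a minimal sketch of the underlying idea: run a function and record its local variables as each line executes. It is not the environment built in the paper. The helper `snapshot_locals` and the sample function `running_total` are hypothetical names introduced only for illustration, and the sketch assumes CPython's standard `sys.settrace` hook.

```python
import sys


def snapshot_locals(func, *args):
    """Run func(*args) and record its local variables before each line runs.

    Hypothetical helper for illustration only; not the paper's tooling.
    """
    code = func.__code__
    rows = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace any other function
        if event == "line":
            # Line number relative to the `def` line, plus a copy of the locals
            # as they stand just before this line executes.
            rows.append((frame.f_lineno - code.co_firstlineno, dict(frame.f_locals)))
        elif event == "return":
            rows.append(("return", arg))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous trace hook
    return rows


def running_total(xs):
    # Stand-in for an AI-suggested snippet that the developer wants to validate.
    total = 0
    for x in xs:
        total += x
    return total


rows = snapshot_locals(running_total, [1, 2])
for row in rows:
    print(row)
```

A live programming environment would refresh a display like this on every edit and show it alongside the code, so that validating a suggestion by execution does not require a separate run or debugging session.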
Authors: Kasra Ferdowsi, Ruanqianqian Huang, Michael B. James, Nadia Polikarpova, Sorin Lerner