
Validating AI-Generated Code with Live Programming (2306.09541v3)

Published 15 Jun 2023 in cs.HC and cs.PL

Abstract: AI-powered programming assistants are increasingly gaining popularity, with GitHub Copilot alone used by over a million developers worldwide. These tools are far from perfect, however, producing code suggestions that may be incorrect in subtle ways. As a result, developers face a new challenge: validating AI's suggestions. This paper explores whether Live Programming (LP), a continuous display of a program's runtime values, can help address this challenge. To answer this question, we built a Python editor that combines an AI-powered programming assistant with an existing LP environment. Using this environment in a between-subjects study (N=17), we found that by lowering the cost of validation by execution, LP can mitigate over- and under-reliance on AI-generated programs and reduce the cognitive load of validation for certain types of tasks.

Authors (5)
  1. Kasra Ferdowsi
  2. Ruanqianqian Huang
  3. Michael B. James
  4. Nadia Polikarpova
  5. Sorin Lerner
