IntelliExplain: Enhancing Conversational Code Generation for Non-Professional Programmers (2405.10250v3)

Published 16 May 2024 in cs.HC

Abstract: Chat LLMs such as GPT-3.5-turbo and GPT-4 have shown promise in assisting humans with coding, particularly by enabling them to provide feedback conversationally. However, current approaches assume users have expert debugging skills, which limits accessibility for non-professional programmers. In this paper, we first explore Chat LLMs' limitations in assisting non-professional programmers with coding. Through a formative study, we identify two key elements affecting their experience: the way a Chat LLM explains its generated code and the structure of human-LLM interaction. We then propose IntelliExplain, a new conversational code generation framework with enhanced code explanations and a structured interaction paradigm, which together foster better code understanding and a more effective feedback loop. In two programming tasks (SQL and Python), IntelliExplain yields significantly higher success rates and reduced task time compared to the vanilla Chat LLM. We also identify several remaining opportunities for effectively offering a chat-based programming experience to non-professional programmers.
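To make the interaction paradigm concrete, here is a minimal sketch of the explain-then-confirm loop the abstract describes: the model generates code, restates its behavior in plain language, and revises based on the user's critique of that explanation. The `chat` function, the prompts, and the round limit are illustrative placeholders assumed for this sketch, not the authors' implementation.

```python
def chat(messages: list[dict]) -> str:
    """Placeholder for any chat-LLM API call (e.g., an OpenAI-style client)."""
    raise NotImplementedError("wire up a real LLM client here")

def generate_with_explanation(task: str, max_rounds: int = 3) -> str:
    """Generate code, explain it in plain language, and revise from user feedback."""
    history = [{"role": "user", "content": f"Write code for this task: {task}"}]
    code = ""
    for _ in range(max_rounds):
        code = chat(history)
        # Ask the model to restate what the code does in natural language, so a
        # non-professional programmer can verify intent without reading syntax.
        explanation = chat(history + [
            {"role": "assistant", "content": code},
            {"role": "user", "content": "Explain in one plain-English paragraph "
                                        "what this code does."},
        ])
        print(explanation)
        feedback = input("Does this match your intent? (yes / describe the problem) ")
        if feedback.strip().lower() == "yes":
            break
        # Structured feedback: the user critiques the explanation rather than
        # the code itself, and the model revises the code in the next round.
        history += [
            {"role": "assistant", "content": code},
            {"role": "user", "content": f"That behavior is wrong: {feedback}. "
                                        "Please revise the code."},
        ]
    return code
```

The key design choice this sketch mirrors is that users judge a natural-language explanation of the code rather than the code itself, so correctness can be assessed without debugging expertise.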

Authors (3)
  1. Hao Yan (109 papers)
  2. Ziyu Yao (44 papers)
  3. Thomas D. LaToza (17 papers)