Experimenting a New Programming Practice with LLMs (2401.01062v1)
Abstract: Recent developments in LLMs make it possible to automatically construct small programs. This has the potential to free software engineers from low-level coding and allow us to focus on the arguably more interesting parts of software development, such as requirement engineering and system testing. In this project, we develop a prototype named AISD (AI-aided Software Development), which takes high-level (potentially vague) user requirements as input and generates detailed use cases, a prototype system design, and subsequently a system implementation. Different from existing attempts, AISD is designed to keep the user in the loop, i.e., it repeatedly takes user feedback on the use cases, the high-level system design, and the prototype implementation through system testing. AISD has been evaluated on a novel benchmark of non-trivial software projects. The experimental results suggest that it may be possible to imagine a future where software engineering is reduced to requirement engineering and system testing only.
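To make the human-in-the-loop workflow described above concrete, the following is a minimal Python sketch of such a pipeline: requirements are elaborated into use cases, a design, and an implementation, and the user reviews each artifact before the next stage proceeds. All function names (ask_llm, refine_until_approved, aisd_pipeline) are hypothetical illustrations and do not correspond to the paper's actual AISD interfaces.

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM backend; plug in your own client here."""
    raise NotImplementedError("connect this to an LLM API of your choice")


def refine_until_approved(artifact: str, kind: str) -> str:
    """Show an artifact (use cases, design, or code) to the user and keep
    asking the LLM to revise it until the user has no further feedback."""
    while True:
        print(f"--- current {kind} ---\n{artifact}")
        feedback = input(f"Feedback on the {kind} (empty to accept): ").strip()
        if not feedback:
            return artifact
        artifact = ask_llm(
            f"Revise the {kind} below based on this feedback.\n"
            f"Feedback: {feedback}\n\n{artifact}"
        )


def aisd_pipeline(requirements: str) -> str:
    # 1. Elaborate vague requirements into detailed use cases, reviewed by the user.
    use_cases = refine_until_approved(
        ask_llm(f"Write detailed use cases for: {requirements}"), "use cases"
    )
    # 2. Derive a high-level system design from the approved use cases.
    design = refine_until_approved(
        ask_llm(f"Propose a high-level system design for:\n{use_cases}"), "design"
    )
    # 3. Generate an implementation; user feedback from exercising it
    #    (system testing) drives further revisions.
    return refine_until_approved(
        ask_llm(f"Implement this design in Python:\n{design}"), "implementation"
    )
```

The key design point the sketch tries to capture is that user feedback is solicited after every stage rather than only at the end, which is what distinguishes the approach from fully autonomous code-generation pipelines.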