Assured LLM-Based Software Engineering (2402.04380v1)
Abstract: In this paper we address the following question: How can we use LLMs to improve code independently of a human, while ensuring that the improved code (1) does not regress the properties of the original code, and (2) improves the original in a verifiable and measurable way? To address this question, we advocate Assured LLM-Based Software Engineering (Assured LLMSE), a generate-and-test approach inspired by Genetic Improvement. Assured LLMSE applies a series of semantic filters that discard code that fails to meet these twin guarantees. This overcomes the potential problem of LLMs' propensity to hallucinate. It allows us to generate code using LLMs, independently of any human. The human plays only the role of final code reviewer, as they would for code generated by other human engineers. This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
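The generate-and-test idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`load`, `builds`, `no_regression`, `improves`, `assured`), the test cases, and the hard-coded candidate strings that stand in for LLM output are all hypothetical, and a real system would run candidates in a sandbox rather than with `exec`.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical test data: behaviour the original already has (must not regress)
# and behaviour the original lacks (the measurable improvement).
REGRESSION_CASES = [(1, 1), (2, 4), (3, 9)]
IMPROVEMENT_CASES = [(-2, 4)]

def load(source: str) -> Optional[Callable[[int], int]]:
    """Build a candidate and return its `solve` function, or None if it does not build."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # assumption: trusted toy input; sandbox this in practice
    except Exception:
        return None
    fn = namespace.get("solve")
    return fn if callable(fn) else None

def passes(source: str, cases) -> bool:
    """True if the candidate builds and returns the expected value on every case."""
    fn = load(source)
    if fn is None:
        return False
    try:
        return all(fn(x) == expected for x, expected in cases)
    except Exception:
        return False

def builds(source: str) -> bool:
    return load(source) is not None

def no_regression(source: str) -> bool:
    return passes(source, REGRESSION_CASES)

def improves(source: str) -> bool:
    return passes(source, IMPROVEMENT_CASES)

# Filters are applied in order; a candidate is discarded at the first failure.
FILTERS = [builds, no_regression, improves]

def assured(candidates: Iterable[str],
            filters: List[Callable[[str], bool]]) -> List[str]:
    """Keep only candidates that pass every filter."""
    return [c for c in candidates if all(f(c) for f in filters)]

# Stand-ins for LLM-generated candidates (hypothetical).
CANDIDATES = [
    "def solve(n) return n * n",                     # does not build
    "def solve(n): return n + n",                    # regresses: solve(1) != 1
    "def solve(n): return n * n if n > 0 else 0",    # original: no improvement
    "def solve(n): return n * n",                    # passes all filters
]

survivors = assured(CANDIDATES, FILTERS)
```

Only the last candidate survives all three filters; under the approach described in the abstract, it is such survivors that would be passed to a human for final code review.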
- WES: Agent-based User Interaction Simulation on Real Infrastructure. In GI @ ICSE 2020, Shin Yoo, Justyna Petke, Westley Weimer, and Bobby R. Bruce (Eds.). ACM, 276–284. https://doi.org/10.1145/3387940.3392089 Invited Keynote.
- Software Testing Research Challenges: An Industrial Perspective. In 2023 IEEE Conference on Software Testing, Verification and Validation (ICST 2023). IEEE, 1–10.
- Automated concolic testing of smartphone apps. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering. ACM, 59.
- Chris Anderson. 2012. The Model-View-Viewmodel (MVVM) design pattern. In Pro Business Applications with Silverlight 5. Springer, 461–499.
- Approximate Oracles and Synergy in Software Energy Search Spaces. IEEE Transactions on Software Engineering 45, 11 (2019), 1150–1169. https://doi.org/10.1109/TSE.2018.2827066
- Real-Time AI Systems: A Definition and An Architecture. In IJCAI. Citeseer, 256–264.
- Large Language Models for Software Engineering: Survey and Open Problems. In ICSE Future of Software Engineering (FoSE 2023). To Appear.
- Baldur: Whole-proof generation and repair with large language models. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE 2023). 1229–1241.
- Design Patterns. Addison-Wesley.
- Automated program repair. Commun. ACM 62, 12 (2019), 56–65.
- Mark Harman. 2010. Why Source Code Analysis and Manipulation Will Always Be Important. In 10th IEEE International Working Conference on Source Code Analysis and Manipulation. Timisoara, Romania, 7–19.
- Mark Harman. 2022. Scaling Genetic Improvement and Automated Program Repair (keynote paper). In 3rd IEEE/ACM International Workshop on Automated Program Repair, APR@ICSE 2022, Pittsburgh, PA, USA, May 19, 2022. IEEE, 1–7. https://doi.org/10.1145/3524459.3527353
- Dynamic Adaptive Search Based Software Engineering Needs Fast Approximate Metrics (Keynote Paper). In 4th International Workshop on Emerging Trends in Software Metrics (WeTSOM 2013). San Francisco, USA.
- Achievements, open problems and challenges for search based software testing (Keynote Paper). In 8th IEEE International Conference on Software Testing, Verification and Validation (ICST 2015). Graz, Austria.
- Mark Harman and Bryan F. Jones. 2001. Search Based Software Engineering. Information and Software Technology 43, 14 (Dec. 2001), 833–839.
- Search Based Software Engineering: Trends, Techniques and Applications. Comput. Surveys 45, 1 (November 2012), 11:1–11:61.
- Osman Hasan and Sofiene Tahar. 2015. Formal verification methods. In Encyclopedia of Information Science and Technology, Third Edition. IGI global, 7162–7170.
- The art of multiprocessor programming. Newnes.
- Online learning: A comprehensive survey. Neurocomputing 459 (2021), 249–289.
- Large Language Models for Software Engineering: A Systematic Literature Review. arXiv preprint arXiv:2308.10620 (2023).
- Evaluating Automatic Program Repair Capabilities to Repair API Misuses. IEEE Transactions on Software Engineering 48, 7 (2022), 2658–2679. https://doi.org/10.1109/TSE.2021.3067156
- Hermann Kopetz and Wilfried Steiner. 2022. Real-Time Communication. In Real-time systems: Design principles for distributed embedded applications. Springer, 177–200.
- J. R. Koza. 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA.
- William B. Langdon and Mark Harman. 2015. Optimising Existing Software with Genetic Programming. IEEE Transactions on Evolutionary Computation (TEVC) 19, 1 (Feb 2015), 118–135.
- SEEDS: A Software Engineer’s Energy-optimization Decision Support Framework. In 36th International Conference on Software Engineering (Hyderabad, India). ACM, New York, NY, USA, 503–514.
- SapFix: Automated End-to-End Repair at Scale. In International Conference on Software Engineering (ICSE) Software Engineering in Practice (SEIP) track. Montreal, Canada.
- Maja Milankovic–Atkinson and Elli Georgiadou. 1996. Metrics for reuse of object–oriented software. In 4th Software Quality Management Conference. Cambridge.
- CodeCompose: A Large-Scale Industrial Deployment of AI-assisted Code Authoring. In Foundations of Software Engineering (FSE 2024). Porto de Galinhas, Brazil. Earlier version available as arXiv:2305.12050.
- Are developers aware of the architectural impact of their changes? In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, October 30 - November 03, 2017. 95–105.
- Genetic Improvement of Software: a Comprehensive Survey. IEEE Transactions on Evolutionary Computation 22, 3 (June 2018), 415–432. https://doi.org/10.1109/TEVC.2017.2693219
- Harnessing evolution for multi-hunk program repair. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 13–24.
- A systematic literature survey of integration testing in component-based software engineering. In 2010 International Conference on Computer and Communication Technology (ICCCT). IEEE, 562–568.
- Ben Shneiderman. 1984. Response time and display rate in human performance with computers. ACM Computing Surveys (CSUR) 16, 3 (1984), 265–285.
- Genetic programming for shader simplification. ACM Transactions on Graphics 30, 6 (2011), 152:1–152:11.
- Software testing with large language model: Survey, landscape, and vision. arXiv preprint arXiv:2307.07221 (2023).
- Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 35 (2022), 24824–24837.
- Evolutionary Improvement of Programs. IEEE Transactions on Evolutionary Computation (TEVC) 15, 4 (2011), 515–538.
- Searching for Resource-Efficient Programs: Low-Power Pseudorandom Number Generators. In 2008 Genetic and Evolutionary Computation Conference (GECCO 2008). Atlanta, USA, 1775–1782.
- Deep Parameter Optimisation. In Genetic and evolutionary computation conference (GECCO 2015). Madrid, Spain, 1375–1382.
- ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629 [cs.CL]
- Shuyin Zhao. [n. d.]. GitHub Copilot now has a better AI model and new capabilities. https://github.blog/2023-02-14-github-copilot-now-has-a-better-ai-model-and-new-capabilities/