A Deep Dive into Large Language Models for Automated Bug Localization and Repair (2404.11595v3)
Abstract: Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR). In this study, we take a deep dive into automated bug fixing with LLMs. In contrast to many deep learning-based APR methods that assume known bug locations, rely on line-level localization tools, or address bug prediction and fixing in one step, our approach employs LLMs to predict the bug location at the token level and then uses them for bug fixing. Separating bug localization from bug fixing, with a different LLM for each, enables effective integration of diverse contextual information and improved incorporation of inductive biases. We introduce Toggle: Token-Granulated Bug Localization and Repair, a comprehensive program repair framework that integrates a bug localization model, an adjustment unit, and a bug-fixing model. Toggle takes a buggy function as input and generates a complete corrected function. We investigate various prompting styles for the bug-fixing model to identify the most effective prompts, which better utilize the inductive bias and significantly outperform the others. Toggle achieves new state-of-the-art (SOTA) performance on the CodeXGLUE code refinement benchmark and exhibits better or comparable performance on several other widely used APR datasets, including Defects4J.
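The three-stage flow described in the abstract (localize at the token level, adjust, then fix) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `RepairPipeline`, `toy_localize`, `toy_adjust`, and `toy_fix` are hypothetical names, the three stages are rule-based stand-ins for the LLMs, and the adjustment unit is assumed here to simply refine the predicted token span.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of token indices


@dataclass
class RepairPipeline:
    """Hypothetical three-stage pipeline: localize -> adjust -> fix."""
    localize: Callable[[List[str]], Span]
    adjust: Callable[[List[str], Span], Span]
    fix: Callable[[List[str], List[str], List[str]], List[str]]

    def repair(self, tokens: List[str]) -> List[str]:
        # Stage 1 and 2: predict the buggy token span, then refine it.
        start, end = self.adjust(tokens, self.localize(tokens))
        # Split the function into the context kept as-is and the buggy span.
        prefix, buggy, suffix = tokens[:start], tokens[start:end], tokens[end:]
        # Stage 3: regenerate only the buggy span, given its context.
        replacement = self.fix(prefix, buggy, suffix)
        # Return the complete corrected function, not just the patch.
        return prefix + replacement + suffix


# Toy stand-ins for the models (assumptions, for illustration only).
def toy_localize(tokens: List[str]) -> Span:
    i = tokens.index("<=")  # pretend the localizer flags this token
    return (i, i + 1)


def toy_adjust(tokens: List[str], span: Span) -> Span:
    return span  # identity: the predicted span is used unchanged


def toy_fix(prefix: List[str], buggy: List[str], suffix: List[str]) -> List[str]:
    return ["<" if t == "<=" else t for t in buggy]


buggy_fn = "for ( int i = 0 ; i <= n ; i ++ ) sum += a [ i ] ;".split()
pipeline = RepairPipeline(toy_localize, toy_adjust, toy_fix)
fixed_fn = pipeline.repair(buggy_fn)
print(" ".join(fixed_fn))
```

Keeping the stages as separate callables mirrors the separation the abstract argues for: each stage can be swapped or given different context without touching the others, and the unchanged prefix and suffix are copied through rather than regenerated.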
Authors: Soneya Binta Hossain, Nan Jiang, Qiang Zhou, Xiaopeng Li, Wen-Hao Chiang, Yingjun Lyu, Hoan Nguyen, Omer Tripp