AgentFL: Scaling LLM-based Fault Localization to Project-Level Context (2403.16362v1)
Abstract: Fault Localization (FL) is an essential step in the debugging process. Owing to their strong code comprehension capabilities, recent LLMs have demonstrated promising performance in diagnosing bugs in code. Nevertheless, because LLMs handle long contexts poorly, existing LLM-based fault localization is limited to localizing bugs within a small code scope (i.e., a method or a class) and struggles to diagnose bugs in a large code scope (i.e., an entire software system). To address this limitation, this paper presents AgentFL, a multi-agent system based on ChatGPT for automated fault localization. By simulating the behavior of a human developer, AgentFL models the FL task as a three-step process consisting of comprehension, navigation, and confirmation. Within each step, AgentFL employs agents with diversified expertise, each of which uses different tools to handle specific tasks. In particular, we adopt a series of auxiliary strategies, such as Test Behavior Tracking, Document-Guided Search, and Multi-Round Dialogue, to overcome the challenges in each step. The evaluation on the widely used Defects4J-V1.2.0 benchmark shows that AgentFL can localize 157 out of 395 bugs within Top-1, which outperforms the other LLM-based approaches and exhibits complementarity to the state-of-the-art learning-based techniques. Additionally, we confirm the indispensability of the components in AgentFL through an ablation study and demonstrate its usability through a user study. Finally, the cost analysis shows that AgentFL spends an average of only 0.074 dollars and 97 seconds on a single bug.
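To make the Top-1 figure concrete, the following is a minimal sketch of how a Top-N count is commonly computed for fault localization: a bug counts as localized within Top-N if at least one truly faulty element appears among the first N ranked candidates. The function name `top_n_count`, the toy bug IDs, and the method-level granularity are assumptions for illustration only; none of them come from the abstract, and this is not AgentFL's evaluation code. The last line only restates the reported 157 out of 395 as a rate.

```python
from typing import Dict, List, Set


def top_n_count(
    rankings: Dict[str, List[str]],
    faulty: Dict[str, Set[str]],
    n: int,
) -> int:
    """Count bugs with at least one faulty element among the first n candidates."""
    hits = 0
    for bug_id, ranked in rankings.items():
        truth = faulty.get(bug_id, set())
        if any(candidate in truth for candidate in ranked[:n]):
            hits += 1
    return hits


# Hypothetical toy data: ranked suspicious methods per bug (most suspicious first).
rankings = {
    "Bug-1": ["Foo.parse", "Foo.read"],
    "Bug-2": ["Bar.add", "Bar.sum"],
    "Bug-3": ["Baz.run", "Baz.stop"],
}
# Hypothetical ground truth: the methods actually changed by the fix.
faulty = {
    "Bug-1": {"Foo.parse"},
    "Bug-2": {"Bar.sum"},
    "Bug-3": {"Baz.init"},
}

top1 = top_n_count(rankings, faulty, 1)
top2 = top_n_count(rankings, faulty, 2)

# The abstract's reported result expressed as a rate: 157 of 395 bugs.
reported_top1_rate = 157 / 395

print(top1, top2, f"{reported_top1_rate:.1%}")
```

On the toy data only Bug-1 is hit at Top-1, Bug-2 is additionally hit at Top-2, and Bug-3 is never hit because its faulty method is not among the candidates. The reported 157 out of 395 corresponds to roughly 39.7% of the benchmark's bugs.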
Authors: Yihao Qin, Shangwen Wang, Yiling Lou, Jinhao Dong, Kaixin Wang, Xiaoling Li, Xiaoguang Mao