
Software Testing with Large Language Models: Survey, Landscape, and Vision (2307.07221v3)

Published 14 Jul 2023 in cs.SE

Abstract: Pre-trained LLMs have recently emerged as a breakthrough technology in natural language processing and artificial intelligence, with the ability to handle large-scale datasets and exhibit remarkable performance across a wide range of tasks. Meanwhile, software testing is a crucial undertaking that serves as a cornerstone for ensuring the quality and reliability of software products. As the scope and complexity of software systems continue to grow, the need for more effective software testing techniques becomes increasingly urgent, making it an area ripe for innovative approaches such as the use of LLMs. This paper provides a comprehensive review of the utilization of LLMs in software testing. It analyzes 102 relevant studies that have used LLMs for software testing, from both the software testing and LLMs perspectives. The paper presents a detailed discussion of the software testing tasks for which LLMs are commonly used, among which test case preparation and program repair are the most representative. It also analyzes the commonly used LLMs, the types of prompt engineering that are employed, as well as the accompanied techniques with these LLMs. It also summarizes the key challenges and potential opportunities in this direction. This work can serve as a roadmap for future research in this area, highlighting potential avenues for exploration, and identifying gaps in our current understanding of the use of LLMs in software testing.

Software Testing with LLMs: A Comprehensive Review

Utilization of LLMs in Software Testing

The integration of pre-trained LLMs into software testing represents a promising approach, particularly as software systems become increasingly complex. This paper rigorously analyzes current methodologies and advancements in applying LLMs to software testing. It reviews 102 relevant studies and covers a broad spectrum of software testing tasks, including test case preparation, debugging, and program repair.

Insights from Software Testing Tasks

Test Case Generation

One of the most prominent uses of LLMs is the generation of unit test cases. Automated unit test generation remains a long-standing challenge in software testing, and LLMs offer a notable advantage by leveraging their ability to comprehend and process large codebases, enabling automated test creation that can improve coverage and test quality. The paper categorizes how LLMs are applied into two broad approaches: pre-training or fine-tuning on domain-specific datasets, and prompt engineering techniques that steer LLM behavior towards the desired testing outcomes.
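
As a concrete illustration of the prompting route, the sketch below asks a chat-style LLM for pytest tests covering a small focal method. The model name, prompt wording, and the draft_unit_tests helper are assumptions chosen for illustration, not tooling described in the survey.

```python
# Hypothetical sketch: prompt a chat-style LLM for pytest tests of a focal
# method, then print the draft for human review. Model name, prompt wording,
# and the helper name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FOCAL_METHOD = '''
def slugify(title: str) -> str:
    return "-".join(title.lower().split())
'''

def draft_unit_tests(focal_method: str) -> str:
    """Ask the model for pytest-style tests covering normal and edge cases."""
    prompt = (
        "Write pytest unit tests for the following Python function. "
        "Cover typical inputs, the empty string, and unusual whitespace.\n\n"
        f"{focal_method}\nReturn only the test code."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model identifier
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,      # keep generations fairly deterministic
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(draft_unit_tests(FOCAL_METHOD))
```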

Program Repair

Another vital application of LLMs is program repair, where they are used to diagnose and fix software defects. Combining domain-specific adaptation through pre-training or fine-tuning with carefully engineered prompts, LLMs have shown significant potential in identifying and fixing errors in code, and have proven notably effective at patching known vulnerabilities, demonstrating their ability to tackle complex debugging tasks.
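
A minimal sketch of the generate-and-validate loop common in this line of work is shown below: candidate patches proposed by an LLM are kept only if the project's test suite passes. The propose_patches stub, file paths, and test command are hypothetical placeholders, not the specific pipelines surveyed.

```python
# Minimal generate-and-validate repair loop: LLM-proposed patches are accepted
# only if the project's test suite passes. `propose_patches` is a hypothetical
# stand-in for any LLM client; the pytest command and paths are assumptions.
import pathlib
import subprocess
from typing import Iterable, Optional

def propose_patches(buggy_source: str, error_log: str) -> Iterable[str]:
    """Hypothetical LLM wrapper: yields full-file rewrites as candidate fixes."""
    raise NotImplementedError("plug an LLM client in here")

def validate(candidate: str, target: pathlib.Path) -> bool:
    """Apply a candidate fix, run the tests, and report whether they pass."""
    original = target.read_text()
    target.write_text(candidate)
    try:
        result = subprocess.run(["pytest", "-q"], capture_output=True)
        return result.returncode == 0
    finally:
        target.write_text(original)  # always restore the original file

def repair(target: pathlib.Path, error_log: str) -> Optional[str]:
    for candidate in propose_patches(target.read_text(), error_log):
        if validate(candidate, target):
            return candidate  # first patch that makes the suite pass
    return None
```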

Test Oracle Generation and Input Generation

Generating test oracles and systematic test inputs is complicated by the oracle problem in software testing. Here, LLMs have shown promise when used within differential testing approaches and for generating metamorphic relations that sidestep the missing oracle. LLMs have also been applied to generate diverse test inputs for many types of software, demonstrating flexibility across application domains.
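
The toy sketch below illustrates both oracle workarounds on a sorting routine: differential testing compares the system under test against a trusted baseline, while a metamorphic check asserts that permuting the input must not change the sorted output. The baseline, the system under test, and the random input generator are illustrative assumptions; in the surveyed work the inputs or relations themselves may be produced by an LLM.

```python
# Toy illustration of two oracle workarounds on a sorting routine. All
# functions here are illustrative assumptions kept self-contained.
import random

def reference_sort(xs):
    """Trusted baseline used for differential testing."""
    return sorted(xs)

def sort_under_test(xs):
    """Stand-in for the implementation whose correct output is unknown."""
    return sorted(xs)

def differential_check(xs):
    # Oracle by comparison: the two implementations must agree.
    assert sort_under_test(xs) == reference_sort(xs)

def metamorphic_check(xs):
    # Metamorphic relation: permuting the input must not change the output.
    shuffled = xs[:]
    random.shuffle(shuffled)
    assert sort_under_test(xs) == sort_under_test(shuffled)

if __name__ == "__main__":
    for _ in range(100):
        case = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
        differential_check(case)
        metamorphic_check(case)
```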

Utilization Aspects of LLMs

LLM Models in Use

A notable aspect highlighted is the varied use of specific LLMs, such as ChatGPT, Codex, and CodeT5, depending on the nature and requirements of the testing task. ChatGPT emerges as the most frequently used model, owing to its strong ability to understand both natural language and code.

Prompt Engineering Techniques

The paper delineates various prompt engineering strategies adopted to enhance LLM performance in software testing tasks. These strategies range from zero-shot and few-shot learning to more sophisticated methods like chain-of-thought prompting. Each technique offers unique benefits in refining the model's output towards more relevant and accurate test artifacts.
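
The sketch below contrasts these prompting styles for a single test-generation task; only the prompt text changes between styles, the downstream LLM call is omitted, and the example function in the few-shot template is an assumption made for illustration.

```python
# Illustrative prompt templates for one test-generation task. Only the prompt
# text differs between styles; the example function is an assumption.

ZERO_SHOT = "Write pytest tests for the function below.\n\n{code}"

FEW_SHOT = (
    "Write pytest tests for the function below, following the example.\n\n"
    "# Example function\n"
    "def add(a, b):\n    return a + b\n\n"
    "# Example tests\n"
    "def test_add_positive():\n    assert add(2, 3) == 5\n"
    "def test_add_zero():\n    assert add(0, 0) == 0\n\n"
    "# Target function\n{code}"
)

CHAIN_OF_THOUGHT = (
    "Write pytest tests for the function below. First list, step by step, the "
    "input partitions and edge cases the function must handle; then emit one "
    "test per case.\n\n{code}"
)

def build_prompt(style: str, code: str) -> str:
    """Select a template by style name and fill in the code under test."""
    templates = {"zero": ZERO_SHOT, "few": FEW_SHOT, "cot": CHAIN_OF_THOUGHT}
    return templates[style].format(code=code)
```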

Challenges and Future Directions

Despite the substantial advancements and successful applications of LLMs in software testing, several challenges remain. These include achieving high coverage in test case generation, addressing the test oracle problem, and establishing rigorous evaluation frameworks that measure LLM performance accurately. Moreover, exploring LLMs for early-stage testing activities and non-functional testing, and integrating advanced prompt engineering techniques, are promising avenues for future research.
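
One piece of such an evaluation can be as simple as measuring the line coverage an LLM-generated suite actually achieves. The sketch below runs a generated test file under coverage.py via pytest; the file and package names are placeholders, and real evaluation frameworks typically also track metrics such as compilation rate, mutation score, and flakiness.

```python
# Measure the line coverage an LLM-generated test suite actually achieves,
# using coverage.py with pytest. File and package names are placeholders.
import subprocess

def measure_coverage(test_file: str, source_pkg: str) -> None:
    # Run the generated tests under coverage instrumentation ...
    subprocess.run(
        ["coverage", "run", f"--source={source_pkg}", "-m", "pytest", test_file],
        check=False,
    )
    # ... then print a per-file line-coverage report.
    subprocess.run(["coverage", "report", "-m"], check=False)

if __name__ == "__main__":
    measure_coverage("generated_tests/test_llm.py", "my_package")
```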

Real-world Applications and Integration

The paper also sheds light on the challenges of applying LLMs to real-world software testing, emphasizing the need for domain-specific fine-tuning and prompt engineering to meet industry-specific requirements. It further suggests combining traditional testing techniques with LLM capabilities to enhance testing efficacy and coverage.

Conclusion

In conclusion, the paper provides a comprehensive analysis of using LLMs in software testing, summarizing current practices, challenges, and future research opportunities. It underlines the potential of LLMs to revolutionize software testing methodologies but also calls attention to the need for further investigation and development to fully leverage LLM capabilities in practical and diverse testing scenarios.

Authors (6)
  1. Junjie Wang (164 papers)
  2. Yuchao Huang (4 papers)
  3. Chunyang Chen (86 papers)
  4. Zhe Liu (234 papers)
  5. Song Wang (313 papers)
  6. Qing Wang (341 papers)