When Fuzzing Meets LLMs: Challenges and Opportunities (2404.16297v1)
Published 25 Apr 2024 in cs.SE and cs.AI
Abstract: Fuzzing, a widely used technique for bug detection, has advanced through the use of LLMs. Despite their potential, LLMs face specific challenges in fuzzing. In this paper, we identify five major challenges of LLM-assisted fuzzing. To support our findings, we revisited the most recent papers from top-tier conferences, confirming that these challenges are widespread. As a remedy, we propose actionable recommendations to improve the application of LLMs in fuzzing and conduct preliminary evaluations on DBMS fuzzing. The results demonstrate that our recommendations effectively address the identified challenges.