Prompt Fuzzing for Fuzz Driver Generation (2312.17677v2)
Abstract: Crafting high-quality fuzz drivers is not only time-consuming but also requires a deep understanding of the library. However, state-of-the-art automatic fuzz driver generation techniques fall short of expectations. While fuzz drivers derived from consumer code can reach deep program states, they achieve limited coverage. Conversely, interpretative fuzzing can exercise most API calls but requires numerous attempts within a large search space. We propose PromptFuzz, a coverage-guided fuzzer for prompt fuzzing that iteratively generates fuzz drivers to explore undiscovered library code. To effectively explore API usage in fuzz drivers during prompt fuzzing, we propose several key techniques: instructive program generation, erroneous program validation, coverage-guided prompt mutation, and constrained fuzzer scheduling. We implemented PromptFuzz and evaluated it on 14 real-world libraries. Compared with OSS-Fuzz and Hopper (the state-of-the-art fuzz driver generation tool), the fuzz drivers generated by PromptFuzz achieved 1.61 and 1.63 times higher branch coverage, respectively. Moreover, they detected 33 genuine, previously unknown bugs out of a total of 49 crashes, 30 of which have been confirmed by the respective communities.
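For illustration, below is a minimal sketch of the kind of LibFuzzer-style fuzz driver that PromptFuzz aims to generate automatically. The library interface (foo_ctx, foo_parse, foo_get_name, foo_free) is a hypothetical placeholder, not an API from the paper; only LLVMFuzzerTestOneInput and FuzzedDataProvider are real LibFuzzer facilities. The sketch shows the general shape of a driver: derive API arguments from the fuzzer-provided bytes and exercise a short, well-formed API call sequence.

```cpp
// A minimal sketch of a LibFuzzer-style fuzz driver, assuming a hypothetical
// library "foo". The declarations below (foo_ctx, foo_parse, foo_get_name,
// foo_free) are placeholders for illustration only.
#include <cstddef>
#include <cstdint>
#include <vector>

#include <fuzzer/FuzzedDataProvider.h>

// Hypothetical C library under test (placeholder declarations).
extern "C" {
struct foo_ctx;
foo_ctx* foo_parse(const uint8_t* buf, size_t len, int flags);
const char* foo_get_name(foo_ctx* ctx);
void foo_free(foo_ctx* ctx);
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  FuzzedDataProvider fdp(data, size);

  // Derive a scalar API argument from the fuzzer input, then use the
  // remaining bytes as the payload buffer.
  int flags = fdp.ConsumeIntegralInRange<int>(0, 3);
  std::vector<uint8_t> buf = fdp.ConsumeRemainingBytes<uint8_t>();

  // Exercise a short, well-formed API sequence: create, query, destroy.
  foo_ctx* ctx = foo_parse(buf.data(), buf.size(), flags);
  if (ctx != nullptr) {
    (void)foo_get_name(ctx);
    foo_free(ctx);
  }
  return 0;
}
```

PromptFuzz's contribution lies in generating and evolving many such drivers automatically, guided by coverage feedback on mutated prompts, rather than in the shape of any single driver.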
- K. Serebryany, “OSS-Fuzz: Google’s continuous fuzzing service for open source software,” in Proceedings of the 26th USENIX Conference on Security Symposium (technical sessions). USENIX Association, 2017.
- Taking the next step: OSS-Fuzz in 2023. [Online]. Available: https://security.googleblog.com/2023/02/taking-next-step-oss-fuzz-in-2023.html
- American fuzzy lop. [Online]. Available: http://lcamtuf.coredump.cx/afl/
- M. Böhme, V.-T. Pham, and A. Roychoudhury, “Coverage-based greybox fuzzing as Markov chain,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 1032–1043.
- P. Chen and H. Chen, “Angora: Efficient fuzzing by principled search,” in IEEE Symposium on Security and Privacy (S&P), San Francisco, CA, May 2018.
- D. Wang, Y. Li, Z. Zhang, and K. Chen, “Carpetfuzz: Automatic program option constraint extraction from documentation for fuzzing,” in Proceedings of the 32nd USENIX Conference on Security Symposium. Anaheim, CA, USA: USENIX Association, 2023.
- P. Chen, Y. Xie, Y. Lyu, Y. Wang, and H. Chen, “Hopper: Interpretative fuzzing for libraries,” in ACM Conference on Computer and Communications Security (CCS), Copenhagen, Denmark, 2023.
- D. Babić, S. Bucur, Y. Chen, F. Ivančić, T. King, M. Kusano, C. Lemieux, L. Szekeres, and W. Wang, “Fudge: Fuzz driver generation at scale,” in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019, pp. 975–985.
- K. Ispoglou, D. Austin, V. Mohan, and M. Payer, “FuzzGen: Automatic fuzzer generation,” in 29th USENIX Security Symposium (USENIX Security 20), 2020, pp. 2271–2287.
- M. Zhang, J. Liu, F. Ma, H. Zhang, and Y. Jiang, “Intelligen: Automatic driver synthesis for fuzz testing,” in 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 2021, pp. 318–327.
- C. Zhang, X. Lin, Y. Li, Y. Xue, J. Xie, H. Chen, X. Ying, J. Wang, and Y. Liu, “APICraft: Fuzz driver generation for closed-source SDK libraries,” in 30th USENIX Security Symposium (USENIX Security 21), 2021, pp. 2811–2828.
- J. Jung, S. Tong, H. Hu, J. Lim, Y. Jin, and T. Kim, “Winnie: Fuzzing windows applications with harness synthesis and fast cloning,” in Proceedings of the 2021 Network and Distributed System Security Symposium (NDSS 2021), 2021.
- B. Jeong, J. Jang, H. Yi, J. Moon, J. Kim, I. Jeon, T. Kim, W. Shim, and Y. H. Hwang, “Utopia: Automatic generation of fuzz driver using unit tests,” in 2023 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 2023, pp. 746–762.
- Y. Liu, Y. Wang, T. Bao, X. Jia, Z. Zhang, and P. Su, “Afgen: Whole-function fuzzing for applications and libraries,” in 2024 IEEE Symposium on Security and Privacy (SP), 2024.
- Y. Deng, C. S. Xia, H. Peng, C. Yang, and L. Zhang, “Large language models are zero-shot fuzzers: Fuzzing deep-learning libraries via large language models,” in Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023, pp. 423–435.
- Fuzz target generation using LLMs. [Online]. Available: https://google.github.io/oss-fuzz/research/llms/target_generation/
- A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever et al., “Language models are unsupervised multitask learners,” OpenAI blog, vol. 1, no. 8, p. 9, 2019.
- T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language models are few-shot learners,” in Advances in Neural Information Processing Systems, 2020, pp. 1877–1901.
- L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. Miller, M. Simens, A. Askell, P. Welinder, P. F. Christiano, J. Leike, and R. Lowe, “Training language models to follow instructions with human feedback,” in Advances in Neural Information Processing Systems, 2022, pp. 27 730–27 744.
- OpenAI, “Gpt-4 technical report,” 2023.
- C. Zhang, M. Bai, Y. Zheng, Y. Li, X. Xie, Y. Li, W. Ma, L. Sun, and Y. Liu, “Understanding large language model based fuzz driver generation,” arXiv preprint arXiv:2307.12469, 2023.
- libFuzzer – a library for coverage-guided fuzz testing. [Online]. Available: https://llvm.org/docs/LibFuzzer.html
- How to use chat-based language models. [Online]. Available: https://platform.openai.com/docs/guides/chat/introduction
- H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. C. Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M.-A. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan, B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, and T. Scialom, “Llama 2: Open foundation and fine-tuned chat models,” 2023.
- Y. Sun, S. Wang, S. Feng, S. Ding, C. Pang, J. Shang, J. Liu, X. Chen, Y. Zhao, Y. Lu, W. Liu, Z. Wu, W. Gong, J. Liang, Z. Shang, P. Sun, W. Liu, X. Ouyang, D. Yu, H. Tian, H. Wu, and H. Wang, “Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation,” 2021.
- Google’s bard. [Online]. Available: https://bard.google.com/
- T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa, “Large language models are zero-shot reasoners,” in Advances in Neural Information Processing Systems, 2022.
- Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang, A. Madotto, and P. Fung, “Survey of hallucination in natural language generation,” ACM Computing Surveys, vol. 55, no. 12, pp. 1–38, 2023.
- B. Bi, C. Wu, M. Yan, W. Wang, J. Xia, and C. Li, “Incorporating external knowledge into machine reading for generative question answering,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Nov. 2019, pp. 2521–2530.
- B. Peng, M. Galley, P. He, H. Cheng, Y. Xie, Y. Hu, Q. Huang, L. Liden, Z. Yu, W. Chen, and J. Gao, “Check your facts and try again: Improving large language models with external knowledge and automated feedback,” arXiv preprint arXiv:2302.12813, 2023.
- P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel et al., “Retrieval-augmented generation for knowledge-intensive nlp tasks,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 9459–9474.
- H. Pearce, B. Ahmad, B. Tan, B. Dolan-Gavitt, and R. Karri, “Asleep at the keyboard? assessing the security of github copilot’s code contributions,” in 2022 IEEE Symposium on Security and Privacy (SP). IEEE, 2022, pp. 754–768.
- G. Sandoval, H. Pearce, T. Nys, R. Karri, S. Garg, and B. Dolan-Gavitt, “Lost at C: A user study on the security implications of large language model code assistants,” in 32nd USENIX Security Symposium (USENIX Security 23), 2023.
- J. Liu, C. S. Xia, Y. Wang, and L. Zhang, “Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation,” 2023.
- What makes a good fuzz target. [Online]. Available: https://github.com/google/fuzzing/blob/master/docs/good-fuzz-target.md
- S. Amann, H. A. Nguyen, S. Nadi, T. N. Nguyen, and M. Mezini, “A systematic evaluation of static API-misuse detectors,” IEEE Transactions on Software Engineering, vol. 45, no. 12, pp. 1170–1188, 2019.
- M. Wen, Y. Liu, R. Wu, X. Xie, S.-C. Cheung, and Z. Su, “Exposing library API misuses via mutation analysis,” in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019, pp. 866–877.
- T. Lv, R. Li, Y. Yang, K. Chen, X. Liao, X. Wang, P. Hu, and L. Xing, “RTFM! Automatic assumption discovery and verification derivation from library document for API misuse detection,” in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020, pp. 1837–1852.
- S. Nielebock, R. Heumüller, J. Krüger, and F. Ortmeier, “Cooperative API misuse detection using correction rules,” in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: New Ideas and Emerging Results, 2020, pp. 73–76.
- H. Zeng, J. Chen, B. Shen, and H. Zhong, “Mining API constraints from library and client to detect API misuses,” in 2021 28th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 2021, pp. 161–170.
- M. Kechagia, X. Devroey, A. Panichella, G. Gousios, and A. van Deursen, “Effective and efficient API misuse detection via exception propagation and search-based testing,” in Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2019, pp. 192–203.
- H. J. Kang and D. Lo, “Active learning of discriminative subgraph patterns for API misuse detection,” IEEE Transactions on Software Engineering, vol. 48, no. 8, pp. 2761–2783, 2021.
- S. Nielebock, P. Blockhaus, J. Krüger, and F. Ortmeier, “Automated change rule inference for distance-based API misuse detection,” arXiv preprint arXiv:2207.06665, 2022.
- J. Yang, J. Ren, and W. Wu, “API misuse detection method based on transformer,” in 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS). IEEE, 2022, pp. 958–969.
- K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov, “Addresssanitizer: A fast address sanity checker,” in Proceedings of the 2012 USENIX Conference on Annual Technical Conference, ser. USENIX ATC’12. USENIX Association, 2012, p. 28.
- UndefinedBehaviorSanitizer. [Online]. Available: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
- How to prepare the seed corpus for OSS-Fuzz. [Online]. Available: https://google.github.io/oss-fuzz/getting-started/new-project-guide/#seed-corpus
- Honggfuzz. [Online]. Available: https://github.com/google/honggfuzz
- H. Green and T. Avgerinos, “GraphFuzz: Library API fuzzing with lifetime-aware dataflow graphs,” in 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE), 2022, pp. 1070–1081.
- J. Wang, B. Chen, L. Wei, and Y. Liu, “Superion: Grammar-aware greybox fuzzing,” in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), 2019.
- FuzzedDataProvider. [Online]. Available: https://github.com/llvm/llvm-project/blob/main/compiler-rt/include/fuzzer/FuzzedDataProvider.h
- Clang AST deserializer in Rust. [Online]. Available: https://github.com/dtolnay/clang-ast
- Introducing ChatGPT. [Online]. Available: https://openai.com/blog/chatgpt
- J. Jiang, H. Xu, and Y. Zhou, “RULF: Rust library fuzzing via API dependency graph traversal,” in 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2021, pp. 581–592.
- Y. Lyu, W. Gao, S. Ma, Q. Sun, and J. Li, “Sparrowhawk: Memory safety flaw detection via data-driven source code annotation,” in Information Security and Cryptology: 17th International Conference, Inscrypt 2021, Virtual Event, August 12–14, 2021, Revised Selected Papers. Berlin, Heidelberg: Springer-Verlag, 2021, p. 129–148. [Online]. Available: https://doi.org/10.1007/978-3-030-88323-2_7
- Y. Lyu, Y. Fang, Y. Zhang, Q. Sun, S. Ma, E. Bertino, K. Lu, and J. Li, “Goshawk: Hunting memory corruptions via structure-aware and object-centric memory operation synopsis,” in 2022 IEEE Symposium on Security and Privacy (SP). Los Alamitos, CA, USA: IEEE Computer Society, May 2022. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/SP46214.2022.00137
- V. Atlidakis, R. Geambasu, P. Godefroid, M. Polishchuk, and B. Ray, “Pythia: Grammar-based fuzzing of REST APIs with coverage-guided feedback and learning-based mutations,” 2020.
- C. Lemieux, J. P. Inala, S. K. Lahiri, and S. Sen, “Codamosa: Escaping coverage plateaus in test generation with pre-trained large language models,” in 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023, pp. 919–931.
- C. S. Xia, M. Paltenghi, J. L. Tian, M. Pradel, and L. Zhang, “Universal fuzzing via large language models,” 2023.
- J. Ackerman and G. Cybenko, “Large language models for fuzzing parsers (registered report),” in Proceedings of the 2nd International Fuzzing Workshop, ser. FUZZING 2023, 2023.
- Y. Deng, C. Xia, C. Yang, S. Zhang, S. Yang, and L. Zhang, “Large language models are edge-case generators: Crafting unusual programs for fuzzing deep learning libraries,” in 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE), 2024.