Revisiting Neural Program Smoothing for Fuzzing (2309.16618v1)

Published 28 Sep 2023 in cs.SE, cs.AI, and cs.CR

Abstract: Testing with randomly generated inputs (fuzzing) has gained significant traction due to its capacity to expose program vulnerabilities automatically. Fuzz testing campaigns generate large amounts of data, making them ideal for the application of ML. Neural program smoothing (NPS), a specific family of ML-guided fuzzers, aims to use a neural network as a smooth approximation of the program target for new test case generation. In this paper, we conduct the most extensive evaluation of NPS fuzzers against standard gray-box fuzzers (>11 CPU years and >5.5 GPU years), and make the following contributions: (1) We find that the original performance claims for NPS fuzzers do not hold; a gap we relate to fundamental, implementation, and experimental limitations of prior works. (2) We contribute the first in-depth analysis of the contribution of machine learning and gradient-based mutations in NPS. (3) We implement Neuzz++, which shows that addressing the practical limitations of NPS fuzzers improves performance, but that standard gray-box fuzzers almost always surpass NPS-based fuzzers. (4) As a consequence, we propose new guidelines targeted at benchmarking fuzzing based on machine learning, and present MLFuzz, a platform with GPU access for easy and reproducible evaluation of ML-based fuzzers. Neuzz++, MLFuzz, and all our data are public.

Revisiting Neural Program Smoothing for Fuzzing: Insights and Challenges

Neural Program Smoothing (NPS) has been a topic of interest for researchers aiming to augment traditional fuzzing techniques with ML approaches. The paper "Revisiting Neural Program Smoothing for Fuzzing" by Nicolae, Eisele, and Zeller offers a meticulous assessment of NPS-based fuzzing methodologies. Through an extensive quantitative and qualitative evaluation, the authors shed light on both the theoretical underpinnings and practical limitations of these techniques.

The primary focus of this paper is to evaluate the performance and viability of NPS-guided fuzzers, such as Neuzz and PreFuzz, against contemporary gray-box fuzzers, notably AFL and AFL++. The paper presents a striking finding: contrary to prior claims, NPS fuzzers generally underperform compared to their gray-box counterparts. This performance gap is attributed to several factors, including challenges intrinsic to the machine learning models employed within NPS frameworks.
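To make the NPS idea concrete, the sketch below illustrates the gradient-guided mutation loop that fuzzers like Neuzz build on. It is a deliberately minimal stand-in, not the actual Neuzz architecture: a hand-rolled linear model plays the role of the trained neural surrogate (so its gradient is simply its weight matrix), and all names and parameters are hypothetical.

```python
# Minimal sketch of the NPS mutation loop. A smooth surrogate model maps
# input bytes to per-edge coverage scores; the gradient of a target edge's
# score w.r.t. the input identifies the byte positions most worth mutating.
# The linear surrogate here is an illustrative assumption, not Neuzz's model.
import random

random.seed(0)

N_BYTES, N_EDGES = 16, 4

# Stand-in for a trained neural surrogate: a random linear model whose
# gradient w.r.t. the input is just its weight matrix.
W = [[random.uniform(-1, 1) for _ in range(N_BYTES)] for _ in range(N_EDGES)]

def predict(x):
    # "Smooth" coverage scores for each edge, given normalized input bytes.
    return [sum(w * b for w, b in zip(row, x)) for row in W]

def gradient(edge):
    # For a linear surrogate, d(score_edge)/d(byte_i) = W[edge][i].
    return W[edge]

def mutate(seed, edge, k=4, step=32):
    # Perturb the k byte positions with the largest absolute gradient,
    # in the direction that increases the target edge's score.
    g = gradient(edge)
    top = sorted(range(N_BYTES), key=lambda i: abs(g[i]), reverse=True)[:k]
    out = list(seed)
    for i in top:
        delta = step if g[i] > 0 else -step
        out[i] = max(0, min(255, out[i] + delta))
    return out

seed = [random.randrange(256) for _ in range(N_BYTES)]
target_edge = 2
child = mutate(seed, target_edge)

x0 = [b / 255 for b in seed]
x1 = [b / 255 for b in child]
print("score before:", round(predict(x0)[target_edge], 3))
print("score after: ", round(predict(x1)[target_edge], 3))
```

Note that the mutation can only push toward edges the surrogate already models, which foreshadows the conceptual limitation the paper identifies: gradients offer no signal toward edges absent from the training data.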

Evaluation and Findings

The authors conduct a thorough empirical analysis, dedicating over 11 CPU years and 5.5 GPU years to evaluate various fuzzers across 23 software targets. Their key findings can be summarized as follows:

  1. ML Performance in NPS: The neural network models in NPS fuzzers face difficulty in learning effective coverage approximations. The trained models predominantly predict trivial coverage and struggle to capture rare edges, which are crucial for uncovering new software paths and potential vulnerabilities.
  2. Conceptual and Implementation Constraints: The paper identifies conceptual limitations within NPS methodologies, such as the inability of gradient-based mutations to target new edges efficiently, mainly because the models are trained only on already covered areas. Additionally, implementation issues like the reliance on outdated tools and programming practices (e.g., magic numbers) hinder usability and reproducibility.
  3. Comparison with Gray-Box Fuzzers: AFL++ and other traditional gray-box fuzzers surpass NPS-based approaches in terms of code coverage and bug-finding capabilities. The coverage metrics presented demonstrate that gray-box fuzzers achieve significantly higher code exploration, which correlates with a higher bug detection rate.
  4. Impact of Computational Resources: While GPU acceleration theoretically benefits NPS fuzzers by expediting ML training and mutation, the practical gains are marginal given the trivial nature of the trained models. This underscores the need for re-evaluating the complexity and effectiveness of ML models within fuzzing.
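The first finding above can be illustrated numerically. The snippet below uses synthetic coverage data (the hit rates are assumptions for illustration, not the paper's measurements) to show why a surrogate that collapses to "trivial" coverage predictions can still look accurate: when most edges are either almost always or almost never hit, a constant per-edge majority-class baseline scores highly while predicting none of the rare edges that matter for new coverage.

```python
# Synthetic illustration of the class imbalance in coverage bitmaps:
# a trivial majority-class predictor achieves high accuracy while
# missing every rare edge. Hit rates below are assumed for illustration.
import random

random.seed(1)

N_INPUTS, N_EDGES = 200, 50

# 90% "common" edges hit by nearly every input, 10% "rare" edges hit
# by ~2% of inputs.
def edge_hit_rate(e):
    return 0.98 if e < int(0.9 * N_EDGES) else 0.02

coverage = [[random.random() < edge_hit_rate(e) for e in range(N_EDGES)]
            for _ in range(N_INPUTS)]

# Trivial baseline: predict each edge's majority class over the corpus.
majority = [sum(row[e] for row in coverage) * 2 >= N_INPUTS
            for e in range(N_EDGES)]

correct = sum(majority[e] == row[e]
              for row in coverage for e in range(N_EDGES))
accuracy = correct / (N_INPUTS * N_EDGES)

rare = range(int(0.9 * N_EDGES), N_EDGES)
rare_hits_predicted = sum(majority[e] for e in rare)

print(f"baseline accuracy: {accuracy:.2%}")  # high, despite learning nothing
print(f"rare edges predicted as covered: {rare_hits_predicted}/{len(rare)}")
```

A model that minimizes average prediction error on such data has little incentive to learn the rare edges, which is exactly the failure mode the authors observe in the trained NPS models.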

Implications and Future Directions

The implications of these findings are multifaceted. From a practical standpoint, the paper suggests that current NPS methodologies have limited applicability in real-world software testing scenarios due to their inefficiency and complexity. Theoretically, this calls into question the effectiveness of blending ML with fuzzing in its current form, urging new approaches to integrating these domains.

Looking ahead, researchers are encouraged to explore novel methods for enhancing the integration of ML in fuzzing. This could involve advances in modeling techniques that better capture edge coverage, as well as ML algorithms capable of handling the inherent data imbalance and complexity of fuzzing datasets.

The authors also propose improved guidelines for benchmarking ML-enhanced fuzzers, emphasizing the need for robust experimental protocols and comprehensive evaluation metrics. These guidelines are crucial for future studies aiming to gauge the empirical performance of hybrid fuzzing solutions.
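The statistical side of such a protocol is standard practice in fuzzing evaluation rather than a prescription from this paper, but a small sketch shows the shape of it: repeat each campaign several times, report medians instead of single runs, and compare coverage distributions with a rank-based effect size. The trial numbers below are hypothetical.

```python
# Hedged sketch of a repeated-trial benchmarking protocol: medians over
# several campaigns plus the Vargha-Delaney A12 effect size, a rank-based
# measure commonly used in fuzzing evaluations. Coverage numbers are
# hypothetical illustration data, not results from the paper.
from statistics import median

def a12(a, b):
    # Probability that a randomly chosen trial from `a` beats one from
    # `b` (ties count half); 0.5 means no difference.
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return wins / (len(a) * len(b))

# Hypothetical final edge-coverage counts from 10 repeated trials each.
afl_pp = [4210, 4305, 4188, 4402, 4275, 4333, 4190, 4359, 4248, 4301]
nps    = [3902, 4010, 3850, 3975, 3899, 4051, 3920, 3880, 3941, 4002]

print("median AFL++:", median(afl_pp))
print("median NPS:  ", median(nps))
print("A12:", a12(afl_pp, nps))  # 1.0 here: AFL++ wins every pairwise comparison
```

Reporting an effect size alongside medians guards against over-interpreting a single lucky run, one of the experimental pitfalls the paper's guidelines address.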

In summary, the paper provides a foundational critique and analysis of neural program smoothing for fuzzing, elaborating on both its current limitations and potential pathways for future research. As the field evolves, it remains vital to continuously challenge and refine the methodologies employed to ensure effective and efficient software testing solutions.
