Revisiting Neural Program Smoothing for Fuzzing (2309.16618v1)

Published 28 Sep 2023 in cs.SE, cs.AI, and cs.CR

Abstract: Testing with randomly generated inputs (fuzzing) has gained significant traction due to its capacity to expose program vulnerabilities automatically. Fuzz testing campaigns generate large amounts of data, making them ideal for the application of ML. Neural program smoothing (NPS), a specific family of ML-guided fuzzers, aims to use a neural network as a smooth approximation of the program target for new test case generation. In this paper, we conduct the most extensive evaluation of NPS fuzzers against standard gray-box fuzzers (>11 CPU years and >5.5 GPU years), and make the following contributions: (1) We find that the original performance claims for NPS fuzzers do not hold; a gap we relate to fundamental, implementation, and experimental limitations of prior works. (2) We contribute the first in-depth analysis of the contribution of machine learning and gradient-based mutations in NPS. (3) We implement Neuzz++, which shows that addressing the practical limitations of NPS fuzzers improves performance, but that standard gray-box fuzzers almost always surpass NPS-based fuzzers. (4) As a consequence, we propose new guidelines targeted at benchmarking fuzzing based on machine learning, and present MLFuzz, a platform with GPU access for easy and reproducible evaluation of ML-based fuzzers. Neuzz++, MLFuzz, and all our data are public.

Revisiting Neural Program Smoothing for Fuzzing: Insights and Challenges

Neural Program Smoothing (NPS) has been a topic of interest for researchers aiming to augment traditional fuzzing techniques with ML approaches. The paper "Revisiting Neural Program Smoothing for Fuzzing" by Nicolae, Eisele, and Zeller offers a meticulous assessment of NPS-based fuzzing methodologies. Through an extensive quantitative and qualitative evaluation, the authors shed light on both the theoretical underpinnings and practical limitations of these techniques.

The primary focus of this paper is to evaluate the performance and viability of NPS-guided fuzzers, such as Neuzz and PreFuzz, against contemporary gray-box fuzzers, notably AFL and AFL++. The paper presents a striking finding: contrary to prior claims, NPS fuzzers generally underperform compared to their gray-box counterparts. This performance gap is attributed to several factors, including challenges intrinsic to the machine learning models employed within NPS frameworks.
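To make the NPS idea concrete, the sketch below illustrates the gradient-guided mutation loop that fuzzers like Neuzz build on. It is a deliberately minimal stand-in, not the actual Neuzz architecture: a hand-rolled linear model plays the role of the trained neural surrogate (so its gradient is simply its weight matrix), and all names and parameters are hypothetical.

```python
# Minimal sketch of the NPS mutation loop. A smooth surrogate model maps
# input bytes to per-edge coverage scores; the gradient of a target edge's
# score w.r.t. the input identifies the byte positions most worth mutating.
# The linear surrogate here is an illustrative assumption, not Neuzz's model.
import random

random.seed(0)

N_BYTES, N_EDGES = 16, 4

# Stand-in for a trained neural surrogate: a random linear model whose
# gradient w.r.t. the input is just its weight matrix.
W = [[random.uniform(-1, 1) for _ in range(N_BYTES)] for _ in range(N_EDGES)]

def predict(x):
    # "Smooth" coverage scores for each edge, given normalized input bytes.
    return [sum(w * b for w, b in zip(row, x)) for row in W]

def gradient(edge):
    # For a linear surrogate, d(score_edge)/d(byte_i) = W[edge][i].
    return W[edge]

def mutate(seed, edge, k=4, step=32):
    # Perturb the k byte positions with the largest absolute gradient,
    # in the direction that increases the target edge's score.
    g = gradient(edge)
    top = sorted(range(N_BYTES), key=lambda i: abs(g[i]), reverse=True)[:k]
    out = list(seed)
    for i in top:
        delta = step if g[i] > 0 else -step
        out[i] = max(0, min(255, out[i] + delta))
    return out

seed = [random.randrange(256) for _ in range(N_BYTES)]
target_edge = 2
child = mutate(seed, target_edge)

x0 = [b / 255 for b in seed]
x1 = [b / 255 for b in child]
print("score before:", round(predict(x0)[target_edge], 3))
print("score after: ", round(predict(x1)[target_edge], 3))
```

Note that the mutation can only push toward edges the surrogate already models, which foreshadows the conceptual limitation the paper identifies: gradients offer no signal toward edges absent from the training data.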

Evaluation and Findings

The authors conduct a thorough empirical analysis, dedicating over 11 CPU years and 5.5 GPU years to evaluate various fuzzers across 23 software targets. Their key findings can be summarized as follows:

  1. ML Performance in NPS: The neural network models in NPS fuzzers face difficulty in learning effective coverage approximations. The trained models predominantly predict trivial coverage and struggle to capture rare edges, which are crucial for uncovering new software paths and potential vulnerabilities.
  2. Conceptual and Implementation Constraints: The paper identifies conceptual limitations within NPS methodologies, such as the inability of gradient-based mutations to target new edges efficiently, mainly because the models are trained only on already covered areas. Additionally, implementation issues like the reliance on outdated tools and programming practices (e.g., magic numbers) hinder usability and reproducibility.
  3. Comparison with Gray-Box Fuzzers: AFL++ and other traditional gray-box fuzzers surpass NPS-based approaches in terms of code coverage and bug-finding capabilities. The coverage metrics presented demonstrate that gray-box fuzzers achieve significantly higher code exploration, which correlates with a higher bug detection rate.
  4. Impact of Computational Resources: While GPU acceleration theoretically benefits NPS fuzzers by expediting ML training and mutation, the practical gains are marginal given the trivial nature of the trained models. This underscores the need for re-evaluating the complexity and effectiveness of ML models within fuzzing.
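The first finding above can be illustrated numerically. The snippet below uses synthetic coverage data (the hit rates are assumptions for illustration, not the paper's measurements) to show why a surrogate that collapses to "trivial" coverage predictions can still look accurate: when most edges are either almost always or almost never hit, a constant per-edge majority-class baseline scores highly while predicting none of the rare edges that matter for new coverage.

```python
# Synthetic illustration of the class imbalance in coverage bitmaps:
# a trivial majority-class predictor achieves high accuracy while
# missing every rare edge. Hit rates below are assumed for illustration.
import random

random.seed(1)

N_INPUTS, N_EDGES = 200, 50

# 90% "common" edges hit by nearly every input, 10% "rare" edges hit
# by ~2% of inputs.
def edge_hit_rate(e):
    return 0.98 if e < int(0.9 * N_EDGES) else 0.02

coverage = [[random.random() < edge_hit_rate(e) for e in range(N_EDGES)]
            for _ in range(N_INPUTS)]

# Trivial baseline: predict each edge's majority class over the corpus.
majority = [sum(row[e] for row in coverage) * 2 >= N_INPUTS
            for e in range(N_EDGES)]

correct = sum(majority[e] == row[e]
              for row in coverage for e in range(N_EDGES))
accuracy = correct / (N_INPUTS * N_EDGES)

rare = range(int(0.9 * N_EDGES), N_EDGES)
rare_hits_predicted = sum(majority[e] for e in rare)

print(f"baseline accuracy: {accuracy:.2%}")  # high, despite learning nothing
print(f"rare edges predicted as covered: {rare_hits_predicted}/{len(rare)}")
```

A model that minimizes average prediction error on such data has little incentive to learn the rare edges, which is exactly the failure mode the authors observe in the trained NPS models.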

Implications and Future Directions

The implications of these findings are multifaceted. From a practical standpoint, the paper suggests that current NPS methodologies have limited applicability in real-world software testing scenarios due to their inefficiency and complexity. Theoretically, this calls into question the effectiveness of blending ML with fuzzing in its current form, urging new approaches to integrating these domains.

Looking ahead, researchers are encouraged to explore novel methods for enhancing the integration of ML in fuzzing. This could involve advances in modeling techniques that better capture edge coverage, as well as ML algorithms capable of handling the inherent data imbalance and complexity of fuzzing datasets.

The authors also propose improved guidelines for benchmarking ML-enhanced fuzzers, emphasizing the need for robust experimental protocols and comprehensive evaluation metrics. These guidelines are crucial for future studies aiming to gauge the empirical performance of hybrid fuzzing solutions.
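The statistical side of such a protocol is standard practice in fuzzing evaluation rather than a prescription from this paper, but a small sketch shows the shape of it: repeat each campaign several times, report medians instead of single runs, and compare coverage distributions with a rank-based effect size. The trial numbers below are hypothetical.

```python
# Hedged sketch of a repeated-trial benchmarking protocol: medians over
# several campaigns plus the Vargha-Delaney A12 effect size, a rank-based
# measure commonly used in fuzzing evaluations. Coverage numbers are
# hypothetical illustration data, not results from the paper.
from statistics import median

def a12(a, b):
    # Probability that a randomly chosen trial from `a` beats one from
    # `b` (ties count half); 0.5 means no difference.
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return wins / (len(a) * len(b))

# Hypothetical final edge-coverage counts from 10 repeated trials each.
afl_pp = [4210, 4305, 4188, 4402, 4275, 4333, 4190, 4359, 4248, 4301]
nps    = [3902, 4010, 3850, 3975, 3899, 4051, 3920, 3880, 3941, 4002]

print("median AFL++:", median(afl_pp))
print("median NPS:  ", median(nps))
print("A12:", a12(afl_pp, nps))  # 1.0 here: AFL++ wins every pairwise comparison
```

Reporting an effect size alongside medians guards against over-interpreting a single lucky run, one of the experimental pitfalls the paper's guidelines address.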

In summary, the paper provides a foundational critique and analysis of neural program smoothing for fuzzing, elaborating on both its current limitations and potential pathways for future research. As the field evolves, it remains vital to continuously challenge and refine the methodologies employed to ensure effective and efficient software testing solutions.
