Smart Greybox Fuzzing (1811.09447v1)

Published 23 Nov 2018 in cs.CR

Abstract: Coverage-based greybox fuzzing (CGF) is one of the most successful methods for automated vulnerability detection. Given a seed file (as a sequence of bits), CGF randomly flips, deletes or copies bits to generate new files. CGF iteratively constructs (and fuzzes) a seed corpus by retaining those generated files which enhance coverage. However, random bitflips are unlikely to produce valid files (or valid chunks in files), for applications processing complex file formats. In this work, we introduce smart greybox fuzzing (SGF) which leverages a high-level structural representation of the seed file to generate new files. We define innovative mutation operators that work on the virtual file structure rather than on the bit level which allows SGF to explore completely new input domains while maintaining file validity. We introduce a novel validity-based power schedule that enables SGF to spend more time generating files that are more likely to pass the parsing stage of the program, which can expose vulnerabilities much deeper in the processing logic. Our evaluation demonstrates the effectiveness of SGF. On several libraries that parse structurally complex files, our tool AFLSmart explores substantially more paths (up to 200%) and exposes more vulnerabilities than baseline AFL. Our tool AFLSmart has discovered 42 zero-day vulnerabilities in widely-used, well-tested tools and libraries; so far 17 CVEs were assigned.

Summary

The paper introduces Smart Greybox Fuzzing (SGF), which uses structural mutation operators to generate valid test cases and explores substantially more program paths than baseline AFL (up to 200% in the reported evaluation).
The approach employs a validity-based power schedule that prioritizes structurally valid seeds so that fuzzing reaches code beyond the parsing stage.
Empirical evaluations show the AFLSmart implementation exposing more vulnerabilities than baseline AFL in libraries that parse complex file formats, including 42 zero-day vulnerabilities, 17 of which had been assigned CVEs at the time of writing.

Overview of Smart Greybox Fuzzing

The paper "Smart Greybox Fuzzing" presents an approach to improving greybox fuzzing, a widely used technique for automated vulnerability detection. By making the fuzzer aware of input structure, the authors aim to overcome limitations of Coverage-based Greybox Fuzzing (CGF), particularly for applications that process complex file formats.

Contributions and Methodology

The paper introduces Smart Greybox Fuzzing (SGF), a method that leverages high-level structural representations of input files to improve the generation of valid test cases, thereby facilitating deeper exploration of the application's input space. The core innovation lies in the definition of structural mutation operators that operate on the virtual file structure rather than at the conventional bit level. This allows SGF to explore substantially more paths than baseline AFL (up to 200% in the authors' evaluation) while exposing more vulnerabilities, as evidenced by the authors' testing on several libraries.
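To make the idea of mutating at the structure level concrete, here is a minimal Python sketch. It assumes a simplified, flat list of typed chunks with byte offsets in place of the nested virtual structure the paper describes, and the names `Chunk`, `smart_delete`, `smart_add`, and `smart_splice` are hypothetical, chosen for illustration rather than taken from AFLSmart.

```python
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Chunk:
    kind: str   # chunk type label, e.g. "fmt" (illustrative)
    start: int  # inclusive byte offset into the file
    end: int    # exclusive byte offset into the file


def smart_delete(data: bytes, chunks: Sequence[Chunk], rng: random.Random) -> bytes:
    """Remove one whole chunk instead of an arbitrary bit range."""
    if not chunks:
        return data
    c = rng.choice(chunks)
    return data[:c.start] + data[c.end:]


def smart_add(data: bytes, chunks: Sequence[Chunk], donor: bytes,
              donor_chunks: Sequence[Chunk], rng: random.Random) -> bytes:
    """Insert a whole donor chunk directly after an existing chunk.

    Assumption: this flat version ignores nesting and parent types.
    """
    if not chunks or not donor_chunks:
        return data
    c = rng.choice(chunks)
    d = rng.choice(donor_chunks)
    return data[:c.end] + donor[d.start:d.end] + data[c.end:]


def smart_splice(data: bytes, chunks: Sequence[Chunk], donor: bytes,
                 donor_chunks: Sequence[Chunk], rng: random.Random) -> bytes:
    """Replace a chunk with a donor chunk of the same type."""
    pairs = [(c, d) for c in chunks for d in donor_chunks if c.kind == d.kind]
    if not pairs:
        return data
    c, d = rng.choice(pairs)
    return data[:c.start] + donor[d.start:d.end] + data[c.end:]


# Toy inputs: three 4-byte chunks in the seed, two in the donor.
seed = b"HEADfmt1DATA"
seed_chunks = [Chunk("head", 0, 4), Chunk("fmt", 4, 8), Chunk("data", 8, 12)]
donor = b"HEADfmt2"
donor_chunks = [Chunk("head", 0, 4), Chunk("fmt", 4, 8)]

rng = random.Random(0)
print(smart_delete(seed, seed_chunks, rng))
print(smart_add(seed, seed_chunks, donor, donor_chunks, rng))
print(smart_splice(seed, seed_chunks, donor, donor_chunks, rng))
```

Because every operator cuts and pastes on chunk boundaries, the output is always a concatenation of whole chunks, which is what makes it more likely to survive a parser than a random bit flip would be.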

Some notable contributions of SGF include:

Structural Mutation Operators: SGF introduces smart deletion, addition, and splicing mutation operators that preserve the structural integrity of the input files. These operators enhance the validity of the generated test cases, which increases the likelihood of exposing vulnerabilities deeply embedded within the application's processing logic.
Validity-based Power Schedule: This schedule allocates more energy to seeds with a higher degree of structural validity, hence prioritizing the generation of inputs that are more likely to traverse deeper paths within the application.
Deferred Parsing Optimization: SGF uses a probabilistic model to delay the parsing of a seed's structure until it is likely to pay off, which retains the efficiency of traditional greybox fuzzing and is crucial for scalability.
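The last two items in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's exact formulas: the 50% validity threshold, the doubling factor, the linear ramp for the parse probability, and all function and parameter names (`validity_energy`, `parse_probability`, `should_parse`, `patience`) are hypothetical choices made for the example.

```python
import random


def validity_energy(base_energy: float, validity: float, max_energy: float,
                    threshold: float = 0.5, boost: float = 2.0) -> float:
    """Scale a seed's fuzzing energy by how much of it parsed successfully.

    validity is the fraction of the file the parser accepted, in [0, 1].
    Seeds at or above the (assumed) threshold get boosted energy, and the
    result is always capped at max_energy.
    """
    if not 0.0 <= validity <= 1.0:
        raise ValueError("validity must be in [0, 1]")
    if validity < threshold:
        return min(base_energy, max_energy)
    return min(boost * base_energy, max_energy)


def parse_probability(seconds_since_new_path: float, patience: float) -> float:
    """Probability of paying the parsing cost for a seed.

    Assumed linear ramp: while plain mutations still find new paths the
    probability stays low, and it reaches 1.0 after `patience` seconds
    without progress.
    """
    if patience <= 0:
        raise ValueError("patience must be positive")
    return min(max(seconds_since_new_path, 0.0) / patience, 1.0)


def should_parse(seconds_since_new_path: float, patience: float,
                 rng: random.Random) -> bool:
    """Draw a random decision according to parse_probability."""
    return rng.random() < parse_probability(seconds_since_new_path, patience)


print(validity_energy(10, 0.8, 100))   # mostly valid seed gets boosted energy
print(validity_energy(10, 0.2, 100))   # mostly invalid seed keeps base energy
print(parse_probability(30, 60))       # halfway through the patience window
```

The design intent in both cases is to spend the expensive work (extra mutations, structure parsing) only where it is likely to help: on seeds that already get past the parser, and at times when cheap bit-level fuzzing has stopped making progress.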

Empirical Evaluation

The authors evaluated the approach on multiple software applications that process structured file formats such as PNG, JPEG, and WAV. Notably, AFLSmart, the tool implementing SGF, discovered twice as many vulnerabilities as existing methods. It also found zero-day vulnerabilities in libraries such as FFmpeg and WavPack. The paper further reports that SGF is more effective than both AFL and Peach, which represent traditional greybox fuzzing and smart blackbox fuzzing, respectively.

Implications and Future Directions

SGF's emphasis on structural integrity and input validity promises substantial improvements in automated fuzzing, particularly for uncovering vulnerabilities in applications that process complex file formats. Its use of file format specifications and its reduced likelihood of generating invalid inputs illustrate its potential for practical, large-scale software testing.

Future developments may focus on automating the creation of input specifications or integrating SGF with protocol specifications to extend its applicability to reactive systems. Furthermore, the potential for combining SGF with taint analysis to refine attribute-level mutations offers an intriguing avenue for research.

In summary, Smart Greybox Fuzzing represents a significant advancement in the field of software testing, marrying the efficiency and practicality of greybox approaches with input-aware methodologies traditionally seen in blackbox fuzzing, promoting deeper and more efficient testing of complex applications.
