
FuzzCoder: Byte-level Fuzzing Test via Large Language Model (2409.01944v1)

Published 3 Sep 2024 in cs.CL

Abstract: Fuzzing is an important dynamic program analysis technique designed for finding vulnerabilities in complex software. Fuzzing involves presenting a target program with crafted malicious input to cause crashes, buffer overflows, memory errors, and exceptions. Crafting malicious inputs in an efficient manner is a difficult open problem, and the best approaches often apply uniform random mutations to pre-existing valid inputs. In this work, we propose to adopt fine-tuned LLMs (FuzzCoder) to learn patterns in the input files from successful attacks to guide future fuzzing explorations. Specifically, we develop a framework that leverages code LLMs to guide the mutation process of inputs in fuzzing. The mutation process is formulated as sequence-to-sequence modeling, where an LLM receives a sequence of bytes and then outputs the mutated byte sequence. FuzzCoder is fine-tuned on a created instruction dataset (Fuzz-Instruct), where the successful fuzzing history is collected from a heuristic fuzzing tool. FuzzCoder can predict mutation locations and strategies in input files to trigger abnormal behaviors of the program. Experimental results show that FuzzCoder based on AFL (American Fuzzy Lop) gains significant improvements in terms of effective proportion of mutation (EPM) and number of crashes (NC) for various input formats including ELF, JPG, MP3, and XML.

Overview of "FuzzCoder: Byte-level Fuzzing Test via LLM"

"FuzzCoder: Byte-level Fuzzing Test via LLM" introduces an innovative approach to dynamic program analysis, specifically focusing on fuzzing—a method used to uncover vulnerabilities by subjecting software to malformed inputs. The paper proposes a framework that leverages LLMs to enhance the efficiency and effectiveness of byte-level fuzzing through an intelligent mutation process.

Summary and Key Contributions

The paper presents FuzzCoder, a tool designed to improve the process of input mutation, which is crucial for effective fuzzing. FuzzCoder's framework is underpinned by LLMs fine-tuned on a dataset named Fuzz-Instruct. This dataset is carefully curated from successful fuzzing histories, aiding the LLM in learning patterns and strategies for input mutations that are more likely to expose software vulnerabilities.

Three main contributions are highlighted:

  1. Sequence-to-Sequence Modeling for Input Mutation:
    • The mutation process is framed as a sequence-to-sequence task, where the LLM receives a sequence of bytes and outputs a mutated byte sequence. This approach allows the model to predict both the location and type of mutations.
  2. Fuzz-Instruct Dataset Construction:
    • A comprehensive dataset is created by collecting mutation instances from heuristic fuzzing tools. This dataset is used to fine-tune the LLMs, enhancing their ability to predict effective mutations for various input formats, such as ELF, JPG, MP3, and XML.
  3. Fuzz-Bench Evaluation Framework:
    • FuzzCoder is evaluated using a newly constructed benchmark—Fuzz-Bench—consisting of eight programs. The results indicate significant improvements in metrics such as effective proportion of mutation (EPM) and number of crashes (NC) compared to traditional fuzzing methods.

Methodology

Mutation Process as Sequence-to-Sequence Modeling

The paper positions the mutation process within the context of sequence-to-sequence modeling. This involves converting the data into byte sequences and leveraging LLMs to predict where and how to mutate these bytes to maximize the likelihood of triggering software vulnerabilities. The LLMs are fine-tuned on the Fuzz-Instruct dataset, allowing them to understand and generate byte-level data effectively.
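The sequence-to-sequence framing can be sketched concretely: serialize the raw bytes of an input file into a token sequence for the model, then parse the model's prediction of where and how to mutate back into a byte sequence. The prompt and response formats below are hypothetical illustrations (the paper's exact serialization is not specified in this summary), as are the function names:

```python
# Sketch: framing byte-level mutation as a sequence-to-sequence task.
# The prompt/response formats here are hypothetical, not the paper's own.

def bytes_to_prompt(data: bytes) -> str:
    """Serialize raw bytes as space-separated hex tokens for the model."""
    return " ".join(f"{b:02x}" for b in data)

def apply_model_response(data: bytes, response: str) -> bytes:
    """Parse a hypothetical (position, strategy, payload) response and
    apply it as a mutation to the original byte sequence.

    Assumed response format: "pos=<int> op=<overwrite|insert> val=<hex>"
    """
    fields = dict(tok.split("=", 1) for tok in response.split())
    pos = int(fields["pos"])
    payload = bytes.fromhex(fields["val"])
    out = bytearray(data)
    if fields["op"] == "overwrite":
        out[pos:pos + len(payload)] = payload
    elif fields["op"] == "insert":
        out[pos:pos] = payload
    return bytes(out)

# A seed ELF header prefix, mutated at byte offset 4:
seed = b"\x7fELF\x01\x01"
prompt = bytes_to_prompt(seed)
mutated = apply_model_response(seed, "pos=4 op=overwrite val=ff")
```

In this framing, the model's output jointly encodes the mutation location (`pos`) and the mutation strategy (`op`), which is what allows a fine-tuned LLM to bias mutations toward byte regions that historically triggered crashes.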

Fuzz-Instruct Dataset

Fuzz-Instruct is a collected corpus formed from the successful mutation instances recorded from heuristic fuzzing tools like AFL (American Fuzzy Lop). Each entry in this dataset consists of an original input sequence paired with its successfully mutated counterpart, providing valuable training data for the LLMs.
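A single Fuzz-Instruct record pairing an original input with its successful mutation might look like the following. The field names and instruction wording are illustrative assumptions; the paper's exact schema is not given in this summary:

```python
# Sketch: one hypothetical Fuzz-Instruct training record, built from a
# successful AFL mutation (an input whose mutated form triggered a crash
# or new behavior). Field names and prompt text are illustrative only.

def make_fuzz_instruct_record(original: bytes, mutated: bytes, fmt: str) -> dict:
    """Build an instruction-tuning example from one successful mutation."""
    return {
        "instruction": (
            f"You are given a {fmt} byte sequence. "
            "Output a mutated byte sequence likely to trigger abnormal behavior."
        ),
        "input": original.hex(" "),   # original bytes as hex tokens
        "output": mutated.hex(" "),   # successfully mutated counterpart
    }

# A JPG magic-number prefix with one byte flipped by the fuzzer:
record = make_fuzz_instruct_record(b"\xff\xd8\xff\xe0", b"\xff\xd8\x00\xe0", "JPG")
```

Collecting many such records across formats (ELF, JPG, MP3, XML) gives the fine-tuned model format-specific priors about which byte positions and values tend to expose vulnerabilities.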

Evaluation with Fuzz-Bench

FuzzCoder was extensively evaluated using the Fuzz-Bench benchmark, which comprises programs such as NM_ELF, READ_ELF, OBJDUMP_ELF, and others. These programs accept a variety of input formats and represent different domains of software applications.

Experimental Results

  • Effective Proportion of Mutation (EPM): FuzzCoder outperformed baseline methods (AFL with heuristic and small models) in the EPM metric across all eight programs. Notably, the CodeQwen and DeepSeek-Coder models consistently achieved higher EPM.
  • Number of Crashes (NC): FuzzCoder demonstrated a higher number of crashes compared to baseline methods, indicating that it can uncover more vulnerabilities. For instance, in the READ_ELF benchmark, FuzzCoder with CodeQwen achieved nine crashes compared to zero crashes for the baselines.
  • Code Coverage: FuzzCoder showed superior performance in terms of line, branch, and function coverage in comparison to traditional methods, demonstrating its efficacy in exploring a broader range of execution paths within the target programs.
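The EPM metric above reduces to a simple ratio: the fraction of generated mutations that are "effective" (for example, reaching new coverage or crashing the target). The exact effectiveness criterion the paper uses is not detailed in this summary, so the sketch below only captures the ratio itself:

```python
# Sketch: effective proportion of mutation (EPM) as a ratio of effective
# mutations to total mutations attempted. What counts as "effective"
# (new coverage, crash, etc.) is an assumption left to the fuzzing harness.

def epm(effective_mutations: int, total_mutations: int) -> float:
    """Return the effective proportion of mutation; 0.0 when no mutations ran."""
    if total_mutations == 0:
        return 0.0
    return effective_mutations / total_mutations

# e.g. 120 effective mutations out of 4000 attempts:
rate = epm(120, 4000)  # 0.03
```

A higher EPM means the mutator wastes fewer executions on inputs the target handles normally, which is precisely where an LLM-guided mutator is claimed to beat uniform random mutation.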

Implications and Future Directions

The introduction of FuzzCoder has several implications for both practical and theoretical domains in fuzz testing:

  • Practical Implications:
    • By integrating LLMs into fuzzing workflows, FuzzCoder significantly enhances the ability to detect software vulnerabilities efficiently.
    • The framework can be adapted to various fuzzing tools and input formats, making it a versatile tool in the software testing arsenal.
  • Theoretical Implications:
    • The sequence-to-sequence modeling approach offers a novel perspective on the fuzzing process, potentially opening new avenues for research in dynamic program analysis and input mutation strategies.
    • The success of fine-tuning domain-specific LLMs on custom datasets like Fuzz-Instruct provides a blueprint for constructing more domain-focused models in other areas of software engineering.

Conclusion

This comprehensive paper encapsulates a significant advancement in fuzz testing methodologies by exploiting the power of LLMs. FuzzCoder proposes an effective and efficient framework for input mutation, demonstrated through rigorous evaluation on the Fuzz-Bench benchmark. This work charts a promising path for future developments in AI-driven software vulnerability detection, highlighting the potential of LLMs to revolutionize this critical field.

Authors (16)
  1. Liqun Yang
  2. Jian Yang
  3. Chaoren Wei
  4. Guanglin Niu
  5. Ge Zhang
  6. Yunli Wang
  7. Linzheng Chai
  8. Wanxu Xia
  9. Hongcheng Guo
  10. Shun Zhang
  11. Jiaheng Liu
  12. Yuwei Yin
  13. Junran Peng
  14. Jiaxin Ma
  15. Liang Sun
  16. Zhoujun Li