IssueMut: History-Based Compiler Fuzzing
- IssueMut is an automated methodology that extracts compiler fuzzing mutators from historical bug reports, informing targeted testing based on past compiler failures.
- It uses a multi-stage pipeline—including LLM-guided negative test derivation and AST-aware transformations—to synthesize mutators from bug report deltas.
- Large-scale evaluations on GCC and LLVM demonstrate enhanced bug discovery: 65 new bugs (28 in GCC, 37 in LLVM), about 60 of which were confirmed or fixed, including crash triggers that baseline fuzzers missed.
IssueMut is an automated methodology for extracting compiler fuzzing mutators from historical bug reports and integrating them into existing mutational compiler fuzzers. Its core insight is that fixed bug reports, containing both the triggering inputs and contextual explanation of failures, encode actionable knowledge about which program elements and syntactic/semantic constructs have historically exposed compiler defects. By mining this “bug history,” IssueMut produces targeted mutators that systematically direct the fuzzer towards regions of the input space where new and related bugs are more likely to occur. Large-scale evaluation demonstrates that these bug-history–derived mutators are effective at uncovering compiler defects missed by state-of-the-art fuzzers, with a significant number of these bugs subsequently confirmed and fixed by compiler maintainers (Liu et al., 9 Oct 2025).
1. Motivation and Rationale
Traditional mutational fuzzers for compilers employ a set of generic or expert-designed mutators, i.e., transformations that perturb seed programs to generate new test cases. However, this generic approach does not incorporate specific historical knowledge of prior compiler faults, resulting in a vast and often unproductive search space. Many generic mutations fail to induce substantive differences or to exercise subtle compiler logic. Meanwhile, each bug report in large software systems such as GCC and LLVM typically contains a “positive test case” (an input that previously caused a crash or miscompilation) and a natural-language description highlighting the code element responsible for the bug. These artifacts are underutilized in mainstream fuzzing efforts. IssueMut targets this gap by leveraging past bug reports to guide and amplify the fuzzing process through history-informed mutation.
2. Automated Mining of Mutators from Bug Histories
IssueMut applies a multi-stage, largely automated pipeline to extract and synthesize mutators from compiler bug histories:
- Bug Report Selection: Scrape all fixed bug reports from compiler bug trackers (e.g., GCC, LLVM) and filter for those containing minimal, reproducible positive test cases. Only fixed and reproducible bugs are used to ensure that the root cause and its associated input are well-understood.
- Negative Test Derivation: For each positive test, IssueMut automatically generates a negative variant that is similar in structure but does not trigger the historical bug. This is achieved using LLM agents (e.g., GPT-4o mini, Gemini 2.5 Pro through a LangChain agent system), guided by the bug’s textual description, to produce a mutation description and a patched (non-bug-triggering) test case.
- Mutator Synthesis: The delta between the negative and positive test cases defines the transformation that induces the bug. IssueMut synthesizes mutators, often as sed-like string replacement scripts or C/C++ AST transformations, that can convert the negative seed into the bug-triggering positive form. The synthesized mutators are validated by applying them to both the canonical test pair and a diverse sample of seed programs.
Formally, for a positive test $p$ and a negative test $n$, a valid mutator $m$ satisfies:

$$m(n) = p$$
For simple cases, this leads to commands of the form:

```sh
sed -i -E 's/<pattern>/<replacement>/g' <file>
```
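The synthesis step can be illustrated with a minimal Python sketch. It is not IssueMut's implementation: the function names, the token-level diff via `difflib`, and the strategy of widening the pattern with left context until the mutator maps the negative test onto the positive test are all assumptions made for illustration, and the `sizeof` test pair is hypothetical rather than taken from a real bug report.

```python
import difflib
import re

# Split source text into identifiers/numbers, whitespace runs, and single symbols.
TOKEN = re.compile(r"\w+|\s+|[^\w\s]")


def single_change(negative, positive):
    """Return (negative_tokens, start, end, new_text) if the pair differs in
    exactly one contiguous replaced token span; otherwise return None."""
    neg = TOKEN.findall(negative)
    pos = TOKEN.findall(positive)
    ops = difflib.SequenceMatcher(a=neg, b=pos, autojunk=False).get_opcodes()
    changes = [op for op in ops if op[0] != "equal"]
    if len(changes) != 1 or changes[0][0] != "replace":
        return None
    _, i1, i2, j1, j2 = changes[0]
    return neg, i1, i2, "".join(pos[j1:j2])


def synthesize_mutator(negative, positive, max_context=4):
    """Build a sed-like global replacement that turns `negative` into `positive`.

    The pattern starts as the changed text alone and gains one token of left
    context per round until applying it to `negative` reproduces `positive`.
    """
    change = single_change(negative, positive)
    if change is None:
        return None
    neg, i1, i2, new = change
    old = "".join(neg[i1:i2])
    for context in range(max_context + 1):
        prefix = "".join(neg[max(0, i1 - context):i1])
        pattern = re.compile(re.escape(prefix + old))
        replacement = prefix + new

        def mutate(source, pattern=pattern, replacement=replacement):
            # A callable replacement keeps the text literal (no backslash escapes).
            return pattern.sub(lambda _match: replacement, source)

        if mutate(negative) == positive:  # validity check: m(n) == p
            return pattern.pattern, replacement, mutate
    return None


# Hypothetical test pair, not taken from a real bug report.
NEGATIVE = "int f(void) { return sizeof(int); }\n"
POSITIVE = "int f(void) { return sizeof(void); }\n"

pattern, replacement, mutate = synthesize_mutator(NEGATIVE, POSITIVE)
print(pattern, "->", replacement)
print(mutate("long g(void) { return sizeof(int) + 1; }"))
```

In this pair the bare pattern `int` is rejected because a global replacement would also rewrite the return type, so the sketch widens the pattern by one token. This is why checking the mutator against the canonical pair matters before applying it to other seeds.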
3. Integration with Existing Fuzzing Frameworks
IssueMut is designed for modular integration with mutational fuzzers such as MetaMut. Technical integration aspects include:
- Extensible Mutator Interface: IssueMut’s bug-history mutators can be loaded into frameworks supporting plugin mutators, augmenting existing sets of generic, LLM-generated, or handcrafted transformations.
- AST-Aware and Context-Sensitive Capabilities: For deep mutations such as changing identifiers, types, or control-flow constructs, IssueMut auto-generates Clang AST visitor routines.
- Validation and Deduplication: To ensure efficacy and non-redundancy, IssueMut employs a validator agent that executes mutators on the original and synthetic test inputs, discarding those that do not consistently recreate the bug-inducing transformation. Deduplication uses aggregate hash comparison (e.g., SHA256 over a corpus sample) to cull duplicate or near-duplicate mutators, so that the resulting operator set is diverse at a 99% confidence level with a 1% margin of error.
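The validation and deduplication steps above can be sketched as follows. This is one plausible reading of "aggregate hash comparison", not the tool's actual code: each mutator is fingerprinted by a single SHA-256 digest over its outputs on a fixed corpus sample, and mutators with identical fingerprints are treated as duplicates. The helper names, the mutator names, and the sample programs are hypothetical.

```python
import hashlib
import re
from typing import Callable, Dict, List

Mutator = Callable[[str], str]


def is_valid(mutator: Mutator, negative: str, positive: str) -> bool:
    """A mutator is kept only if it maps the negative test to the positive one."""
    return mutator(negative) == positive


def fingerprint(mutator: Mutator, corpus_sample: List[str]) -> str:
    """Aggregate SHA-256 over the mutator's outputs on every sampled seed."""
    digest = hashlib.sha256()
    for seed in corpus_sample:
        output = mutator(seed).encode("utf-8")
        # Length prefix keeps the boundaries between outputs unambiguous.
        digest.update(len(output).to_bytes(8, "big"))
        digest.update(output)
    return digest.hexdigest()


def deduplicate(mutators: Dict[str, Mutator], corpus_sample: List[str]) -> Dict[str, Mutator]:
    """Keep the first mutator seen for each behavioural fingerprint."""
    seen = set()
    unique: Dict[str, Mutator] = {}
    for name, mutator in mutators.items():
        key = fingerprint(mutator, corpus_sample)
        if key not in seen:
            seen.add(key)
            unique[name] = mutator
    return unique


# Hypothetical sample and mutators for illustration only.
SAMPLE = [
    "int f(void) { return sizeof(int); }",
    "long g(long x) { return x * sizeof(int); }",
]
MUTATORS: Dict[str, Mutator] = {
    "paren_int_to_void": lambda s: re.sub(r"\(int", "(void", s),
    "sizeof_int_to_void": lambda s: re.sub(r"sizeof\(int\)", "sizeof(void)", s),
    "long_to_short": lambda s: re.sub(r"\blong\b", "short", s),
}

unique = deduplicate(MUTATORS, SAMPLE)
print(sorted(unique))
```

Two mutators that agree on every sampled seed may still differ on programs outside the sample, so under this reading the sample size controls how much confidence the deduplication result deserves.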
4. Experimental Evaluation
Large-scale experiments were performed on GCC and LLVM:
- Bug Corpus: From 1457 fixed, reproducible GCC reports and 303 from LLVM, IssueMut mined 587 unique mutators, each derived from bug-triggering input transformations.
- Fuzzing Protocol: Mutators were incorporated into MetaMut and executed on GCC and LLVM using their full test suites as seed corpora. Runs were parallelized on high-core-count Linux servers with ample memory, and each fuzzing campaign had a 24-hour budget.
- Baselines: IssueMut’s approach was compared against Grammarinator, Kitten, and standard MetaMut (sans bug history mutators).
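The fuzzing protocol above amounts to a time-budgeted loop that applies mutators to seeds and checks each mutant against a crash oracle. The sketch below is a simplified stand-in and does not reflect MetaMut's actual scheduler or interfaces: `run_campaign`, its parameters, and the uniform random selection are assumptions. The `gcc_crashes` oracle is a heuristic that looks for the "internal compiler error" banner on standard error, and the demonstration uses a fake oracle so that it runs without a compiler installed.

```python
import random
import subprocess
import time


def gcc_crashes(source, timeout_s=10.0):
    """Heuristic oracle: compile C source from stdin and look for GCC's ICE banner."""
    try:
        result = subprocess.run(
            ["gcc", "-x", "c", "-c", "-", "-o", "/dev/null"],
            input=source, capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # compiler hangs are ignored in this sketch
    return "internal compiler error" in result.stderr


def run_campaign(seeds, mutators, crashes, budget_s, rng, max_iterations=None):
    """Apply randomly chosen mutators to randomly chosen seeds until the time
    budget (or the optional iteration cap) runs out.

    Returns a dict mapping each crashing mutant to the mutator that produced it.
    """
    deadline = time.monotonic() + budget_s
    findings = {}
    iterations = 0
    while time.monotonic() < deadline:
        if max_iterations is not None and iterations >= max_iterations:
            break
        iterations += 1
        seed = rng.choice(seeds)
        name, mutate = rng.choice(mutators)
        mutant = mutate(seed)
        if mutant == seed or mutant in findings:
            continue  # mutator did not apply, or this mutant is already recorded
        if crashes(mutant):
            findings[mutant] = name
    return findings


# Demonstration with a fake oracle; a real campaign would pass gcc_crashes
# and a 24-hour budget (budget_s=24 * 60 * 60).
SEEDS = ["int f(void) { return sizeof(int); }", "int g(int x) { return x; }"]
MUTATORS = [("paren_int_to_void", lambda s: s.replace("(int)", "(void)"))]


def fake_oracle(source):
    return "sizeof(void)" in source


findings = run_campaign(SEEDS, MUTATORS, fake_oracle, budget_s=5.0,
                        rng=random.Random(0), max_iterations=200)
print(findings)
```

Passing the oracle in as a parameter lets the same loop drive GCC, LLVM, or a differential-testing check without changing the scheduling code.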
Results Summary
| Compiler | New bugs found by IssueMut | Confirmed or fixed | Mutators mined |
|---|---|---|---|
| GCC | 28 | (see total) | (see total) |
| LLVM | 37 | (see total) | (see total) |
| Total | 65 | ~60 | 587 |
- Coverage: IssueMut mutators successfully triggered many unique compiler crashes that baseline fuzzers missed.
- Confirmation: Approximately 60 bugs found using IssueMut mutators were confirmed and fixed by maintainers.
- Complementarity: Only a minority of the bug-triggering test cases overlapped with those generated by generic or LLM-based mutators, demonstrating that bug-history mutators explore distinct, high-value regions of the input space.
5. Technical and Practical Significance
IssueMut’s central premise, that bug reports encode domain-specific fragilities not easily found by generic mutation, is supported empirically: its mutators found a largely non-overlapping set of bugs relative to the baselines, and most of the resulting reports were confirmed by maintainers. By mining transformations that mirror previously encountered compiler failures, IssueMut both increases the efficiency of fuzzing (by targeting known failure regions) and brings the human knowledge encoded in resolved bug reports into automated toolchains.
Notably, IssueMut reduces the manual effort of designing mutators, adapts quickly to new language constructs (e.g., C23 features), and provides a scalable mechanism for continuously evolving fuzzing strategies as compiler ecosystems mature.
6. Limitations and Future Work
- Mining Generalizability: While IssueMut automates mutator creation for most bug reports with positive tests, certain bug patterns remain challenging to generalize, especially when context dependence or complex program state is involved.
- LLM Reliance: The negative test derivation depends on the reliability of the LLMs; further improvements in prompt engineering and agent validation could enhance mutator utility.
- Language Scope: Current evaluation focused on C/C++ compilers; extending to other languages (e.g., Rust, Swift) is an open direction.
- Synthesis and Search Synergy: Combining IssueMut’s history-based mutators with coverage-guided search or differential testing may further accelerate unique bug discovery.
7. Implications for Compiler and Fuzzer Design
IssueMut demonstrates that systematic mining of historical bug data can substantially increase the yield of compiler fuzzing. The results indicate that existing fuzzer frameworks benefit from hybridization—integrating history-derived mutators, LLM-generated strategies, and coverage-oriented mutation. This approach motivates a new paradigm where human knowledge, as manifest in bug resolution artifacts, is continually recycled into automated testing pipelines, facilitating more robust evolution of compiler toolchains and serving as a model for similar applications in other critical software infrastructure domains.