Syzkaller Specifications
- Syzkaller specifications are formally structured syzlang descriptions that define syscall syntax, semantics, arguments, and inter-call dependencies.
- They let Syzkaller automatically generate and mutate syscall sequences; KernelGPT infers new specifications through iterative LLM prompting, improving both code coverage and bug discovery.
- In empirical evaluations, KernelGPT-generated specifications reach 21% higher line coverage than the prior tool SyzDescribe on existing drivers and expose 28% more crashes in 24-hour fuzzing runs.
Syzkaller specifications are formally structured descriptions, written in a custom domain-specific language called syzlang, that define the syntax, semantics, arguments, return types, and inter-call dependencies of system calls (syscalls) for the Syzkaller kernel fuzzer. These specifications enable automated generation and mutation of valid syscall sequences, allowing Syzkaller to test kernel code efficiently for correctness and security vulnerabilities. Recent research introduced KernelGPT, a method that uses large language models (LLMs) to automatically infer, validate, and integrate new syscall specifications, significantly improving both line coverage and bug-finding rates (Yang et al., 2023).
1. Structure and Semantics of syzlang
Syzkaller specifications use syzlang, a concise domain-specific language, to capture the formal interface of each syscall. The language supports the declaration of:
- Syscall definitions (with optional instances), argument lists, return types, and parameter annotations.
- Supported types include primitive integer/floating/enum types, pointers, arrays (with fixed or variable lengths), resources, structs, and type aliases.
- Field and parameter annotations for argument directionality: in, out, and inout.
- Explicit inter-call dependencies, especially for resource parameters (which must be produced by preceding syscalls).
The core BNF grammar is:
```
<Spec> ::= <SyscallDef>*
<SyscallDef> ::= "sys" <Name> ["$" <Inst>] "(" [<ParamList>] ")" ["->" <Type>] "{" <FieldSpecs> "}"
<ParamList> ::= <Param> ("," <Param>)*
<Param> ::= <Type> <Identifier> ["=" <ConstExpr>] ["<" <Annot> ">"]
<Annot> ::= "in" | "out" | "inout"
<FieldSpecs> ::= (<Identifier> "." <Identifier> ":" <Annot> ["[" <ConstExpr> "]"])*
<Type> ::= "int" | "uint" | "long" | "flags" "[" <EnumName> "]"
         | "ptr" "[" <Type> "]"
         | "array" "[" <Type> ["," <ConstExpr>] "]"
         | "resource" "[" <ResName> "]"
         | <StructName> "_struct"
         | <Identifier>
<ConstExpr> ::= INTEGER_LITERAL | <Identifier> | <Identifier> "+" <Integer> | …
```
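To make the <Type> production concrete, the following Python sketch is a small recursive-descent parser for it. It implements only the simplified grammar shown above, not Syzkaller's actual description parser (which is written in Go); the tuple-based output, the SpecSyntaxError class, and the TypeParser name are illustrative choices.

```python
import re
from typing import List, Optional

IDENT = re.compile(r"[A-Za-z_]\w*")
# One token: an integer literal, an identifier, or punctuation used by <Type>/<ConstExpr>.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[\[\],+])")


class SpecSyntaxError(ValueError):
    """Raised when a type expression does not match the simplified <Type> grammar."""


def tokenize(src: str) -> List[str]:
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SpecSyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class TypeParser:
    """Recursive-descent parser: one method per nonterminal (<Type>, <ConstExpr>)."""

    def __init__(self, src: str) -> None:
        self.tokens = tokenize(src)
        self.i = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SpecSyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.i += 1
        return tok

    def ident(self) -> str:
        tok = self.take()
        if not IDENT.fullmatch(tok):
            raise SpecSyntaxError(f"expected an identifier, got {tok!r}")
        return tok

    def parse(self):
        node = self.parse_type()
        if self.peek() is not None:
            raise SpecSyntaxError(f"trailing input starting at {self.peek()!r}")
        return node

    def parse_type(self):
        name = self.ident()
        if name in ("int", "uint", "long"):
            return (name,)
        if name in ("flags", "resource"):            # flags[EnumName], resource[ResName]
            self.take("[")
            arg = self.ident()
            self.take("]")
            return (name, arg)
        if name == "ptr":                             # ptr[Type]
            self.take("[")
            inner = self.parse_type()
            self.take("]")
            return ("ptr", inner)
        if name == "array":                           # array[Type] or array[Type, ConstExpr]
            self.take("[")
            inner = self.parse_type()
            length = None
            if self.peek() == ",":
                self.take(",")
                length = self.parse_const()
            self.take("]")
            return ("array", inner, length)
        if name.endswith("_struct") and len(name) > len("_struct"):
            return ("struct", name[: -len("_struct")])  # <StructName> "_struct"
        return ("named", name)                        # bare <Identifier>

    def parse_const(self):
        tok = self.take()
        if tok.isdigit():
            return int(tok)                           # INTEGER_LITERAL
        if not IDENT.fullmatch(tok):
            raise SpecSyntaxError(f"bad constant expression at {tok!r}")
        if self.peek() == "+":                        # <Identifier> "+" <Integer>
            self.take("+")
            offset = self.take()
            if not offset.isdigit():
                raise SpecSyntaxError(f"expected an integer after '+', got {offset!r}")
            return (tok, int(offset))
        return tok                                    # <Identifier>


print(TypeParser("ptr[array[dm_target_spec_struct, 4]]").parse())
# ('ptr', ('array', ('struct', 'dm_target_spec'), 4))
```

Because each nonterminal maps to one method, nested types such as ptr[array[...]] fall out of ordinary recursion, and a malformed input like ptr[int fails with a SpecSyntaxError instead of being silently accepted.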
Key semantic constraints enforce that:
- resource arguments are data dependencies on earlier syscalls;
- out/inout-annotated fields must correspond to kernel structs;
- arrays have fixed or explicitly declared variable lengths, and the latter must reference a separate length parameter (e.g., len[devices]).
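The first constraint, that resource arguments must be produced by earlier calls, is what keeps generated call sequences meaningful. The Python sketch below checks it for a candidate sequence. CallSpec and unsatisfied_resources are hypothetical helpers, and the call and resource names are illustrative, loosely modeled on the device-mapper example discussed in this article.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CallSpec:
    """A toy view of one syzlang call: which resources it consumes and produces."""
    name: str
    consumes: Tuple[str, ...] = ()   # resource kinds passed in as arguments
    produces: Optional[str] = None   # resource kind returned, if any


def unsatisfied_resources(program: List[CallSpec]) -> List[Tuple[int, str, str]]:
    """Return (position, call name, resource) for every resource argument
    that no earlier call in the sequence has produced."""
    available = set()
    problems = []
    for pos, call in enumerate(program):
        for res in call.consumes:
            if res not in available:
                problems.append((pos, call.name, res))
        # A call's result only becomes usable by calls that come after it.
        if call.produces is not None:
            available.add(call.produces)
    return problems


# Illustrative calls (hypothetical names for this sketch).
open_dm = CallSpec("openat$dm_control", produces="fd_dm")
dm_version = CallSpec("ioctl$DM_VERSION", consumes=("fd_dm",))

print(unsatisfied_resources([open_dm, dm_version]))  # []
print(unsatisfied_resources([dm_version, open_dm]))  # [(0, 'ioctl$DM_VERSION', 'fd_dm')]
```

Real Syzkaller resources also support inheritance (a device-specific file descriptor can be declared as a subtype of fd), which this sketch ignores by matching resource kinds exactly.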
2. Automated Inference via KernelGPT
KernelGPT introduces an iterative LLM-driven workflow for automatic specification synthesis, consisting of three phases:
- Driver Detection: Locates device operation handlers (e.g., through LLVM-based pattern search for fops structs, i.e., struct file_operations instances), extracts C struct definitions and usage sites, and prompts the LLM (with code context and concrete examples) to propose correct syzlang syscall instances (e.g., openat$dm_control("/dev/mapper/control")).
- Specification Generation: For each candidate driver and ioctl handler:
  - Command values, argument types, and needed struct/type definitions are inferred via recursive LLM queries on progressively broader code slices.
  - Algorithm 1 formalizes this recursive process.
- Specification Validation and Repair: Generated descriptions are checked with Syzkaller's own tooling, and the resulting error messages are fed back to the LLM to repair invalid specifications.

Algorithm 1: Analyze(S, U, k)
Input: related source code S, usage information U, iteration depth k
Output: inferred specification fragment R, or ⊥ if inference fails
1. If k > MAX_ITER, return ⊥.
2. Prompt ← BuildPrompt(S, U)
3. (R, UNKNOWN) ← QueryLLM(Prompt)
4. If UNKNOWN = ∅, return R.
5. For each entry (f, t, u) ∈ UNKNOWN:
   - S′ ← ExtractCode(f, t)
   - R′ ← Analyze(S′, u, k + 1)
   - R ← Update(R, R′)
6. Return R.

3. Coverage Metrics and Aggregate Results
Coverage gains are measured per driver $d$. Let $\mathrm{Cov}_{\mathrm{LLM}}(d)$ denote the number of lines covered by Syzkaller with LLM-generated specifications, and $\mathrm{Cov}_{\mathrm{base}}(d)$ the number of lines covered with the baseline specifications. The relative gain for driver $d$ is

$$\Delta_{\mathrm{cov}}(d) = \frac{\mathrm{Cov}_{\mathrm{LLM}}(d) - \mathrm{Cov}_{\mathrm{base}}(d)}{\mathrm{Cov}_{\mathrm{base}}(d)} \times 100\%,$$

and the aggregate gain over all $n$ drivers is

$$\Delta_{\mathrm{cov}}^{\mathrm{all}} = \frac{\sum_d \mathrm{Cov}_{\mathrm{LLM}}(d) - \sum_d \mathrm{Cov}_{\mathrm{base}}(d)}{\sum_d \mathrm{Cov}_{\mathrm{base}}(d)} \times 100\%.$$

Aggregate results show that the newly added specifications cover 6,668 unique lines (about 5% of Syzkaller's baseline of 143,838 lines). Fuzzing with the 129 new call descriptions plus 3,912 existing ones finds 28% more crashes during a 24-hour run. On existing drivers, KernelGPT yields 21% higher line coverage than the prior state of the art, SyzDescribe.
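Algorithm 1 can be sketched in Python as follows. This is a hypothetical rendering, not KernelGPT's code: the LLM call and the code extractor are injected as plain functions (stubbed below), the spec fragment is modeled as a name-to-text dictionary, MAX_ITER defaults to 3 as an assumption, and Update is assumed to leave R unchanged when a nested call returns ⊥ (represented as None).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Spec = Dict[str, str]           # hypothetical: definition name -> syzlang text
Unknown = Tuple[str, str, str]  # (f, t, u): where to look, what to resolve, how it is used


@dataclass
class LLMAnswer:
    spec: Spec              # R: the fragment the model could infer
    unknown: List[Unknown]  # UNKNOWN: items that need more source code


def build_prompt(source: str, usage: str) -> str:
    # BuildPrompt(S, U); a real prompt would also include few-shot examples.
    return f"Source code:\n{source}\nUsage:\n{usage}\nInfer the syzlang description."


def update(spec: Spec, sub_spec: Optional[Spec]) -> Spec:
    # Update(R, R'): merge a nested result; a failed (None) one changes nothing.
    return spec if sub_spec is None else {**spec, **sub_spec}


def analyze(source: str, usage: str, k: int,
            query_llm: Callable[[str], LLMAnswer],
            extract_code: Callable[[str, str], str],
            max_iter: int = 3) -> Optional[Spec]:
    """Algorithm 1: return an inferred spec fragment, or None for ⊥."""
    if k > max_iter:
        return None
    answer = query_llm(build_prompt(source, usage))
    spec = dict(answer.spec)
    if not answer.unknown:
        return spec
    for f, t, u in answer.unknown:
        sub_source = extract_code(f, t)  # ExtractCode(f, t)
        spec = update(spec, analyze(sub_source, u, k + 1,
                                    query_llm, extract_code, max_iter))
    return spec


# Stubs standing in for a real LLM client and source indexer (hypothetical).
def fake_llm(prompt: str) -> LLMAnswer:
    if "struct dm_ioctl {" in prompt:
        return LLMAnswer({"dm_ioctl": "dm_ioctl { ... }"}, [])
    return LLMAnswer(
        {"ioctl$DM_VERSION": "ioctl$DM_VERSION(fd fd_dm, cmd const[DM_VERSION], arg ptr[inout, dm_ioctl])"},
        [("dm-ioctl.h", "dm_ioctl", "third argument of DM_VERSION")],
    )


def fake_extract(path: str, name: str) -> str:
    return f"struct {name} {{ ... }};"


result = analyze("static long dm_ctl_ioctl(...) { ... }", "ioctl handler", 1,
                 fake_llm, fake_extract)
print(sorted(result))  # ['dm_ioctl', 'ioctl$DM_VERSION']
```

Passing query_llm and extract_code as parameters keeps the recursion testable offline; the depth bound guarantees termination even if the model keeps reporting unknown items.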
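The two coverage formulas translate directly into code. The Python sketch below uses made-up driver names and line counts; it also shows why the aggregate is computed from summed coverage: that weights large drivers more heavily, so it generally differs from the mean of the per-driver gains.

```python
from typing import Mapping


def delta_cov(cov_llm: int, cov_base: int) -> float:
    """Δ_cov(d): relative line-coverage gain of LLM-generated specs, in percent."""
    if cov_base <= 0:
        raise ValueError("baseline coverage must be positive")
    return (cov_llm - cov_base) * 100.0 / cov_base


def delta_cov_all(cov_llm: Mapping[str, int], cov_base: Mapping[str, int]) -> float:
    """Δ_cov^all: gain of summed coverage over all drivers (not a mean of Δ_cov(d))."""
    if set(cov_llm) != set(cov_base):
        raise ValueError("both mappings must cover the same drivers")
    return delta_cov(sum(cov_llm.values()), sum(cov_base.values()))


# Made-up per-driver line counts, for illustration only.
llm = {"driver_a": 1200, "driver_b": 150}
base = {"driver_a": 1000, "driver_b": 100}

print(delta_cov(llm["driver_a"], base["driver_a"]))  # 20.0
print(delta_cov(llm["driver_b"], base["driver_b"]))  # 50.0
print(round(delta_cov_all(llm, base), 2))            # 22.73

# Unique new lines as a share of the 143,838-line baseline quoted in the text.
print(round(6668 / 143838 * 100, 1))                 # 4.6
```

Here the mean of the per-driver gains would be 35%, while the aggregate is about 22.7%, because driver_a contributes far more baseline lines. The last line confirms that 6,668 of 143,838 lines is roughly 4.6%, i.e. the "about 5%" figure.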
4. Specification Quality and Correction: Empirical Examples
The KernelGPT pipeline produces valid and executable syscall specifications at a substantially increased rate compared to earlier methods. Its validation and repair loop resolves common errors in initial LLM outputs, often arising from misuse of syzlang-specific rules regarding struct fields or array lengths.
Correctness before and after KernelGPT repair:

| Example | Before (incorrect/invalid) | After (KernelGPT repaired) |
|---|---|---|
| Device-mapper dm_ctl_ioctl struct | Incorrect node name; incomplete or misannotated fields; misused fixed-length array | Accurate node name, correct use of the device fd, correct array and output-annotation conventions |
| vfio_pci_hot_reset_info struct | Variable array lengths not permitted by syzlang; invalid field references | Constants for fixed arrays; variable-length arrays use a separate len parameter, as syzlang requires |

KernelGPT's iterative validation and repair raises the number of valid specifications from 24 to 32 of the 39 generated, as summarized in the experimental results:

| #Drivers | #Generated | #Valid | #Valid after Repair | #Executable |
|---|---|---|---|---|
| 50 | 39/50 | 24/39 | 32/39 | 17/32 |

5. Integration with Syzkaller Fuzzing Infrastructure
Validated syzlang description files generated or repaired by KernelGPT are incorporated into the Syzkaller repository (sys/linux/ or architecture-specific directories). The integration machinery includes:
- make extract triggers Syzkaller's syz-extract for grammar checking and for extracting the constant values the descriptions reference.
- syz-manager and syz-fuzzer ingest the new syscall descriptions for routine corpus expansion, sampling, and mutation.
- During fuzzing, all calls, built-in and generated alike, are treated homogeneously by the scheduler and mutator.
- KernelGPT-generated specifications have been merged upstream into the official Syzkaller repository on developer request.
A plausible implication is that this integration pipeline establishes a scalable, maintainable process for extending kernel model coverage as new drivers and syscalls appear.
6. Empirical Findings: Coverage, Bugs, and Comparative Effectiveness
Empirical evaluation demonstrates substantial gains in coverage and unique bug discovery:
- For the executable new drivers, the 129 new calls covered 90,365 lines in total, including 6,668 unique lines, in 8-hour fuzzing runs.
- Comparison across ten “existing” drivers yields 21% higher coverage than SyzDescribe and 12% higher than Syzkaller's own legacy corpus.
- KernelGPT-augmented Syzkaller found 24 new unique kernel bugs, with 12 fixed and 11 assigned CVEs.

| Handler | #Calls | Cov (lines) | Unique Cov (lines) |
|---|---|---|---|
| btrfs_control_ioctl | 4 | 2,719 | 20 |
| cec_ioctl | 12 | 3,643 | 402 |
| ... | ... | ... | ... |
| Total | 129 | 90,365 | 6,668 |

Kernel bugs detected and attributed to the new specifications include memory allocation bugs and use-after-free errors, underscoring the value of specification coverage.
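The Unique Cov column can be read as set arithmetic over covered lines. The Python sketch below assumes that a line counts as unique when the baseline specifications never covered it, and that the Total row is a union across handlers; the article does not say whether its totals are unions or column sums, so both are assumptions, and the coverage data are invented.

```python
from typing import Dict, Set, Tuple

Line = Tuple[str, int]  # (source file, line number)


def coverage_rows(per_handler: Dict[str, Set[Line]], baseline: Set[Line]):
    """Per handler: (lines covered, lines covered that the baseline never reached).
    The Total row uses the union across handlers, so shared lines count once."""
    rows = {h: (len(cov), len(cov - baseline)) for h, cov in per_handler.items()}
    union = set().union(*per_handler.values())
    rows["Total"] = (len(union), len(union - baseline))
    return rows


# Toy coverage data; file names and line numbers are invented.
baseline = {("btrfs.c", 10), ("btrfs.c", 11)}
per_handler = {
    "btrfs_control_ioctl": {("btrfs.c", 10), ("common.c", 1)},
    "cec_ioctl": {("cec.c", 5), ("common.c", 1)},
}
print(coverage_rows(per_handler, baseline))
# {'btrfs_control_ioctl': (2, 1), 'cec_ioctl': (2, 2), 'Total': (3, 2)}
```

With a union, a line reached by two handlers counts once in the Total row, which is why Total here, (3, 2), is smaller than the column sums, (4, 3).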
7. Limitations and Prospects for Further Development
Identified limitations include:
- LLM context window capacity: Complex handlers exceeding the GPT-4 window limit sometimes result in inference failures.
- Elaborate, multi-stage driver initialization—in particular for network and USB subsystems—remains to be addressed in future iterations.
- KernelGPT presently does not perform explicit candidate ranking, though coverage- or signature-guided pruning could increase throughput.
- Unexplored dimensions include LLM-driven seed selection, on-the-fly syscall synthesis, cross-driver dependency inference, and handling of closed-source modules.
These limitations motivate ongoing research in integrating LLMs more deeply with program analysis, symbolic execution, and fuzzing seed/mutation strategies for improved automation and generality (Yang et al., 2023).
References (1)