
KernelGPT: Enhanced Kernel Fuzzing via Large Language Models (2401.00563v3)

Published 31 Dec 2023 in cs.CR, cs.AI, and cs.SE

Abstract: Bugs in operating system kernels can affect billions of devices and users all over the world. As a result, a large body of research has been focused on kernel fuzzing, i.e., automatically generating syscall (system call) sequences to detect potential kernel bugs or vulnerabilities. Kernel fuzzing aims to generate valid syscall sequences guided by syscall specifications that define both the syntax and semantics of syscalls. While there has been existing work trying to automate syscall specification generation, this remains largely manual work, and a large number of important syscalls are still uncovered. In this paper, we propose KernelGPT, the first approach to automatically synthesizing syscall specifications via LLMs for enhanced kernel fuzzing. Our key insight is that LLMs have seen massive kernel code, documentation, and use cases during pre-training, and thus can automatically distill the necessary information for making valid syscalls. More specifically, KernelGPT leverages an iterative approach to automatically infer the specifications, and further debug and repair them based on the validation feedback. Our results demonstrate that KernelGPT can generate more new and valid specifications and achieve higher coverage than state-of-the-art techniques. So far, by using newly generated specifications, KernelGPT has already detected 24 new unique bugs in Linux kernel, with 12 fixed and 11 assigned with CVE numbers. Moreover, a number of specifications generated by KernelGPT have already been merged into the kernel fuzzer Syzkaller, following the request from its development team.


Summary

  • The paper introduces KernelGPT, which uses LLMs to automate syscall specification generation for kernel fuzzing.
  • It employs an iterative pipeline of driver detection, specification generation, and validation-driven repair, producing 129 new syscall descriptions and 21.3% higher line coverage.
  • KernelGPT outperforms manual and prior automated methods, uncovering previously unreported bugs and strengthening Linux kernel security testing.

KernelGPT: Advancing Kernel Fuzzing with LLMs

The paper "KernelGPT: Enhanced Kernel Fuzzing via LLMs" presents a pioneering approach to improving kernel fuzzing by leveraging the capabilities of LLMs. Specifically, it introduces KernelGPT, a novel methodology for automatically inferring Syzkaller specifications using LLMs. This addresses a key limitation of current kernel fuzzing: writing syscall specifications remains largely a manual process, one that is both labor-intensive and error-prone given the constantly evolving nature of kernel codebases.

Overview of Kernel Fuzzing and Syzkaller

Kernel fuzzing is an essential technique for uncovering potential bugs in operating system kernels, which are foundational to system stability and security. Syzkaller has emerged as one of the most effective kernel fuzzers, using a domain-specific language called syzlang to describe the syntax and semantics of syscalls. Despite its efficacy, the generation of Syzkaller specifications is largely manual and leaves many syscalls uncovered, notably in complex areas like device drivers.
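To make the specification format concrete, the following is a sketch of what a syzlang description might look like for a hypothetical character device (the device path, command name, and struct are invented for illustration; the syntax follows Syzkaller's syscall description format):

```
# Hypothetical syzlang description for an imaginary /dev/foo driver.
resource fd_foo[fd]

openat$foo(fd const[AT_FDCWD], file ptr[in, string["/dev/foo"]], flags flags[open_flags], mode const[0]) fd_foo
ioctl$FOO_SET_CFG(fd fd_foo, cmd const[FOO_SET_CFG], arg ptr[in, foo_config])

foo_config {
	size	len[data, int32]
	data	array[int8]
}
```

Descriptions like this tell the fuzzer which device file to open, which ioctl commands exist, and how their argument structures are laid out, so that generated syscall sequences pass the driver's input validation and reach deeper code paths.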

Contributions of KernelGPT

KernelGPT distinguishes itself as the first approach to harness LLMs for automating the generation of syscall specifications. Its key insight is that LLMs have absorbed massive amounts of kernel code, documentation, and usage examples during pre-training, and can therefore distill the information needed to construct accurate specifications. The pipeline is broken down into three stages: driver detection, specification generation, and specification validation and repair.

  1. Driver Detection: Using LLMs to infer device names and derive initialization descriptions from device operation handlers, guided by code references.
  2. Specification Generation: An iterative process where LLMs analyze related source code to deduce command values and argument types for ioctl handlers. The process is segmented into stages to allow LLMs to focus on discrete subtasks, thus improving type analysis and synthesis capabilities.
  3. Specification Validation and Repair: This phase involves using validation feedback to identify and rectify errors, ensuring the accuracy of the generated specifications.
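The iterative generate-validate-repair loop at the core of the pipeline can be sketched as follows. This is a minimal illustration with stubbed functions, not the paper's implementation: the real system prompts an LLM (e.g., GPT-4) and validates specifications with Syzkaller's tooling, while here both are replaced by placeholders.

```python
def query_llm(prompt):
    """Stub standing in for a real chat-completion API call."""
    # A repair prompt (containing an error message) yields a "fixed" spec;
    # an initial prompt yields a first draft.
    return "spec-v2" if "error" in prompt else "spec-v1"

def validate(spec):
    """Stub standing in for compiling the spec with Syzkaller's toolchain.

    Returns an error message on failure, or None if the spec is valid.
    """
    return None if spec == "spec-v2" else "unknown type in spec"

def generate_spec(handler_source, max_rounds=3):
    """Draft a spec from driver source, then repair it using validation feedback."""
    spec = query_llm(f"Derive a syzlang spec for:\n{handler_source}")
    for _ in range(max_rounds):
        error = validate(spec)
        if error is None:
            return spec  # specification compiles cleanly
        # Feed the validation error back to the model for repair.
        spec = query_llm(f"Fix this error: {error}\nSpec:\n{spec}")
    return None  # give up after max_rounds repair attempts
```

In this toy run, the first draft fails validation, the error is fed back to the model, and the repaired second draft passes. The design choice mirrored here is that validation feedback, rather than a single-shot prompt, drives the model toward specifications that actually compile.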

Empirical Findings

The authors conducted a comprehensive evaluation of KernelGPT on the Linux kernel, version 6.7. The tool was able to generate valid and executable specifications for various undescribed drivers, yielding an additional 129 syscall descriptions and achieving 21.3% more line coverage in fuzzing tests compared to baseline methods.

KernelGPT was also able to reveal eight crashes in previously undescribed drivers, seven of which corresponded to previously unreported bugs. These findings indicate that KernelGPT's automatically generated specifications exercise code paths and expose bugs that manually written specifications miss.

Comparative Analysis and Implications

When compared with contemporary methods like SyzDescribe and existing Syzkaller specifications, KernelGPT exhibited superior performance in coverage metrics and type analysis for the selected drivers. The specifications produced by KernelGPT contributed to both higher coverage numbers and effective bug identification, underscoring the importance and potential impact of automating specification generation through LLMs.

Future Directions

KernelGPT's approach opens promising avenues for further research in the integration of LLMs with kernel fuzzing techniques. Future work could explore more intricate and diverse driver settings, as well as the adoption of KernelGPT in generating specifications directly from binary codebases. Additionally, there is potential for expanding the application's scope to incorporate LLM-generated seeds and mutations, enhancing the fuzzing process's depth and breadth.

In conclusion, this paper substantiates the feasibility and effectiveness of integrating LLMs into the kernel fuzzing domain, transforming a traditionally manual task into a more automated and efficient process. This advancement holds substantial promise for improving system security and reliability through heightened bug detection capabilities.
