
KEN: Kernel Extensions using Natural Language (2312.05531v1)

Published 9 Dec 2023 in cs.AI and cs.OS

Abstract: The ability to modify and extend an operating system is an important feature for improving a system's security, reliability, and performance. The extended Berkeley Packet Filters (eBPF) ecosystem has emerged as the standard mechanism for extending the Linux kernel and has recently been ported to Windows. eBPF programs inject new logic into the kernel that the system executes before or after existing logic. While the eBPF ecosystem provides a flexible mechanism for kernel extension, writing eBPF programs remains difficult for developers today. An eBPF developer must have deep knowledge of the internals of the operating system to determine where to place logic, and must cope with the limitations that the eBPF verifier enforces on the control flow and data accesses of an eBPF program. This paper presents KEN, an alternative framework that alleviates the difficulty of writing an eBPF program by allowing Kernel Extensions to be written in Natural language. KEN uses recent advances in LLMs to synthesize an eBPF program from a user's English-language prompt. To ensure that the LLM's output is semantically equivalent to the user's prompt, KEN employs a combination of LLM-empowered program comprehension, symbolic execution, and a series of feedback loops. KEN's key novelty is the combination of these techniques. In particular, the system uses symbolic execution in a novel structure that allows it to combine the results of program synthesis and program comprehension and to build on the recent success that LLMs have shown for each of these tasks individually. To evaluate KEN, we developed a new corpus of natural language prompts for eBPF programs. We show that KEN produces correct eBPF programs for 80% of the prompts, which is an improvement by a factor of 2.67 over an LLM-empowered program synthesis baseline.
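
The "series of feedback loops" mentioned in the abstract can be pictured as a generate, check, retry cycle. The following is a minimal sketch under stated assumptions, not KEN's implementation: the abstract does not describe KEN's interfaces, so `synthesize_with_feedback`, `toy_synthesize`, and `toy_check` are hypothetical names, and the toy checker merely stands in for whatever validation step (for example a verifier or a symbolic-execution pass) rejects a candidate and reports why.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces for illustration only; these are not KEN's API.
Synthesizer = Callable[[str, str], str]      # (prompt, feedback) -> candidate program text
Checker = Callable[[str], Tuple[bool, str]]  # candidate -> (accepted, error message)


def synthesize_with_feedback(
    prompt: str,
    synthesize: Synthesizer,
    check: Checker,
    max_rounds: int = 3,
) -> Optional[str]:
    """Request a candidate, check it, and feed any error into the next request."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = synthesize(prompt, feedback)
        accepted, error = check(candidate)
        if accepted:
            return candidate
        feedback = error  # the next round sees why the last candidate failed
    return None  # no acceptable candidate within the round budget


# Toy stand-ins so the loop runs without an LLM or a real checker.
def toy_synthesize(prompt: str, feedback: str) -> str:
    # Pretend the model repairs its output once it has seen an error message.
    return "bounded_loop" if feedback else "unbounded_loop"


def toy_check(candidate: str) -> Tuple[bool, str]:
    if candidate == "unbounded_loop":
        return False, "loop bound could not be proven"
    return True, ""


result = synthesize_with_feedback(
    "count open() calls per process", toy_synthesize, toy_check
)
print(result)  # bounded_loop
```

In this toy run the first candidate is rejected, its error message is passed back, and the second candidate is accepted. Capping the number of rounds keeps the loop from running indefinitely when no candidate ever passes the check.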

