Enabling Memory Safety of C Programs using LLMs (2404.01096v1)
Abstract: Memory safety violations in low-level code, written in languages like C, continue to be one of the major sources of software vulnerabilities. One way to remove such violations by construction is to port C code to a safe C dialect. Such dialects rely on programmer-supplied annotations to guarantee safety with minimal runtime overhead. This porting, however, is a manual process that imposes a significant burden on the programmer, and hence the technique has seen limited adoption. Porting not only requires inferring annotations but may also require refactoring/rewriting the code to make it amenable to such annotations. In this paper, we use LLMs to address both of these concerns. We show how to harness LLM capabilities to do complex code reasoning as well as rewriting of large codebases. We also present a novel framework for whole-program transformations that leverages lightweight static analysis to break a transformation into smaller steps that an LLM can carry out effectively. We implement our ideas in a tool called MSA that targets the CheckedC dialect. We evaluate MSA on several micro-benchmarks, as well as on real-world code of up to 20K lines. We show superior performance compared to a vanilla LLM baseline and demonstrate improvement over a state-of-the-art symbolic (non-LLM) technique.