
NAVRepair: Node-type Aware C/C++ Code Vulnerability Repair (2405.04994v1)

Published 8 May 2024 in cs.SE

Abstract: The rapid advancement of deep learning has led to the development of large language models (LLMs). In the field of vulnerability repair, previous research has leveraged rule-based fixing, pre-trained models, and prompt engineering for LLMs. However, existing approaches are limited in how they integrate code structure with error types. Moreover, certain features of the C/C++ language make vulnerability repair in C/C++ exceptionally challenging. To address these challenges, we propose NAVRepair, a novel framework that combines node-type information extracted from Abstract Syntax Trees (ASTs) with error types, specifically targeting C/C++ vulnerabilities. Our approach employs type analysis to localize the minimum edit node (MEN) and customizes context information collection based on different error types. In the offline stage, NAVRepair parses code patches to locate MENs and designs rules to extract relevant contextual information for each MEN type. In the online repair stage, it analyzes the suspicious code, combines it with vulnerability type templates derived from the Common Weakness Enumeration (CWE), and generates targeted repair prompts. We evaluate NAVRepair on multiple popular LLMs and demonstrate its effectiveness in improving the performance of code vulnerability repair. Notably, our framework is independent of any specific LLM and can quickly adapt to new vulnerability types. Extensive experiments validate that NAVRepair achieves excellent results in assisting LLMs to accurately detect and fix C/C++ vulnerabilities, with 26% higher accuracy than an existing LLM-based C/C++ vulnerability repair method. We believe our node-type-aware approach has promising application prospects for enhancing real-world C/C++ code security.
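The online stage described in the abstract (suspicious code, plus a CWE-derived template, plus node-type-specific context, assembled into a repair prompt) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the template wording, the context rules, the node-type names, and every identifier (`CWE_TEMPLATES`, `CONTEXT_RULES`, `SuspiciousCode`, `build_repair_prompt`) are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical CWE templates; the wording is illustrative, not taken from the paper.
CWE_TEMPLATES = {
    "CWE-476": "The code may dereference a NULL pointer. Check the pointer before use.",
    "CWE-787": "The code may write out of bounds. Validate the index or length first.",
}

# Hypothetical context-collection rules keyed by the node type of the
# minimum edit node (MEN). The node-type names are assumed, not from the paper.
CONTEXT_RULES = {
    "if_statement": "the condition and the enclosing function signature",
    "call_expression": "the callee declaration and the argument definitions",
}


@dataclass
class SuspiciousCode:
    snippet: str    # suspicious C/C++ code to repair
    node_type: str  # node type of the MEN, assumed to come from an AST parser
    cwe_id: str     # vulnerability type, e.g. "CWE-476"


def build_repair_prompt(code: SuspiciousCode) -> str:
    """Assemble a targeted repair prompt from the CWE template and node type."""
    # Fall back to generic text when no template or rule is registered.
    template = CWE_TEMPLATES.get(code.cwe_id, "The code may contain a vulnerability.")
    context = CONTEXT_RULES.get(code.node_type, "the enclosing function")
    return (
        f"Vulnerability type: {code.cwe_id}. {template}\n"
        f"Edit location: a node of type '{code.node_type}'.\n"
        f"Relevant context: {context}.\n"
        f"Code:\n{code.snippet}\n"
        "Return the repaired C/C++ code."
    )


example = SuspiciousCode(
    snippet="p->len = 0;",
    node_type="expression_statement",
    cwe_id="CWE-476",
)
prompt = build_repair_prompt(example)
print(prompt)
```

Because the templates and rules live in plain dictionaries, supporting a new vulnerability type in this sketch only requires adding an entry, which mirrors the adaptability claim in the abstract without asserting how NAVRepair itself stores its rules.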

Authors (5)
  1. Ruoke Wang (2 papers)
  2. Zongjie Li (29 papers)
  3. Chaozheng Wang (28 papers)
  4. Yang Xiao (149 papers)
  5. Cuiyun Gao (97 papers)