DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language Models (2402.13291v2)

Published 19 Feb 2024 in cs.CR, cs.LG, cs.PL, and cs.SE

Abstract: The automated program repair field has attracted substantial interest over the years, but despite significant research efforts, creating a system that works well for complex semantic bugs such as security vulnerabilities has proven difficult. A promising direction to solve this challenge is by leveraging LLMs, which are increasingly used to solve various programming tasks. In this paper, we investigate the effectiveness of LLMs for solving the code-repair task. We show that the task is difficult because it requires the model to learn long-range code relationships, a task that inherently relies on extensive amounts of training data. At the same time, creating a large, clean dataset of complex program bugs and their corresponding fixes is non-trivial. We propose a technique that addresses these challenges with a new approach for querying and fine-tuning LLMs. The idea is to use program analysis to limit the LLM's attention to the portions of code needed to perform the fix, drastically reducing the amount of required training data. Concretely, for both training and inference, rather than feeding the entire program to the LLM, we reduce its code to a much shorter snippet that contains the reported defect together with the necessary context, and use that instead. Our evaluation shows that this code-reduction approach substantially improves both available models such as GPT-4 used with few-shot learning and fine-tuned models. To train and evaluate our system, we created a comprehensive code-fixing dataset by extensively labeling 156 bug patterns (including 40 security rules) that require complex interprocedural dataflow analysis to discover. Our best system, built on Mixtral-8x7B, can remove more than 80% of the reported defects while exactly matching the human fix in 10% to 50% of cases, outperforming baselines based on GPT-3.5 and GPT-4, or on window-based models like TFix.
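The abstract's central idea, shrinking the program to a defect-focused snippet before querying the model, can be illustrated with a toy lexical backward slice. The paper's actual system relies on interprocedural dataflow analysis; the function and the example program below are illustrative assumptions, not the authors' implementation.

```python
import re

def reduce_snippet(source: str, defect_line: int) -> str:
    """Toy code reduction: keep the reported defect line plus any earlier
    lines that (transitively) mention one of its identifiers.
    A crude lexical stand-in for the paper's program analysis."""
    lines = source.splitlines()
    # Identifiers appearing on the reported defect line.
    wanted = set(re.findall(r"[A-Za-z_]\w*", lines[defect_line - 1]))
    kept = {defect_line}
    for i in range(defect_line - 1, 0, -1):   # walk backwards from the defect
        idents = set(re.findall(r"[A-Za-z_]\w*", lines[i - 1]))
        if idents & wanted:                   # line touches a needed name
            kept.add(i)
            wanted |= idents                  # pull in its dependencies too
    return "\n".join(lines[i - 1] for i in sorted(kept))

# Hypothetical program with a path-traversal sink on line 5.
program = """\
config = load_config()
logger = get_logger()
path = request.args["file"]
logger.info("serving")
data = open(base_dir + path).read()
"""

snippet = reduce_snippet(program, defect_line=5)
print(snippet)  # only the lines feeding the defect survive; logging/config noise is gone
```

The reduced snippet (here, just the `path` assignment and the `open` call) is what would be sent to the LLM in place of the whole program, which is how the approach cuts both prompt size and the amount of training data needed.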

References (57)
  1. Learning to Represent Programs with Graphs. In ICLR 2018.
  2. Self-Supervised Bug Detection and Repair. In NeurIPS 2021, virtual. 27865–27876.
  3. Getafix: learning to fix bugs automatically. Proc. ACM Program. Lang. 3, OOPSLA (2019), 159:1–159:27.
  4. TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer. In ICML 2021, virtual (Proceedings of Machine Learning Research, Vol. 139). PMLR, 780–791.
  5. A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World. Commun. ACM 53, 2 (Feb. 2010), 66–75.
  6. Evaluating Large Language Models Trained on Code.
  7. Sequencer: Sequence-to-sequence learning for end-to-end program repair. IEEE Transactions on Software Engineering (2019).
  8. QLoRA: Efficient Finetuning of Quantized LLMs. CoRR abs/2305.14314 (2023). https://doi.org/10.48550/ARXIV.2305.14314 arXiv:2305.14314
  9. Semantic Code Repair using Neuro-Symbolic Transformation Networks. Workshop track invitation, ICML 2018. https://openreview.net/forum?id=r1hsJCe0Z
  10. Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs. In ICLR. https://openreview.net/forum?id=SJeqs6EFvB
  11. ESLint. 2022. ESLint rules. https://eslint.org/docs/latest/rules/
  12. Sorald: Automatic Patch Suggestions for SonarQube Static Analysis Violations. CoRR abs/2103.12033 (2021). arXiv:2103.12033 https://arxiv.org/abs/2103.12033
  13. Vision Transformer-Inspired Automated Vulnerability Repair. ACM Trans. Softw. Eng. Methodol. (Nov. 2023). https://doi.org/10.1145/3632746 Just Accepted.
  14. Automatic Software Repair: A Survey. IEEE Trans. Software Eng. 45, 1 (2019), 34–67.
  15. Alex Graves. 2012. Sequence Transduction with Recurrent Neural Networks. In ICML 2012, Workshop on Representation Learning.
  16. DeepFix: Fixing Common C Language Errors by Deep Learning. AAAI Conference on Artificial Intelligence 31, 1 (Feb. 2017). https://ojs.aaai.org/index.php/AAAI/article/view/10742
  17. On Distribution Shift in Learning-based Bug Detectors. In ICML 2022 (Proceedings of Machine Learning Research, Vol. 162). PMLR, 8559–8580.
  18. Global Relational Models of Source Code. In ICLR 2020. https://openreview.net/forum?id=B1lnbRNtwr
  19. The Curious Case of Neural Text Degeneration. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https://openreview.net/forum?id=rygGQyrFvH
  20. Towards Practical Program Repair with On-Demand Candidate Generation. In ICSE 2018. ACM, 12–23.
  21. Mistral 7B. CoRR abs/2310.06825 (2023). https://doi.org/10.48550/ARXIV.2310.06825 arXiv:2310.06825
  22. Mixtral of Experts. CoRR abs/2401.04088 (2024). https://doi.org/10.48550/ARXIV.2401.04088 arXiv:2401.04088
  23. BugBuilder: An Automated Approach to Building Bug Repository. IEEE Transactions on Software Engineering (2022), 1–1. https://doi.org/10.1109/TSE.2022.3177713
  24. Repair Is Nearly Generation: Multilingual Program Repair with LLMs. CoRR abs/2208.11640 (2022). https://doi.org/10.48550/arXiv.2208.11640 arXiv:2208.11640
  25. StarCoder: may the source be with you! CoRR abs/2305.06161 (2023). https://doi.org/10.48550/ARXIV.2305.06161 arXiv:2305.06161
  26. Automatic inference of code transforms for patch generation. In Foundations of Software Engineering, ESEC/FSE 2017. ACM, 727–739.
  27. ENCORE: Ensemble Learning using Convolution Neural Machine Translation for Automatic Program Repair. CoRR abs/1906.08691 (2019). http://arxiv.org/abs/1906.08691
  28. SapFix: automated end-to-end repair at scale. In ICSE 2019.
  29. Angelix: scalable multiline program patch synthesis via symbolic analysis. In ICSE 2016.
  30. Microsoft. 2023a. Introducing AI-powered application security testing with GitHub Advanced Security. https://github.blog/2023-11-08-ai-powered-appsec/
  31. Microsoft. 2023b. Learn how to work with the ChatGPT and GPT-4 models. https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions
  32. Ghassan Misherghi and Zhendong Su. 2006. HDD: hierarchical Delta Debugging. In ICSE 2006. ACM, 142–151.
  33. Martin Monperrus. 2018. The Living Review on Automated Program Repair. Technical Report HAL-01956501. https://hal.archives-ouvertes.fr/hal-01956501v2/file/repair-living-review.pdf
  34. SemFix: program repair via semantic analysis. In ICSE 2013.
  35. OpenAI. 2022. GPT-3.5 Model Registry. https://platform.openai.com/docs/model-index-for-researchers/models-referred-to-as-gpt-3-5
  36. OpenAI. 2023. GPT-4 Technical Report. CoRR abs/2303.08774 (2023). https://doi.org/10.48550/ARXIV.2303.08774 arXiv:2303.08774
  37. OWASP Foundation. 2010. Path Traversal vulnerability description. https://owasp.org/www-community/attacks/Path_Traversal
  38. Synchromesh: Reliable Code Generation from Pre-trained Language Models. In ICLR 2022.
  39. Michael Pradel and Koushik Sen. 2018. DeepBugs: a learning approach to name-based bug detection. Proc. ACM Program. Lang. 2, OOPSLA (2018), 147:1–147:25.
  40. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. JMLR 21 (2020), 140:1–140:67.
  41. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. ArXiv. https://www.microsoft.com/en-us/research/publication/zero-memory-optimizations-toward-training-trillion-parameter-models/
  42. DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. In SIGKDD 2020 (Virtual Event, CA, USA) (KDD ’20). ACM, New York, NY, USA, 3505–3506.
  43. Code completion with statistical language models. In PLDI 2014. ACM, 419–428.
  44. Test-case reduction for C compiler bugs. In PLDI 2012. ACM, 335–346.
  45. Lessons from Building Static Analysis Tools at Google. Commun. ACM 61, 4 (Mar. 2018), 58–66.
  46. SemGrep. 2023a. Autofix. https://semgrep.dev/docs/writing-rules/autofix/
  47. SemGrep. 2023b. We put GPT-4 in Semgrep to point out false positives. https://semgrep.dev/blog/2023/gpt4-and-semgrep-detailed/
  48. Is the cure worse than the disease? overfitting in automated program repair. In FSE 2015. ACM, 532–543.
  49. Snyk. 2021. SAST tools speed comparison: Snyk Code vs SonarQube and LGTM. https://snyk.io/blog/sast-tools-speed-comparison-snyk-code-sonarqube-lgtm/
  50. Snyk. 2023a. Fix code vulnerabilities automatically. https://docs.snyk.io/scan-using-snyk/snyk-code/exploring-and-working-with-snyk-code-results-in-the-web-ui/fix-code-issues-automatically-with-deepcode-ai-fix-suggestions/
  51. Snyk. 2023b. Snyk Code - Developer-focused, real-time SAST. https://snyk.io/product/snyk-code/
  52. Neural Program Repair by Jointly Learning to Localize and Repair. In ICLR 2019.
  53. Veracode. 2023. Veracode Fix. https://www.veracode.com/fix
  54. CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. In EMNLP 2021. ACL, 8696–8708.
  55. Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs. IEEE Trans. Software Eng. 43 (2017), 34–55.
  56. A comprehensive study of automatic program repair on the QuixBugs benchmark. J. Syst. Softw. 171 (2021).
  57. Andreas Zeller and Ralf Hildebrandt. 2002. Simplifying and Isolating Failure-Inducing Input. IEEE Trans. Software Eng. 28, 2 (2002), 183–200. https://doi.org/10.1109/32.988498
Authors (6)
  1. Berkay Berabi
  2. Alexey Gronskiy
  3. Veselin Raychev
  4. Gishor Sivanrupan
  5. Victor Chibotaru
  6. Martin Vechev
Citations (6)