DeepCode AI Fix: Fixing Security Vulnerabilities with Large Language Models (2402.13291v2)
Abstract: The automated program repair field has attracted substantial interest over the years, but despite significant research efforts, creating a system that works well for complex semantic bugs such as security vulnerabilities has proven difficult. A promising direction to solve this challenge is to leverage LLMs, which are increasingly used to solve various programming tasks. In this paper, we investigate the effectiveness of LLMs for solving the code-repair task. We show that the task is difficult because it requires the model to learn long-range code relationships, which inherently relies on extensive amounts of training data. At the same time, creating a large, clean dataset of complex program bugs and their corresponding fixes is non-trivial. We propose a technique to address these challenges with a new approach for querying and fine-tuning LLMs. The idea is to use program analysis to limit the LLM's attention mechanism to the portions of code needed to perform the fix, drastically reducing the amount of required training data. Concretely, for training and inference, rather than feeding the entire program to the LLM, we reduce its code to a much shorter snippet that contains the reported defect together with the necessary context, and use that instead. Our evaluation shows that this code reduction approach substantially improves available models such as GPT-4 using few-shot learning, as well as fine-tuned models. To train and evaluate our system, we created a comprehensive code fixing dataset by extensively labeling 156 bug patterns (including 40 security rules) that require complex interprocedural dataflow to discover. Our best system with Mixtral-8x7B can remove more than 80% of the reported defects while exactly matching the human fix in between 10% and 50% of cases, outperforming baselines based on GPT-3.5 and GPT-4, or based on window-based models like TFix.
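To make the code reduction idea concrete, here is a minimal Python sketch. It is not the paper's method: the abstract describes a reduction driven by program analysis, whereas this toy substitutes a crude textual heuristic that keeps the defect line plus earlier lines sharing identifiers with it. The names `reduce_snippet`, `identifiers`, and `KEYWORDS` are hypothetical, and the output is not guaranteed to be syntactically valid code.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")
# Tiny illustrative keyword set (assumption); a real tool would use a parser.
KEYWORDS = {"const", "let", "var", "function", "return", "if", "else", "require"}


def identifiers(line):
    """Return the non-keyword identifier-like tokens that appear in a line."""
    return set(IDENT.findall(line)) - KEYWORDS


def reduce_snippet(source, defect_line):
    """Keep the defect line plus earlier lines that (transitively) mention
    an identifier the defect line depends on. defect_line is 1-indexed."""
    lines = source.splitlines()
    wanted = identifiers(lines[defect_line - 1])
    keep = {defect_line - 1}
    # Walk backwards so that a kept line pulls in the lines feeding it.
    for i in range(defect_line - 2, -1, -1):
        ids = identifiers(lines[i])
        if ids & wanted:
            keep.add(i)
            wanted |= ids
    return "\n".join(lines[i] for i in sorted(keep))


# Hypothetical input: a path traversal reported on line 6.
PROGRAM = """const fs = require("fs");
const banner = "hello";
function handler(req, res) {
  console.log(banner);
  const name = req.query.name;
  res.send(fs.readFileSync(name));
}"""

reduced = reduce_snippet(PROGRAM, defect_line=6)
print(reduced)
```

In this example the two `banner` lines are dropped, while the `fs` import, the handler signature, and the tainted `name` assignment are kept alongside the defect. The shorter snippet, rather than the whole file, is what would be passed to the model for training or inference.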
Authors:
- Berkay Berabi
- Alexey Gronskiy
- Veselin Raychev
- Gishor Sivanrupan
- Victor Chibotaru
- Martin Vechev