Enhancing Security of AI-Based Code Synthesis with GitHub Copilot via Cheap and Efficient Prompt-Engineering (2403.12671v1)
Abstract: AI assistants for coding are on the rise. However, one of the reasons developers and companies avoid harnessing their full potential is the questionable security of the generated code. This paper first reviews the current state of the art and identifies areas for improvement on this issue. Then, we propose a systematic approach based on prompt-altering methods to achieve better code security from (even proprietary, black-box) AI-based code generators such as GitHub Copilot, while minimizing application complexity from the user's point of view, computational resources, and operational costs. In sum, we propose and evaluate three prompt-altering methods: (1) scenario-specific, (2) iterative, and (3) general clause, and we discuss their combination. In contrast to a code security audit, the latter two methods require no expert knowledge from the user. We assess the effectiveness of the proposed methods on GitHub Copilot using the OpenVPN project in realistic scenarios, and we demonstrate that they reduce the number of insecure generated code samples by up to 16% and increase the number of secure code samples by up to 8%. Since our approach does not require access to the internals of the AI models, it can in general be applied to any AI-based code synthesizer, not only GitHub Copilot.
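The three prompt-altering methods can be illustrated as plain text transformations applied before the prompt reaches the code generator. The sketch below is a minimal illustration under stated assumptions: the clause wording, function names, and CWE hint are hypothetical, not the authors' exact prompts.

```python
# Illustrative sketch of the three prompt-altering strategies from the abstract.
# The exact clause wording and helper names are assumptions for demonstration.

GENERAL_CLAUSE = "Make sure the generated code is secure and free of known vulnerabilities."

def general_clause(prompt: str) -> str:
    """(3) General clause: append a generic security instruction to every prompt."""
    return f"{prompt}\n# {GENERAL_CLAUSE}"

def scenario_specific(prompt: str, cwe_hint: str) -> str:
    """(1) Scenario-specific: add a hint tailored to the task, e.g. the relevant CWE."""
    return f"{prompt}\n# Avoid {cwe_hint}."

def iterative(prompt: str, previous_output: str) -> str:
    """(2) Iterative: feed the previous completion back and request a hardened revision."""
    return (f"{prompt}\n# Previously generated code:\n{previous_output}\n"
            f"# Revise the code above to fix any security weaknesses.")

if __name__ == "__main__":
    base = "def parse_packet(buf):"
    print(general_clause(base))
    print(scenario_specific(base, "CWE-476: NULL pointer dereference"))
```

Since each method only rewrites the prompt string, none requires access to the model's internals, which is what makes the approach applicable to black-box generators.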
- Jakub Res
- Ivan Homoliak
- Martin Perešíni
- Aleš Smrčka
- Kamil Malinka
- Petr Hanacek