
Enhancing Security of AI-Based Code Synthesis with GitHub Copilot via Cheap and Efficient Prompt-Engineering (2403.12671v1)

Published 19 Mar 2024 in cs.CR and cs.AI

Abstract: AI assistants for coding are on the rise. However, one of the reasons developers and companies avoid harnessing their full potential is the questionable security of the generated code. This paper first reviews the current state of the art and identifies areas for improvement on this issue. Then, we propose a systematic approach based on prompt-altering methods to achieve better code security of (even proprietary black-box) AI-based code generators such as GitHub Copilot, while minimizing the complexity of the application from the user's point of view, the computational resources, and the operational costs. In sum, we propose and evaluate three prompt-altering methods: (1) scenario-specific, (2) iterative, and (3) general clause, and we discuss their combination. Contrary to an audit of code security, the latter two of the proposed methods require no expert knowledge from the user. We assess the effectiveness of the proposed methods on GitHub Copilot using the OpenVPN project in realistic scenarios, and we demonstrate that the proposed methods reduce the number of insecure generated code samples by up to 16% and increase the number of secure code samples by up to 8%. Since our approach does not require access to the internals of the AI models, it can in general be applied to any AI-based code synthesizer, not only GitHub Copilot.
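
The abstract describes the general-clause method as prepending a fixed security-oriented instruction to every prompt, and the iterative method as feeding generated code back to the model for revision; both treat the generator as a black box. A minimal sketch of how such a wrapper might look, where `generate(prompt)` is a hypothetical stand-in for any code-synthesizer API and the clause wording and iteration count are illustrative assumptions, not the paper's exact parameters:

```python
# Hedged sketch of black-box prompt-altering wrappers around an AI code
# generator. `generate` is a hypothetical placeholder for a synthesizer
# call (e.g., GitHub Copilot); the clause text and round count below are
# illustrative, not the paper's exact wording or settings.

SECURITY_CLAUSE = (
    "# Write secure code: validate all inputs, avoid buffer overflows,\n"
    "# and never use deprecated or unsafe functions.\n"
)

def generate(prompt: str) -> str:
    """Placeholder for a call to a black-box code synthesizer."""
    raise NotImplementedError

def general_clause(prompt: str) -> str:
    # Method (3): prepend a general security clause to the original prompt;
    # requires no security expertise from the user.
    return generate(SECURITY_CLAUSE + prompt)

def iterative(prompt: str, rounds: int = 2) -> str:
    # Method (2): re-submit the generated code with a request to remove
    # potential vulnerabilities, repeating for a fixed number of rounds.
    code = generate(prompt)
    for _ in range(rounds):
        code = generate(
            f"{prompt}\n# Previous attempt:\n{code}\n"
            "# Revise the code above to remove any security vulnerabilities.\n"
        )
    return code
```

Because the wrappers only alter the prompt text, they apply unchanged to any proprietary generator reachable through a prompt-in, code-out interface.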

Authors (6)
  1. Jakub Res (1 paper)
  2. Ivan Homoliak (28 papers)
  3. Martin Perešíni (9 papers)
  4. Aleš Smrčka (2 papers)
  5. Kamil Malinka (9 papers)
  6. Petr Hanacek (3 papers)
Citations (2)
