Do Users Write More Insecure Code with AI Assistants? (2211.03622v3)

Published 7 Nov 2022 in cs.CR

Abstract: We conduct the first large-scale user study examining how users interact with an AI Code assistant to solve a variety of security related tasks across different programming languages. Overall, we find that participants who had access to an AI assistant based on OpenAI's codex-davinci-002 model wrote significantly less secure code than those without access. Additionally, participants with access to an AI assistant were more likely to believe they wrote secure code than those without access to the AI assistant. Furthermore, we find that participants who trusted the AI less and engaged more with the language and format of their prompts (e.g. re-phrasing, adjusting temperature) provided code with fewer security vulnerabilities. Finally, in order to better inform the design of future AI-based Code assistants, we provide an in-depth analysis of participants' language and interaction behavior, as well as release our user interface as an instrument to conduct similar studies in the future.

Implications of AI Code Assistants for Code Security

The paper "Do Users Write More Insecure Code with AI Assistants?" by Neil Perry et al. presents a rigorous analysis of the security implications of using AI code assistants in software development. Leveraging underlying machine learning models such as OpenAI's Codex and Facebook's InCoder, AI code assistants have demonstrated potential benefits in improving productivity and lowering the barrier to entry for programming tasks. However, the inherent security risks associated with AI-generated code raise concerns about their deployment in practice.

Study Design and Methodology

To understand how developers interact with AI code assistants and the potential security implications, the authors conducted a comprehensive user study. The study involved 47 participants who completed five security-related programming tasks across three different programming languages (Python, JavaScript, and C). Participants were divided into two groups: a control group without access to an AI assistant and an experiment group with access.

The paper aimed to answer three principal research questions:

  1. Do users write more insecure code when given access to an AI programming assistant?
  2. Do users trust AI assistants to write secure code?
  3. How do users' language and behavior when interacting with an AI assistant affect the degree of security vulnerabilities in their code?

Key Findings

  1. Security of Code with AI Assistance: Participants with access to an AI assistant wrote insecure solutions more frequently than those in the control group for most tasks. For example, participants in the experiment group showed significantly higher rates of incorrect and insecure solutions in tasks involving cryptographic operations (an illustrative contrast is sketched after this list).
  2. Trust in AI Assistants: The paper observed that participants with access to AI assistants were more likely to believe they had written secure code even when it was not the case. This overconfidence stems from a misplaced trust in the AI's capabilities, leading to a false sense of security.
  3. Impact of Prompt Language and Parameters: The manner in which participants structured their prompts to the AI assistant significantly impacted the security of the code generated. Secure solutions were more common among participants who provided detailed prompts with helper functions and adjusted model parameters such as temperature.
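
To make the first finding concrete: the study's cryptography tasks included, for instance, encrypting and decrypting a string with a symmetric key in Python. The sketch below is ours, not the paper's; it assumes the third-party `cryptography` package, and the function names are illustrative. It contrasts a typical insecure pattern (unauthenticated AES in ECB mode) with an authenticated-encryption recipe, the kind of construction generally regarded as the secure choice for this task.

```python
# Illustrative contrast, not code from the study. Requires `pip install cryptography`.
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def encrypt_insecure(key: bytes, plaintext: bytes) -> bytes:
    """Insecure pattern: ECB mode leaks plaintext structure and has no integrity check."""
    encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    # Padding is also omitted here, so non-block-aligned inputs fail --
    # a common correctness bug alongside the security flaw.
    return encryptor.update(plaintext) + encryptor.finalize()


def encrypt_secure(key: bytes, plaintext: bytes) -> bytes:
    """Secure pattern: Fernet provides authenticated encryption (AES-CBC plus HMAC)."""
    return Fernet(key).encrypt(plaintext)


def decrypt_secure(key: bytes, token: bytes) -> bytes:
    return Fernet(key).decrypt(token)


if __name__ == "__main__":
    key = Fernet.generate_key()  # 32 random bytes, URL-safe base64-encoded
    token = encrypt_secure(key, b"attack at dawn")
    assert decrypt_secure(key, token) == b"attack at dawn"
```

The point of the contrast is that both variants appear to "work" on a happy path, which is why a participant who trusted the assistant's output without scrutiny could submit the insecure version and still believe the task was solved correctly.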

Implications for Future Development

The findings emphasize the need for caution in deploying AI code assistants, particularly in security-sensitive applications. The potential for insecure code highlights several areas for improvement in AI model design and usage guidelines:

  • Refinement of AI Training Data: Ensuring training datasets contain secure and high-quality code is crucial. Incorporating security best practices and conducting static analysis on training data can mitigate the risks of propagating insecure code patterns.
  • User Education and Training: Developers need to be educated on the limitations of AI code assistants and the importance of verifying AI-generated code. Structured training programs can help developers better understand how to interact with these tools securely.
  • Integration of Security Features: Embedding security features and warnings within AI assistants and integrated development environments (IDEs) can guide developers in identifying potential vulnerabilities. Proactive measures such as automated security checks and prompts for safe coding practices can enhance code security (a minimal check of this kind is sketched below).
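
As one concrete shape such a proactive check could take, the minimal sketch below (our illustration, not a tool from the paper) shells out to Bandit, an open-source Python security linter, to scan a generated snippet before it is surfaced to the developer. It assumes Bandit is installed, and the wrapper function name is hypothetical.

```python
# Minimal sketch of an automated security gate for AI-generated Python.
# Assumes the open-source Bandit scanner is installed: pip install bandit
import json
import os
import subprocess
import tempfile


def scan_generated_code(code: str) -> list:
    """Write the generated snippet to a temp file, run Bandit, and return its findings."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            ["bandit", "-f", "json", "-q", path],
            capture_output=True, text=True,
        )
        report = json.loads(result.stdout or "{}")
        return report.get("results", [])
    finally:
        os.unlink(path)


if __name__ == "__main__":
    # Deserializing untrusted data with pickle is a classic Bandit finding.
    findings = scan_generated_code("import pickle\npickle.loads(b'')\n")
    for issue in findings:
        print(issue["issue_severity"], issue["test_id"], issue["issue_text"])
```

An IDE integration could surface such findings inline next to the assistant's suggestion, turning "verify the generated code" from an extra step into the default workflow.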

Future Directions

Looking ahead, several promising research directions can build on the insights from this paper:

  • Adaptive AI Systems: Developing adaptive AI systems that learn from user interactions and refine their outputs to prioritize security can improve the reliability of code assistants. Reinforcement learning from human feedback, focusing on security, can be particularly effective.
  • Enhanced Prompt Engineering: Further exploration into optimal prompt engineering techniques can provide developers with best practices for interacting with AI assistants. Identifying guidelines for effective prompt structures that minimize security risks can be beneficial (an illustrative sketch follows this list).
  • Collaborative Security Audits: Encouraging collaborative security audits involving AI-generated code can harness community expertise to identify and rectify vulnerabilities. Open repositories and shared databases of secure code prompts and outputs can support such initiatives.
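
To illustrate the prompt-engineering point (and the temperature observation from the Key Findings), the sketch below contrasts a vague prompt with one that pins down the library, the security requirement, and a helper-function signature. It uses the OpenAI Python client's completions interface as an assumption; codex-davinci-002, the model used in the study, is no longer served, so a currently available completion model is substituted, and the prompts and parameter values are purely illustrative.

```python
# Illustrative prompt contrast; not the study's interface or prompts.
from openai import OpenAI  # assumes `pip install openai` and OPENAI_API_KEY set

VAGUE_PROMPT = "write python code to encrypt a string"

DETAILED_PROMPT = '''\
# Encrypt and decrypt a string with a symmetric key using the `cryptography`
# package's Fernet recipe (authenticated encryption). Do not use ECB mode.
def encrypt_message(key: bytes, message: str) -> bytes:
'''

client = OpenAI()
completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # stand-in; codex-davinci-002 is deprecated
    prompt=DETAILED_PROMPT,
    temperature=0.2,                 # lower temperature, less variance in output
    max_tokens=256,
)
print(completion.choices[0].text)
```

Guidelines of this kind (name the library, state the security property, supply the signature to be completed) are the sort of prompt-structure best practices the paper suggests are worth codifying.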

Conclusion

The paper presents a significant contribution to understanding the interplay between AI code assistants and code security. While AI assistants offer notable productivity gains, their current state poses security risks that need to be addressed through improved training data, user education, and security-oriented AI system design. The insights from this paper provide a roadmap for future research and development efforts aimed at creating secure and reliable AI programming tools. Researchers and practitioners must collaborate to ensure that advances in AI code assistants do not come at the expense of software security.

Authors (4)
  1. Neil Perry (4 papers)
  2. Megha Srivastava (15 papers)
  3. Deepak Kumar (104 papers)
  4. Dan Boneh (43 papers)
Citations (128)