A User-centered Security Evaluation of Copilot (2308.06587v4)
Abstract: Code generation tools driven by artificial intelligence have recently become more popular due to advancements in deep learning and natural language processing that have increased their capabilities. The proliferation of these tools may be a double-edged sword: while they can increase developer productivity by making it easier to write code, research has shown that they can also generate insecure code. In this paper, we perform a user-centered evaluation of GitHub's Copilot to better understand its strengths and weaknesses with respect to code security. We conduct a user study in which participants solve programming problems (with and without Copilot assistance) that have potentially vulnerable solutions. The main goal of the user study is to determine how the use of Copilot affects participants' security performance. In our set of participants (n=25), we find that access to Copilot is associated with more secure solutions when tackling the harder problems. For the easier problem, we observe no effect of Copilot access on the security of solutions. We also observe no disproportionate impact of Copilot use on particular kinds of vulnerabilities. Our results indicate that there are potential security benefits to using Copilot, but more research is warranted on the effects of using code generation tools on technically complex problems with security requirements.
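To make "potentially vulnerable solutions" concrete, here is a minimal sketch (our own illustration, not one of the study's actual tasks) of the kind of weakness such studies typically probe: SQL injection (CWE-89), a CWE Top 25 entry. The function names and schema below are hypothetical.

```python
import sqlite3

def find_user_vulnerable(conn: sqlite3.Connection, username: str):
    # CWE-89: attacker-controlled `username` is spliced into the SQL text,
    # so an input like "x' OR '1'='1" changes the query's meaning.
    cursor = conn.execute(
        f"SELECT id, username FROM users WHERE username = '{username}'"
    )
    return cursor.fetchall()

def find_user_secure(conn: sqlite3.Connection, username: str):
    # The hardened variant binds the value as a query parameter, so the
    # driver treats it as data rather than as executable SQL.
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchall()
```

Both functions return the same rows on benign input; they differ only in whether user input can alter the query's structure, which is the distinction a security evaluation of generated code must detect.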
Authors: Owura Asare, Meiyappan Nagappan, N. Asokan