Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code (2403.07506v1)
Abstract: LLMs for code (LLM4Code), which demonstrate strong performance (e.g., high accuracy) in processing source code, have significantly transformed software engineering. Many studies separately investigate the non-functional properties of LLM4Code, but there is no systematic review of how these properties are evaluated and enhanced. This paper fills that gap by thoroughly examining 146 relevant studies, thereby presenting the first systematic literature review to identify seven important properties beyond accuracy, including robustness, security, privacy, explainability, efficiency, and usability. We discuss the current state-of-the-art methods and trends, identify gaps in existing research, and present promising directions for future study.
- [n. d.]. A Massively Spiffy Yet Delicately Unobtrusive Compression Library. https://zlib.net/. Accessed on March 27, 2023.
- [n. d.]. OWASP Benchmark Project. https://owasp.org/www-project-benchmark/. Accessed on December 19, 2023.
- SHIELD: Thwarting Code Authorship Attribution.
- TrojanPuzzle: Covertly Poisoning Code-Suggestion Models. arXiv preprint arXiv:2301.02344 (2023).
- A Transformer-based Approach for Source Code Summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 4998–5007. https://doi.org/10.18653/v1/2020.acl-main.449
- Traces of Memorisation in Large Language Models for Code. arXiv:2312.11658 [cs.CR]
- Reem Aleithan. 2021. Explainable Just-in-Time Bug Prediction: Are We There Yet?. In Proceedings of the 43rd International Conference on Software Engineering: Companion Proceedings (Virtual Event, Spain) (ICSE ’21). IEEE Press, 129–131. https://doi.org/10.1109/ICSE-Companion52605.2021.00056
- SantaCoder: don’t reach for the stars! arXiv:2301.03988 [cs.SE]
- Miltiadis Allamanis. 2019. The Adverse Effects of Code Duplication in Machine Learning Models of Code. In Proceedings of the 2019 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Athens, Greece) (Onward! 2019). Association for Computing Machinery, New York, NY, USA, 143–153. https://doi.org/10.1145/3359591.3359735
- code2seq: Generating Sequences from Structured Representations of Code. In International Conference on Learning Representations. https://openreview.net/forum?id=H1gKYo09tX
- Code2vec: Learning Distributed Representations of Code. Proc. ACM Program. Lang. 3, POPL, Article 40 (Jan. 2019), 29 pages. https://doi.org/10.1145/3290353
- Source code authorship attribution using long short-term memory based networks. In Computer Security - ESORICS 2017. Springer Verlag, 65–82. https://doi.org/10.1007/978-3-319-66402-6_6
- Adversarial Robustness of Program Synthesis Models. In Advances in Programming Languages and Neurosymbolic Systems Workshop. https://openreview.net/forum?id=17C-dfA5X69
- Assessing Robustness of ML-Based Program Analysis Tools using Metamorphic Program Transformations. In ASE 2021. 1377–1381. https://doi.org/10.1109/ASE51524.2021.9678706
- Program Synthesis with Large Language Models. arXiv:2108.07732 [cs.PL]
- Gareth Ari Aye and Gail E. Kaiser. 2020. Sequence Model Design for Code Completion in the Modern IDE. arXiv:2004.05249 [cs.SE]
- Shamil Ayupov and Nadezhda Chirkova. 2022. Parameter-Efficient Finetuning of Transformers for Source Code. arXiv:2212.05901 [cs.CL]
- SecretBench: A Dataset of Software Secrets. In Proceedings of the 20th International Conference on Mining Software Repositories (MSR ’23). 5 pages.
- EW-Tune: A Framework for Privately Fine-Tuning Large Language Models with Differential Privacy. In 2022 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE. https://doi.org/10.1109/icdmw58026.2022.00078
- Pavol Bielik and Martin Vechev. 2020. Adversarial robustness for code. In International Conference on Machine Learning. PMLR, 896–907.
- An Integrative Human-Centered Architecture for Interactive Programming Assistants. In 2022 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). 1–5. https://doi.org/10.1109/VL/HCC53370.2022.9833110
- De-Anonymizing Programmers via Code Stylometry. In Proceedings of the 24th USENIX Conference on Security Symposium (Washington, D.C.) (SEC’15). USENIX Association, USA, 255–270.
- Extracting Training Data from Large Language Models. In USENIX Security Symposium.
- Nicholas Carlini and David Wagner. 2017. Towards Evaluating the Robustness of Neural Networks. In 2017 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, Los Alamitos, CA, USA, 39–57. https://doi.org/10.1109/SP.2017.49
- NatGen: generative pre-training by “naturalizing” source code. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Singapore, Singapore) (ESEC/FSE 2022). Association for Computing Machinery, New York, NY, USA, 18–30. https://doi.org/10.1145/3540250.3549162
- Deep Learning Based Vulnerability Detection: Are We There Yet? IEEE Transactions on Software Engineering 48, 9 (Sep. 2022), 3280–3296. https://doi.org/10.1109/TSE.2021.3087402
- Evaluating the robustness of source code plagiarism detection tools to pervasive plagiarism-hiding modifications. Empirical Software Engineering 26, 5 (2021), 83.
- Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering. CoRR abs/1811.03728 (2018). arXiv:1811.03728 http://arxiv.org/abs/1811.03728
- Stealing Deep Reinforcement Learning Models for Fun and Profit. In Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security (Virtual Event, Hong Kong) (ASIA CCS ’21). Association for Computing Machinery, New York, NY, USA, 307–319. https://doi.org/10.1145/3433210.3453090
- Evaluating Large Language Models Trained on Code. CoRR (2021).
- Generating Adversarial Source Programs Using Important Tokens-based Structural Transformations. In 2022 26th International Conference on Engineering of Complex Computer Systems (ICECCS). 173–182. https://doi.org/10.1109/ICECCS54210.2022.00029
- Fairness testing: A comprehensive survey and analysis of trends. (2022).
- TABS: Efficient Textual Adversarial Attack for Pre-trained NL Code Model Using Semantic Beam Search. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 5490–5498. https://doi.org/10.18653/v1/2022.emnlp-main.369
- To What Extent Do Deep Learning-Based Code Recommenders Generate Predictions by Cloning Code from the Training Set?. In Proceedings of the 19th International Conference on Mining Software Repositories (Pittsburgh, Pennsylvania) (MSR ’22). Association for Computing Machinery, New York, NY, USA, 167–178. https://doi.org/10.1145/3524842.3528440
- Counterfactual Explanations for Models of Code. In Proceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice (Pittsburgh, Pennsylvania) (ICSE-SEIP ’22). Association for Computing Machinery, New York, NY, USA, 125–134. https://doi.org/10.1145/3510457.3513081
- Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks. arXiv:2308.04451 [cs.CR]
- A Game-Based Framework to Compare Program Classifiers and Evaders. In Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization (Montréal, QC, Canada) (CGO 2023). Association for Computing Machinery, New York, NY, USA, 108–121. https://doi.org/10.1145/3579990.3580012
- QLoRA: Efficient Finetuning of Quantized LLMs.
- Haibiao Ding and Mansur H. Samadzadeh. 2004. Extraction of Java program fingerprints for software authorship identification. Journal of Systems and Software 72, 1 (2004), 49–57. https://doi.org/10.1016/S0164-1212(03)00049-9
- Stefania Druga and Amy J. Ko. 2023. AI Friends: A Design Framework for AI-Powered Creative Programming for Youth. arXiv:2305.10412 [cs.HC]
- Stefania Druga and Nancy Otero. 2023. Scratch Copilot Evaluation: Assessing AI-Assisted Creative Coding for Families. arXiv:2305.10417 [cs.HC]
- Understanding Promotion-as-a-Service on GitHub. In Proceedings of the 36th Annual Computer Security Applications Conference (ACSAC ’20). Association for Computing Machinery, New York, NY, USA, 597–610. https://doi.org/10.1145/3427228.3427258
- An Extensive Study on Adversarial Attack against Pre-trained Models of Code. arXiv:2311.07553 [cs.CR] (2023).
- Large Language Models for Software Engineering: Survey and Open Problems. arXiv:2310.03533 [cs.SE]
- Automated Detection of Password Leakage from Public GitHub Repositories. In 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). 175–186. https://doi.org/10.1145/3510003.3510150
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, 1536–1547.
- Claudio Ferretti and Martina Saletta. 2021. Deceiving neural source code classifiers: finding adversarial examples with grammatical evolution. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (Lille, France) (GECCO ’21). Association for Computing Machinery, New York, NY, USA, 1889–1897. https://doi.org/10.1145/3449726.3463222
- Improving Text-to-SQL Evaluation Methodology. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Iryna Gurevych and Yusuke Miyao (Eds.). Association for Computational Linguistics, Melbourne, Australia, 351–360. https://doi.org/10.18653/v1/P18-1033
- The robots are coming: Exploring the implications of openai codex on introductory programming. In Proceedings of the 24th Australasian Computing Education Conference. 10–19.
- Tira Nur Fitria. 2021. QuillBot as an online tool: Students’ alternative in paraphrasing and rewriting of English writing. Englisia: Journal of Language, Education, and Humanities 9, 1 (2021), 183–196.
- Source Code Author Identification Based on N-gram Author Profiles. In Artificial Intelligence Applications and Innovations, Ilias Maglogiannis, Kostas Karpouzis, and Max Bramer (Eds.). Springer US, Boston, MA, 508–515.
- InCoder: A Generative Model for Code Infilling and Synthesis. In The Eleventh International Conference on Learning Representations. https://openreview.net/forum?id=hQwb-lbM6EL
- Hunting for Truth: Analyzing Explanation Methods in Learning-based Vulnerability Discovery. In 2023 IEEE 8th European Symposium on Security and Privacy (EuroS&P). 524–541. https://doi.org/10.1109/EuroSP57164.2023.00038
- Discrete Adversarial Attack to Models of Code. Proc. ACM Program. Lang. 7, PLDI, Article 113 (Jun. 2023), 24 pages. https://doi.org/10.1145/3591227
- Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations. http://arxiv.org/abs/1412.6572
- Compressing bert: Studying the effects of weight pruning on transfer learning. arXiv preprint arXiv:2002.08307 (2020).
- The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023). Association for Computing Machinery, New York, NY, USA, 895–907. https://doi.org/10.1145/3611643.3616304
- GraphCodeBERT: Pre-training Code Representations with Data Flow. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021.
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models. arXiv:2302.04012 [cs.CR]
- Jingxuan He and Martin Vechev. 2023. Large Language Models for Code: Security Hardening and Adversarial Testing. arXiv:2302.05319 [cs.CR]
- Vincent J. Hellendoorn and Anand Ashok Sawant. 2021. The Growing Cost of Deep Learning for Source Code. Commun. ACM 65, 1 (Dec. 2021), 31–33. https://doi.org/10.1145/3501261
- Measuring Coding Challenge Competence With APPS. arXiv:2105.09938 [cs.SE]
- Semantic Robustness of Models of Source Code. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 526–537. https://doi.org/10.1109/SANER53432.2022.00070
- CigaR: Cost-efficient Program Repair with LLMs.
- Do Large Code Models Understand Programming Concepts? A Black-box Approach. arXiv:2402.05980 [cs.SE]
- Large Language Models for Software Engineering: A Systematic Literature Review. arXiv:2308.10620 [cs.SE]
- Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning. PMLR, 2790–2799.
- LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations. https://openreview.net/forum?id=nZeVKeeFYf9
- Active Code Learning: Benchmarking Sample-Efficient Training of Code Models. arXiv:2306.01250 [cs.SE]
- Summarizing Source Code with Transferred API Knowledge. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (Stockholm, Sweden) (IJCAI’18). AAAI Press, 2269–2275.
- Do Not Give Away My Secrets: Uncovering the Privacy Issue of Neural Code Completion Tools. arXiv:2309.07639 [cs.CR]
- Where to Look When Repairing Code? Comparing the Attention of Neural Models and Developers. arXiv:2305.07287 [cs.SE]
- CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436 (2019).
- Occlusion-based Detection of Trojan-triggering Inputs in Large Language Models of Code. arXiv:2312.04004 [cs.SE]
- A Survey of Trojans in Neural Models of Source Code: Taxonomy and Techniques. arXiv:2305.03803 [cs.SE]
- TrojanedCM: A Repository of Trojaned Large Language Models of Code. arXiv:2311.14850 [cs.SE]
- Optimized Tokenization Process for Open-Vocabulary Code Completion: An Empirical Study. In Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering (Oulu, Finland) (EASE ’23). Association for Computing Machinery, New York, NY, USA, 398–405. https://doi.org/10.1145/3593434.3594236
- Saki Imai. 2022. Is GitHub Copilot a Substitute for Human Pair-Programming? An Empirical Study. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings (Pittsburgh, Pennsylvania) (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 319–321. https://doi.org/10.1145/3510454.3522684
- Enhancing Robustness of AI Offensive Code Generators via Data Augmentation. arXiv:2306.05079 [cs.LG]
- Samireh Jalali and Claes Wohlin. 2012. Systematic literature studies: Database searches vs. backward snowballing. In Proceedings of the 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement. 29–38. https://doi.org/10.1145/2372251.2372257
- Exploring the Learnability of Program Synthesizers by Novice Programmers. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology (Bend, OR, USA) (UIST ’22). Association for Computing Machinery, New York, NY, USA, Article 64, 15 pages. https://doi.org/10.1145/3526113.3545659
- Akshita Jha and Chandan K. Reddy. 2023. CodeAttack: code-based adversarial attacks for pre-trained programming language models. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence (AAAI’23/IAAI’23/EAAI’23). AAAI Press, Article 1670, 9 pages. https://doi.org/10.1609/aaai.v37i12.26739
- Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach. arXiv:2310.06680 [cs.SE]
- Unlearnable Examples: Protecting Open-Source Software from Unauthorized Neural Code Learning.. In SEKE. 525–530.
- ClawSAT: Towards Both Robust and Accurate Code Models. In 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE Computer Society, Los Alamitos, CA, USA, 212–223. https://doi.org/10.1109/SANER56733.2023.00029
- An Empirical Study of Model-Agnostic Techniques for Defect Prediction Models. IEEE Transactions on Software Engineering 48, 1 (2022), 166–185. https://doi.org/10.1109/TSE.2020.2982385
- Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation. arXiv preprint arXiv:2402.11702 (2024).
- Connecting the .dotfiles: Checked-In Secret Exposure with Extra (Lateral Movement) Steps. In 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR). 322–333. https://doi.org/10.1109/MSR59073.2023.00051
- LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression. arXiv:2309.14021 [cs.CL]
- Studying the Effect of AI Code Generators on Supporting Novice Learners in Introductory Programming. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 455, 23 pages. https://doi.org/10.1145/3544548.3580919
- SEGRESS: Software Engineering Guidelines for REporting Secondary Studies. IEEE Transactions on Software Engineering 49, 3 (Mar. 2023), 1273–1298. https://doi.org/10.1109/TSE.2022.3174092
- A Probabilistic Approach to Source Code Authorship Identification. In Fourth International Conference on Information Technology (ITNG’07). 243–248. https://doi.org/10.1109/ITNG.2007.17
- Is Model Attention Aligned with Human Attention? An Empirical Study on Large Language Models for Code Generation. arXiv:2306.01220 [cs.SE]
- Evaluating Program Repair with Semantic-Preserving Transformations: A Naturalness Assessment. arXiv:2402.11892 [cs.SE]
- Benjamin Ledel and Steffen Herbold. 2022. Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP. arXiv:2209.07623 [cs.SE]
- Who Wrote this Code? Watermarking for Code Generation. arXiv:2305.15060 [cs.CL]
- Resilient Watermarking for LLM-Generated Codes. arXiv:2402.07518 [cs.CR]
- TextBugger: Generating Adversarial Text Against Real-world Applications. In Proceedings 2019 Network and Distributed System Security Symposium. Internet Society. https://doi.org/10.14722/ndss.2019.23138
- Poison Attack and Defense on Deep Source Code Processing Models. https://doi.org/10.48550/ARXIV.2210.17029
- “Always Nice and Confident, Sometimes Wrong”: Developer’s Experiences Engaging Generative AI Chatbots Versus Human-Powered Q&A Platforms. arXiv:2309.13684 [cs.HC]
- StarCoder: may the source be with you! arXiv:2305.06161 [cs.CL]
- Xiang Lisa Li and Percy Liang. 2021. Prefix-Tuning: Optimizing Continuous Prompts for Generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (Eds.). Association for Computational Linguistics, Online, 4582–4597. https://doi.org/10.18653/v1/2021.acl-long.353
- Multi-target Backdoor Attacks for Code Pre-trained Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Toronto, Canada, 7236–7254. https://doi.org/10.18653/v1/2023.acl-long.399
- A Closer Look into Transformer-Based Code Intelligence Through Code Transformation: Challenges and Opportunities. arXiv:2207.04285 [cs.SE]
- Semantic-Preserving Adversarial Code Comprehension. In Proceedings of the 29th International Conference on Computational Linguistics. International Committee on Computational Linguistics, Gyeongju, Republic of Korea, 3017–3028. https://aclanthology.org/2022.coling-1.267
- Do Pretrained Language Models Indeed Understand Software Engineering Tasks? IEEE Transactions on Software Engineering 49, 10 (Oct. 2023), 4639–4655. https://doi.org/10.1109/TSE.2023.3308952
- RoPGen: Towards Robust Code Authorship Attribution via Automatic Coding Style Transformation. In Proceedings of the 44th International Conference on Software Engineering (Pittsburgh, Pennsylvania) (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 1906–1918. https://doi.org/10.1145/3510003.3510181
- A Comparative Study of Adversarial Training Methods for Neural Models of Source Code. Future Gener. Comput. Syst. 142, C (May 2023), 165–181. https://doi.org/10.1016/j.future.2022.12.030
- Robust Learning of Deep Predictive Models from Noisy and Imbalanced Software Engineering Datasets. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (Rochester, MI, USA) (ASE ’22). Article 86, 13 pages. https://doi.org/10.1145/3551349.3556941
- Protecting Intellectual Property of Large Language Model-Based Code Generation APIs via Watermarks. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS ’23). Association for Computing Machinery, New York, NY, USA, 2336–2350. https://doi.org/10.1145/3576915.3623120
- Robin: A Novel Method to Produce Robust Interpreters for Deep Learning-Based Code Classifiers. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE Computer Society, Los Alamitos, CA, USA, 27–39. https://doi.org/10.1109/ASE56229.2023.00164
- SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities. IEEE Transactions on Dependable and Secure Computing 19, 4 (Jul. 2022), 2244–2258. https://doi.org/10.1109/TDSC.2021.3051525
- A Large-Scale Survey on the Usability of AI Programming Assistants: Successes and Challenges. In 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE). IEEE Computer Society, Los Alamitos, CA, USA, 605–617. https://doi.ieeecomputersociety.org/
- EVIL: Exploiting Software via Natural Language. In 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE). IEEE Computer Society, Los Alamitos, CA, USA, 321–332. https://doi.org/10.1109/ISSRE52982.2021.00042
- Can NMT Understand Me? Towards Perturbation-Based Evaluation of NMT Models for Code Generation. In Proceedings of the 1st International Workshop on Natural Language-Based Software Engineering (Pittsburgh, Pennsylvania) (NLBSE ’22). Association for Computing Machinery, New York, NY, USA, 59–66. https://doi.org/10.1145/3528588.3528653
- Zachary C. Lipton. 2018. The Mythos of Model Interpretability: In Machine Learning, the Concept of Interpretability is Both Important and Slippery. Queue 16, 3 (Jun. 2018), 31–57. https://doi.org/10.1145/3236386.3241340
- An Empirical Study of Parameter-Efficient Fine-Tuning Methods for Pre-Trained Code Models. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE Computer Society, Los Alamitos, CA, USA, 397–408. https://doi.org/10.1109/ASE56229.2023.00125
- A Practical Black-Box Attack on Source Code Authorship Identification Classifiers. IEEE Transactions on Information Forensics and Security 16 (2021), 3620–3633. https://doi.org/10.1109/TIFS.2021.3080507
- Delving into Parameter-Efficient Fine-Tuning in Code Change Learning: An Empirical Study. arXiv:2402.06247 [cs.SE]
- On the Reliability and Explainability of Automated Code Generation Approaches. 1, 1 (2023), 1–20. arXiv:2302.09587 http://arxiv.org/abs/2302.09587
- On the Reliability and Explainability of Language Models for Program Generation. ACM Trans. Softw. Eng. Methodol. (Jan. 2024). https://doi.org/10.1145/3641540 Just Accepted.
- David Lo. 2023. Trustworthy and Synergistic Artificial Intelligence for Software Engineering: Vision and Roadmaps. arXiv:2309.04142 [cs.SE]
- LLaMA-Reviewer: Advancing Code Review Automation with Large Language Models through Parameter-Efficient Fine-Tuning. In 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE). IEEE Computer Society, Los Alamitos, CA, USA, 647–658. https://doi.org/10.1109/ISSRE59848.2023.00026
- CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation. CoRR (2021).
- Scott M. Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems (NeurIPS) 30 (2017).
- The “Code” of Ethics: A Holistic Audit of AI Code Generators.
- Are Code Pre-trained Models Powerful to Learn Code Syntax and Semantics? arXiv:2212.10017 [cs.SE]
- Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code. arXiv:2402.09299 [cs.SE]
- On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot. In Proceedings of the 45th International Conference on Software Engineering (Melbourne, Victoria, Australia) (ICSE ’23). IEEE Press, 2149–2160. https://doi.org/10.1109/ICSE48619.2023.00181
- Adversarial Authorship Attribution in Open-Source Projects. In Proceedings of the Ninth ACM Conference on Data and Application Security and Privacy (Richardson, Texas, USA) (CODASPY ’19). Association for Computing Machinery, New York, NY, USA, 291–302. https://doi.org/10.1145/3292006.3300032
- Evolutionary Approaches for Adversarial Attacks on Neural Source Code Classifiers. Algorithms 16, 10 (2023). https://doi.org/10.3390/a16100478
- Meta. [n. d.]. Code Llama. https://ai.meta.com/blog/code-llama-large-language-model-coding/
- Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21, 6 (1953), 1087–1092.
- A Systematic Literature Review of Explainable AI for Software Engineering. arXiv:2302.06065 [cs.SE]
- Explainable AI for Pre-Trained Code Models: What Do They Learn? When They Do Not Work? (2022). arXiv:2211.12821 http://arxiv.org/abs/2211.12821
- Explaining Transformer-based Code Models: What Do They Learn? When They Do Not Work?. In 2023 IEEE 23rd International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE Computer Society, Los Alamitos, CA, USA, 96–106. https://doi.org/10.1109/SCAM59687.2023.00020
- Convolutional Neural Networks over Tree Structures for Programming Language Processing. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (Phoenix, Arizona) (AAAI’16). AAAI Press, 1287–1293.
- When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming. arXiv:2306.04930 [cs.HC]
- DIP: Dead code Insertion based Black-box Attack for Programming Language Model. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Toronto, Canada, 7777–7791. https://doi.org/10.18653/v1/2023.acl-long.430
- Stress Test Evaluation for Natural Language Inference. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, Santa Fe, New Mexico, USA, 2340–2353. https://aclanthology.org/C18-1198
- Adversarial Attacks to API Recommender Systems: Time to Wake Up and Smell the Coffee?. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). 253–265. https://doi.org/10.1109/ASE51524.2021.9678946
- How Beginning Programmers and Code LLMs (Mis)read Each Other. arXiv:2401.15232 [cs.HC]
- Adversarial Attacks on Code Models with Discriminative Graph Patterns. arXiv:2308.11161 [cs.SE]
- Generative Artificial Intelligence for Software Engineering – A Research Agenda. arXiv:2310.18648 [cs.SE]
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In The Eleventh International Conference on Learning Representations.
- An Empirical Comparison of Pre-Trained Models of Source Code. arXiv:2302.04026 [cs.SE]
- CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot. In 32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 2133–2150.
- Poisoned ChatGPT Finds Work for Idle Hands: Exploring Developers’ Coding Practices with Insecure Suggestions from Poisoned AI Models. arXiv:2312.06227 [cs.CR]
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures. arXiv:2308.03873 [cs.SE]
- Extracting Meaningful Attention on Source Code: An Empirical Study of Developer and Neural Model Code Exploration. arXiv:2210.05506 [cs.SE]
- Matteo Paltenghi and Michael Pradel. 2021. Thinking Like a Developer? Comparing the Attention of Humans with Neural Models of Code. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). 867–879. https://doi.org/10.1109/ASE51524.2021.9678712
- Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277 (2016).
- Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions. In 43rd IEEE Symposium on Security and Privacy, SP 2022, San Francisco, CA, USA, May 22-26, 2022. IEEE, 754–768. https://doi.org/10.1109/SP46214.2022.9833571
- The Impact of AI on Developer Productivity: Evidence from GitHub Copilot. arXiv:2302.06590 [cs.SE]
- Illia Polosukhin and Alexander Skidanov. 2018. Neural Program Search: Solving Programming Tasks from Description and Examples. arXiv:1802.04335 [cs.AI]
- PyExplainer: Explaining the Predictions of Just-in-Time Defect Models. In Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (Melbourne, Australia) (ASE ’21). IEEE Press, 407–418. https://doi.org/10.1109/ASE51524.2021.9678763
- A Search-Based Testing Framework for Deep Neural Networks of Source Code Embedding. In 14th IEEE Conference on Software Testing, Verification and Validation, ICST 2021, Porto de Galinhas, Brazil, April 12-16, 2021. IEEE.
- “It’s Weird That It Knows What I Want”: Usability and Interactions with Copilot for Novice Programmers. ACM Trans. Comput.-Hum. Interact. (Aug. 2023). https://doi.org/10.1145/3617367 Just Accepted.
- ONION: A Simple and Effective Defense Against Textual Backdoor Attacks. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, 9558–9566. https://doi.org/10.18653/v1/2021.emnlp-main.752
- BadCS: A Backdoor Attack Framework for Code search.
- Misleading Authorship Attribution of Source Code Using Adversarial Learning. In Proceedings of the 28th USENIX Conference on Security Symposium (Santa Clara, CA, USA) (SEC’19). USENIX Association, USA, 479–496.
- Md Rafiqul Islam Rabin and Mohammad Amin Alipour. 2021. Evaluation of Generalizability of Neural Program Analyzers under Semantic-Preserving Transformations. arXiv:2004.07313 [cs.SE]
- Md Rafiqul Islam Rabin and Mohammad Amin Alipour. 2022. FeatureExtractor: A tool for extracting key input features of code intelligence models. Software Impacts 14 (2022), 100432. https://doi.org/10.1016/j.simpa.2022.100432
- On the generalizability of Neural Program Models with respect to semantic-preserving program transformations. Information and Software Technology 135 (2021), 106552. https://doi.org/10.1016/j.infsof.2021.106552
- Understanding neural code intelligence through program simplification. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Athens, Greece) (ESEC/FSE 2021). Association for Computing Machinery, New York, NY, USA, 441–452. https://doi.org/10.1145/3468264.3468539
- Memorization and generalization in neural code intelligence models. Information and Software Technology 153 (2023), 107066. https://doi.org/10.1016/j.infsof.2022.107066
- Testing Neural Program Analyzers. arXiv:1908.10711 [cs.LG]
- Goutham Ramakrishnan and Aws Albarghouthi. 2022. Backdoors in Neural Models of Source Code. In 2022 26th International Conference on Pattern Recognition (ICPR). IEEE Computer Society, Los Alamitos, CA, USA, 2892–2899. https://doi.org/10.1109/ICPR56361.2022.9956690
- Probabilistic Model for Code with Decision Trees. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (Amsterdam, Netherlands) (OOPSLA 2016). Association for Computing Machinery, New York, NY, USA, 731–747. https://doi.org/10.1145/2983990.2984041
- Neural Network-Based Detection of Self-Admitted Technical Debt: From Performance to Explainability. ACM Trans. Softw. Eng. Methodol. 28, 3, Article 15 (jul 2019), 45 pages. https://doi.org/10.1145/3324916
- ”Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, USA) (KDD ’16). Association for Computing Machinery, New York, NY, USA, 1135–1144. https://doi.org/10.1145/2939672.2939778
- Benchmarking Causal Study to Interpret Large Language Models for Source Code. In 2023 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE Computer Society, Los Alamitos, CA, USA, 329–334. https://doi.org/10.1109/ICSME58846.2023.00040
- Why Don’t XAI Techniques Agree? Characterizing the Disagreements Between Post-hoc Explanations of Defect Predictions. In 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE Computer Society, Los Alamitos, CA, USA, 444–448. https://doi.org/10.1109/ICSME55016.2022.00056
- Mootez Saad and Tushar Sharma. 2023. Naturalness of Attention: Revisiting Attention in Code Language Models. arXiv:2311.13508 [cs.SE]
- Utilization of Pre-trained Language Model for Adapter-based Knowledge Transfer in Software Engineering. arXiv:2307.08540 [cs.SE]
- Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants. In 32nd USENIX Security Symposium (USENIX Security 23). USENIX Association, Anaheim, CA, 2205–2222. https://www.usenix.org/conference/usenixsecurity23/presentation/sandoval
- You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion. In 30th USENIX Security Symposium (USENIX Security 21). USENIX Association, 1559–1575.
- An Exploratory Study on Code Attention in BERT. In Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension (Virtual Event) (ICPC ’22). Association for Computing Machinery, New York, NY, USA, 437–448. https://doi.org/10.1145/3524610.3527921
- Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey. arXiv:2310.17903 [cs.SE]
- Structural-semantics Guided Program Simplification for Understanding Neural Code Intelligence Models. In Proceedings of the 14th Asia-Pacific Symposium on Internetware (Internetware ’23). Association for Computing Machinery, New York, NY, USA, 1–11. https://doi.org/10.1145/3609437.3609438
- Towards Efficient Fine-Tuning of Pre-trained Code Models: An Experimental Study and Beyond. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023). Association for Computing Machinery, New York, NY, USA, 39–51. https://doi.org/10.1145/3597926.3598036
- Smaller, Faster, Greener: Compressing Pre-trained Code Models via Surrogate-Assisted Optimization. arXiv preprint arXiv:2309.04076 (2023).
- Compressing Pre-Trained Models of Code into 3 MB (ASE ’22). Association for Computing Machinery, New York, NY, USA, Article 24, 12 pages. https://doi.org/10.1145/3551349.3556964
- Explainable Software Defect Prediction: Are We There Yet?
- Exploring the Robustness of Large Language Models for Solving Programming Problems. arXiv:2306.14583 [cs.CL]
- Paraphrasing Techniques for Maritime QA system. arXiv:2203.10854 [cs.CL]
- An Empirical Study of Code Smells in Transformer-based Code Generation Techniques. In 2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM). 71–82. https://doi.org/10.1109/SCAM55253.2022.00014
- Recognizing and Imitating Programmer Style: Adversaries in Program Authorship Attribution. Proc. Priv. Enhancing Technol. 2018, 1 (2018), 127–144.
- Leo Song and Steven H.H. Ding. 2023. Milo: Attacking Deep Pre-trained Model for Programming Languages Tasks with Anti-analysis Code Obfuscation. In 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC). 586–594. https://doi.org/10.1109/COMPSAC57700.2023.00084
- STRATA: Simple, Gradient-Free Attacks for Models of Code.
- Generating Adversarial Computer Programs using Optimized Obfuscations. In International Conference on Learning Representations (ICLR 2021).
- Mateusz Staniak and Przemyslaw Biecek. 2018. Explanations of model predictions with live and breakDown packages. (2018).
- Chia-Yi Su and Collin McMillan. 2024. Distilled GPT for source code summarization. Automated Software Engineering 31, 1 (2024), 22. https://doi.org/10.1007/s10515-024-00421-4
- Investigating Explainability of Generative AI for Code through Scenario-Based Design. In 27th International Conference on Intelligent User Interfaces (Helsinki, Finland) (IUI ’22). Association for Computing Machinery, New York, NY, USA, 212–228. https://doi.org/10.1145/3490099.3511119
- Backdooring Neural Code Search. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Toronto, Canada, 9692–9708. https://doi.org/10.18653/v1/2023.acl-long.540
- CodeMark: Imperceptible Watermarking for Code Datasets against Neural Code Completion Models. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023). Association for Computing Machinery, New York, NY, USA, 1561–1572. https://doi.org/10.1145/3611643.3616297
- CoProtector: Protect Open-Source Code against Unauthorized Training Usage with Data Poisoning. In Proceedings of the ACM Web Conference 2022 (Virtual Event, Lyon, France) (WWW ’22). Association for Computing Machinery, New York, NY, USA, 652–660. https://doi.org/10.1145/3485447.3512225
- When Neural Code Completion Models Size up the Situation: Attaining Cheaper and Faster Completion through Dynamic Model Inference. arXiv:2401.09964 [cs.SE]
- Don’t Complete It! Preventing Unhelpful Code Completion for Productive and Sustainable Neural Code Completion Systems. In 2023 IEEE/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). 324–325. https://doi.org/10.1109/ICSE-Companion58688.2023.00089
- Towards a Big Data Curated Benchmark of Inter-project Code Clones. In 2014 IEEE International Conference on Software Maintenance and Evolution (ICSME). 476–480. https://doi.org/10.1109/ICSME.2014.77
- Fast and memory-efficient neural code completion. In 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). IEEE, 329–340.
- Is ChatGPT the Ultimate Programming Assistant – How far is it? arXiv:2304.11938 [cs.SE]
- Generating Adversarial Examples of Source Code Classification Models via Q-Learning-Based Markov Decision Process. In 2021 IEEE 21st International Conference on Software Quality, Reliability and Security (QRS). 807–818. https://doi.org/10.1109/QRS54544.2021.00090
- Code Difference Guided Adversarial Example Generation for Deep Code Models. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). 850–862.
- Spectral Signatures in Backdoor Attacks. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Curran Associates, Inc.
- Sergey Troshin and Nadezhda Chirkova. 2022. Probing Pretrained Models of Source Codes. In Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Jasmijn Bastings, Yonatan Belinkov, Yanai Elazar, Dieuwke Hupkes, Naomi Saphra, and Sarah Wiegreffe (Eds.). Association for Computational Linguistics, Abu Dhabi, United Arab Emirates (Hybrid), 371–383. https://doi.org/10.18653/v1/2022.blackboxnlp-1.31
- An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation. ACM Trans. Softw. Eng. Methodol. 28, 4, Article 19 (sep 2019), 29 pages. https://doi.org/10.1145/3340544
- Towards More Effective AI-Assisted Programming: A Systematic Design Exploration to Improve Visual Studio IntelliCode’s User Experience. In 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). 185–195. https://doi.org/10.1109/ICSE-SEIP58684.2023.00022
- Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI EA ’22). Association for Computing Machinery, New York, NY, USA, Article 332, 7 pages. https://doi.org/10.1145/3491101.3519665
- Generation Probabilities Are Not Enough: Exploring the Effectiveness of Uncertainty Highlighting in AI-Powered Code Completions. arXiv:2302.07248 [cs.HC]
- You See What I Want You to See: Poisoning Vulnerabilities in Neural Code Search. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Singapore, Singapore) (ESEC/FSE 2022). Association for Computing Machinery, New York, NY, USA, 1233–1245. https://doi.org/10.1145/3540250.3549153
- What Do They Capture? A Structural Analysis of Pre-Trained Language Models for Source Code. In Proceedings of the 44th International Conference on Software Engineering (Pittsburgh, Pennsylvania) (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 2377–2388. https://doi.org/10.1145/3510003.3510050
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). https://openreview.net/forum?id=GF9cSKI3A_q
- One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization. arXiv:2303.15822 [cs.SE]
- Investigating and Designing for Trust in AI-powered Code Generation Tools. arXiv:2305.11248 [cs.HC]
- ReCode: Robustness Evaluation of Code Generation Models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Toronto, Canada, 13818–13843. https://doi.org/10.18653/v1/2023.acl-long.773
- Detecting and Explaining Self-Admitted Technical Debts with Attention-Based Neural Networks. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering (Virtual Event, Australia) (ASE ’20). Association for Computing Machinery, New York, NY, USA, 871–882. https://doi.org/10.1145/3324884.3416583
- Robust learning against relational adversaries. Advances in Neural Information Processing Systems 35 (2022), 16246–16260.
- Yu Wang and Ke Wang. 2023. Demystifying What Code Summarization Models Learned. arXiv:2303.02333 [cs.PL]
- An Explanation Method for Models of Code. Proc. ACM Program. Lang. 7, OOPSLA2, Article 250 (oct 2023), 27 pages. https://doi.org/10.1145/3622826
- CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021.
- CoCoFuzzing: Testing Neural Code Models With Coverage-Guided Fuzzing. IEEE Transactions on Reliability (2022), 1–14. https://doi.org/10.1109/TR.2022.3208239
- Towards Greener Yet Powerful Code Generation via Quantization: An Empirical Study (ESEC/FSE 2023). 224–236. https://doi.org/10.1145/3611643.3616302
- Better Together? An Evaluation of AI-Supported Code Translation. In 27th International Conference on Intelligent User Interfaces (Helsinki, Finland) (IUI ’22). Association for Computing Machinery, New York, NY, USA, 369–391. https://doi.org/10.1145/3490099.3511157
- Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models. arXiv:2308.10462 [cs.SE]
- Claes Wohlin. 2014. Guidelines for Snowballing in Systematic Literature Studies and a Replication in Software Engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering (London, England, United Kingdom) (EASE ’14). Association for Computing Machinery, New York, NY, USA, Article 38, 10 pages. https://doi.org/10.1145/2601248.2601268
- DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions. arXiv:2312.04730 [cs.CR]
- DevGPT: Studying Developer-ChatGPT Conversations. arXiv preprint arXiv:2309.03914 (2023).
- Towards Privacy Preserving Cross Project Defect Prediction with Federated Learning. In 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 485–496. https://doi.org/10.1109/SANER56733.2023.00052
- COCO: Testing Code Generation Systems via Concretized Instructions. arXiv:2308.13319 [cs.SE]
- How Important Are Good Method Names in Neural Code Generation? A Model Robustness Perspective. ACM Trans. Softw. Eng. Methodol. (oct 2023). https://doi.org/10.1145/3630010 Just Accepted.
- Assessing and Improving Syntactic Adversarial Robustness of Pre-trained Models for Code Translation. arXiv:2310.18587 [cs.SE]
- Authorship attribution of source code by using back propagation neural network based on particle swarm optimization. PloS one 12, 11 (2017), e0187204.
- An Empirical Study of Model-Agnostic Interpretation Technique for Just-in-Time Software Defect Prediction. In Collaborative Computing: Networking, Applications and Worksharing, Honghao Gao and Xinheng Wang (Eds.). Springer International Publishing, Cham, 420–438.
- Natural Attack for Pre-Trained Models of Code. In Proceedings of the 44th International Conference on Software Engineering (Pittsburgh, Pennsylvania) (ICSE ’22). Association for Computing Machinery, New York, NY, USA, 1482–1493. https://doi.org/10.1145/3510003.3510146
- Stealthy Backdoor Attack for Code Models. IEEE Transactions on Software Engineering (2024), 1–21. https://doi.org/10.1109/TSE.2024.3361661
- Gotcha! This Model Uses My Code! Evaluating Membership Leakage Risks in Code Models. arXiv:2310.01166 [cs.SE]
- Unveiling Memorization in Code Models.
- Adversarial Examples for Models of Code. Proceedings of the ACM on Programming Languages 4, OOPSLA (2020).
- AdVulCode: Generating Adversarial Vulnerable Code against Deep Learning-Based Vulnerability Detectors. Electronics 12, 4 (2023). https://doi.org/10.3390/electronics12040936
- An Extensive Study on Pre-Trained Models for Program Understanding and Generation. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (Virtual, South Korea) (ISSTA 2022). Association for Computing Machinery, New York, NY, USA, 39–51. https://doi.org/10.1145/3533767.3534390
- Practices and Challenges of Using GitHub Copilot: An Empirical Study. In International Conferences on Software Engineering and Knowledge Engineering. KSI Research Inc. https://doi.org/10.18293/seke2023-077
- Transfer Attacks and Defenses for Large Language Models on Coding Tasks. arXiv:2311.13445 [cs.LG]
- Towards robustness of deep program processing models—detection, estimation, and enhancement. ACM Trans. Softw. Eng. Methodol. 31, 3 (2022), 1–40.
- Generating Adversarial Examples for Holding Robustness of Source Code Processing Models. Proceedings of the AAAI Conference on Artificial Intelligence 34, 01 (Apr. 2020), 1169–1176.
- CodeBERT-Attack: Adversarial attack against source code deep learning models via pre-trained model. Journal of Software: Evolution and Process ([n. d.]), e2571.
- RNNS: Representation Nearest Neighbor Search Black-Box Attack on Code Models. arXiv:2305.05896 [cs.CR]
- Android in the Zoo: Chain-of-Action-Thought for GUI Agents. arXiv:2403.02713 [cs.CL]
- Pegasus: Pre-training with extracted gap-sentences for abstractive summarization. In International Conference on Machine Learning. PMLR, 11328–11339.
- Machine Learning Testing: Survey, Landscapes and Horizons. IEEE Transactions on Software Engineering 48, 1 (2022), 1–36. https://doi.org/10.1109/TSE.2019.2962027
- What does Transformer learn about source code? arXiv:2207.08466 [cs.SE]
- Sheng Zhang and Hui Li. 2023. Code Membership Inference for Detecting Unauthorized Data Use in Code Pre-trained Language Models.
- Interpretable Program Synthesis. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 105, 16 pages. https://doi.org/10.1145/3411764.3445646
- Challenging Machine Learning-Based Clone Detectors via Semantic-Preserving Code Transformations. IEEE Transactions on Software Engineering 49, 5 (2023), 3052–3070. https://doi.org/10.1109/TSE.2023.3240118
- Diet Code is Healthy: Simplifying Programs for Pre-Trained Models of Code. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Singapore, Singapore) (ESEC/FSE 2022). Association for Computing Machinery, New York, NY, USA, 1073–1084. https://doi.org/10.1145/3540250.3549094
- Interpretability application of the Just-in-Time software defect prediction model. Journal of Systems and Software 188 (2022), 111245. https://doi.org/10.1016/j.jss.2022.111245
- A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends. arXiv:2311.10372 [cs.SE]
- On the Concerns of Developers When Using GitHub Copilot. arXiv:2311.01020 [cs.SE]
- Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks. Curran Associates Inc., Red Hook, NY, USA.
- Adversarial Robustness of Deep Code Comment Generation. ACM Trans. Softw. Eng. Methodol. 31, 4, Article 60 (jul 2022), 30 pages. https://doi.org/10.1145/3501256
- Interpretable Text-to-SQL Generation with Joint Optimization. In Web Information Systems and Applications: 17th International Conference, WISA 2020, Guangzhou, China, September 23–25, 2020, Proceedings (Guangzhou, China). Springer-Verlag, Berlin, Heidelberg, 341–351. https://doi.org/10.1007/978-3-030-60029-7_32
- Rui Zhu and Cunming Zhang. 2023. How Robust Is a Large Pre-trained Language Model for Code Generation? A Case on Attacking GPT2. In 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). 708–712. https://doi.org/10.1109/SANER56733.2023.00076
- An Empirical Study of Gradient-based Explainability Techniques for Self-admitted Technical Debt Detection. Journal of Internet Technology 23, 3 (2022), 631–641.
- On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, Dubrovnik, Croatia, 1090–1102.
- Source Code Data Augmentation for Deep Learning: A Survey. arXiv:2305.19915 [cs.CL]
- Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models. arXiv:2401.00788 [cs.CL]
- Productivity Assessment of Neural Code Completion. In Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming (San Diego, CA, USA) (MAPS 2022). Association for Computing Machinery, New York, NY, USA, 21–29. https://doi.org/10.1145/3520312.3534864
- Exploring and Evaluating Personalized Models for Code Generation (ESEC/FSE 2022). 1500–1508. https://doi.org/10.1145/3540250.3558959
- Interpreting Deep Learning-Based Vulnerability Detector Predictions Based on Heuristic Searching. ACM Trans. Softw. Eng. Methodol. 30, 2, Article 23 (mar 2021), 31 pages. https://doi.org/10.1145/3429444