
On the Generalizability of Deep Learning-based Code Completion Across Programming Language Versions (2403.15149v1)

Published 22 Mar 2024 in cs.SE

Abstract: Code completion is a key feature of Integrated Development Environments (IDEs), aimed at predicting the next tokens a developer is likely to write, helping them write code faster and with less effort. Modern code completion approaches are often powered by deep learning (DL) models. However, the swift evolution of programming languages poses a critical challenge to the performance of DL-based code completion models: Can these models generalize across different language versions? This paper delves into this question. In particular, we assess the capabilities of a state-of-the-art model, CodeT5, to generalize across nine different Java versions, ranging from Java 2 to Java 17, while being exclusively trained on Java 8 code. Our evaluation spans three completion scenarios, namely, predicting tokens, constructs (e.g., the condition of an if statement), and entire code blocks. The results of our study reveal a noticeable disparity among language versions, with the worst performance being obtained on Java 2 and Java 17, the versions furthest from Java 8. We investigate possible causes for the performance degradation and show that the adoption of a limited version-specific fine-tuning can partially alleviate the problem. Our work raises awareness of the importance of continuous model refinement, and it can inform the design of alternatives to make code completion models more robust to language evolution.
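The three completion scenarios can be pictured as masking a code fragment at increasing granularities. The sketch below is an illustration, not the paper's actual pipeline: the `<MASK>` placeholder, the example statement, and the chosen spans are all assumptions made for demonstration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MaskDemo {
    // Builds three masked variants of one Java statement, mirroring the
    // paper's three completion scenarios: token-, construct-, and
    // block-level prediction. Placeholder format is hypothetical.
    static Map<String, String> maskVariants() {
        String stmt = "if (list.size() > 0) { process(list); }";
        Map<String, String> variants = new LinkedHashMap<>();
        // Token level: hide only the final token of the condition.
        variants.put("token", stmt.replace("0", "<MASK>"));
        // Construct level: hide the whole if-condition.
        variants.put("construct", stmt.replace("list.size() > 0", "<MASK>"));
        // Block level: hide the entire statement body.
        variants.put("block", stmt.replace("{ process(list); }", "<MASK>"));
        return variants;
    }

    public static void main(String[] args) {
        maskVariants().forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

The model is then asked to reconstruct the masked span from the surrounding context; the harder the scenario (token, then construct, then block), the more the prediction depends on idioms learned from the training version of the language.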

Authors (3)
  1. Matteo Ciniselli (11 papers)
  2. Alberto Martin-Lopez (6 papers)
  3. Gabriele Bavota (60 papers)
Citations (1)
