
Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example (2402.07138v3)

Published 11 Feb 2024 in cs.SE

Abstract: Software developers often repeat code changes, known as "code change patterns" (CPATs), within and across projects. Automating these CPATs accelerates development, but current Transformation by Example (TBE) techniques are limited by the input examples' quality and quantity, missing variations with different syntax or flow yet semantically similar. LLMs, trained on vast code datasets, can overcome these limitations by generating semantically equivalent, unseen CPAT variants, enhancing TBE effectiveness. We identified best practices for using LLMs to generate code variants meeting criteria of correctness, usefulness, and applicability. Implementing these in PyCraft, combining static and dynamic analysis with LLMs, we achieved an F-measure of 96.6% in identifying correct variants, expanding inputs by 58x on average, and automating changes to increase target codes by up to 39x. Patches from PyCraft were submitted to projects like microsoft/DeepSpeed and IBM/inFairness, with an 83% acceptance rate, validating our approach's usefulness.
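To make the idea of "semantically equivalent variants with different syntax" concrete, here is a minimal sketch of how candidate variants might be filtered with a static check (does the code parse?) followed by a dynamic check (does it behave like the original on test inputs?). It is not PyCraft's implementation: the `total` function, the candidate list (standing in for LLM output), the test inputs, and both helper functions are hypothetical names chosen for illustration.

```python
import ast

# Hypothetical "before" side of a code change pattern: summing a list with a loop.
ORIGINAL = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

# Hypothetical candidate variants, standing in for LLM-generated output.
CANDIDATES = [
    # Same semantics, different syntax (while loop instead of for loop).
    """
def total(values):
    result, i = 0, 0
    while i < len(values):
        result += values[i]
        i += 1
    return result
""",
    # Not equivalent: skips the first element.
    """
def total(values):
    result = 0
    for v in values[1:]:
        result += v
    return result
""",
    # Not valid Python at all.
    "def total(values) return sum(values)",
]

# Assumed test inputs; a real tool would need a far richer test source.
TEST_INPUTS = [[], [1], [1, 2, 3], [-4, 4, 10]]


def load_total(source):
    """Static check: parse the source, then define and return its `total` function."""
    ast.parse(source)  # raises SyntaxError on invalid code
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


def is_equivalent_variant(original_src, candidate_src, inputs):
    """Dynamic check: the candidate must match the original on every test input."""
    try:
        candidate = load_total(candidate_src)
    except SyntaxError:
        return False
    reference = load_total(original_src)
    for values in inputs:
        try:
            if candidate(list(values)) != reference(list(values)):
                return False
        except Exception:
            return False
    return True


accepted = [is_equivalent_variant(ORIGINAL, c, TEST_INPUTS) for c in CANDIDATES]
print(accepted)  # [True, False, False]
```

Agreement on a handful of inputs is only evidence of equivalence, not proof, which is one reason a combination of static and dynamic analysis is attractive for this kind of filtering.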
