Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example (2402.07138v3)
Abstract: Software developers often repeat code changes, known as "code change patterns" (CPATs), within and across projects. Automating these CPATs accelerates development, but current Transformation by Example (TBE) techniques are limited by the quality and quantity of their input examples, so they miss variants that differ in syntax or control flow yet are semantically similar. LLMs, trained on vast code datasets, can overcome these limitations by generating semantically equivalent, previously unseen CPAT variants, enhancing TBE effectiveness. We identified best practices for using LLMs to generate code variants that meet three criteria: correctness, usefulness, and applicability. We implemented these practices in PyCraft, which combines static and dynamic analysis with LLMs. PyCraft achieved an F-measure of 96.6% in identifying correct variants, expanded the inputs by 58x on average, and automated changes that increased the number of target codes by up to 39x. Patches from PyCraft were submitted to projects such as microsoft/DeepSpeed and IBM/inFairness, with an 83% acceptance rate, validating our approach's usefulness.
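The core idea is that a CPAT's single input example can be expanded with semantically equivalent variants, and that each LLM-proposed variant must be checked for correctness before it is used. A minimal Python sketch can illustrate this. Everything below is hypothetical. The CPAT "before" code (`ORIGINAL`), the candidate variants (hand-written stand-ins for LLM output), and the helpers `load`, `make_inputs`, `is_equivalent`, and `accepted_variants` are assumptions for illustration, not PyCraft's implementation. Parsing serves as a simple static check, and differential testing on sample inputs serves as the dynamic check.

```python
import ast
import random
from typing import Callable, Dict, List

# Hypothetical "before" side of a CPAT: keep positive numbers and double them.
ORIGINAL = """
def f(xs):
    result = []
    for x in xs:
        if x > 0:
            result.append(x * 2)
    return result
"""

# Hypothetical candidate variants, standing in for what an LLM might propose.
CANDIDATES: Dict[str, str] = {
    # Same semantics, different control flow: a purely syntactic matcher would miss it.
    "while_loop": """
def f(xs):
    result = []
    i = 0
    while i < len(xs):
        if xs[i] > 0:
            result.append(xs[i] * 2)
        i += 1
    return result
""",
    # Same semantics, functional style.
    "filter_map": """
def f(xs):
    return list(map(lambda x: x * 2, filter(lambda x: x > 0, xs)))
""",
    # Subtly wrong: also keeps zeros.
    "wrong_condition": """
def f(xs):
    result = []
    for x in xs:
        if x >= 0:
            result.append(x * 2)
    return result
""",
    # Not valid Python (missing colon).
    "broken_syntax": """
def f(xs)
    return xs
""",
}


def load(src: str) -> Callable:
    """Static step: parse the source (raises SyntaxError if invalid), then define f."""
    tree = ast.parse(src)
    namespace: dict = {}
    exec(compile(tree, "<candidate>", "exec"), namespace)
    return namespace["f"]


def make_inputs(seed: int = 0, n: int = 50) -> List[List[int]]:
    """Hand-picked edge cases plus seeded random integer lists."""
    rng = random.Random(seed)
    edge_cases = [[], [0], [-1, 0, 1]]
    randoms = [[rng.randint(-5, 5) for _ in range(rng.randint(0, 8))] for _ in range(n)]
    return edge_cases + randoms


def is_equivalent(original: Callable, src: str, inputs: List[List[int]]) -> bool:
    """Dynamic step: the candidate must match the original's output on every input."""
    try:
        candidate = load(src)
    except Exception:
        # Static failure: unparsable source, or it does not define f.
        return False
    for xs in inputs:
        try:
            # Pass copies so a mutating candidate cannot affect later comparisons.
            if candidate(list(xs)) != original(list(xs)):
                return False
        except Exception:
            return False
    return True


def accepted_variants(original_src: str, candidates: Dict[str, str]) -> List[str]:
    """Names of the candidates that pass both the static and the dynamic check."""
    original = load(original_src)
    inputs = make_inputs()
    return [name for name, src in candidates.items() if is_equivalent(original, src, inputs)]


if __name__ == "__main__":
    print(accepted_variants(ORIGINAL, CANDIDATES))
```

Run as a script, this prints `['while_loop', 'filter_map']`. The zero edge case rejects `wrong_condition`, and the parse step rejects `broken_syntax`. Differential testing only reveals differences on the inputs tried. Passing it is evidence of equivalence, not proof, which is one reason to pair it with static analysis.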