Together We Go Further: LLMs and IDE Static Analysis for Extract Method Refactoring
Abstract: Long methods that encapsulate multiple responsibilities within a single method are challenging to maintain. Choosing which statements to extract into new methods has been the target of many research tools. Despite steady improvements, these tools often fail to generate refactorings that align with developers' preferences and acceptance criteria. Given that LLMs have been trained on large code corpora, if we harness their familiarity with the way developers form functions, we could suggest refactorings that developers are likely to accept. In this paper, we advance the science and practice of refactoring by synergistically combining the insights of LLMs with the power of IDEs to perform Extract Method (EM). Our formative study on 1752 EM scenarios revealed that LLMs are very effective at giving expert suggestions, yet they are unreliable: up to 76.3% of the suggestions are hallucinations. We designed a novel approach that removes hallucinations from the candidates suggested by LLMs, then further enhances and ranks suggestions based on static analysis techniques from program slicing, and finally leverages the IDE to execute refactorings correctly. We implemented this approach in an IntelliJ IDEA plugin called EM-Assist. We empirically evaluated EM-Assist on a diverse corpus that replicates 1752 actual refactorings from open-source projects. We found that EM-Assist outperforms previous state-of-the-art tools: EM-Assist suggests the developer-performed refactoring in 53.4% of cases, improving over the recall rate of 39.4% for previous best-in-class tools. Furthermore, we conducted firehouse surveys with 16 industrial developers and suggested refactorings on their recent commits. Of these developers, 81.3% agreed with the recommendations provided by EM-Assist.
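The filter-then-rank idea described in the abstract can be illustrated with a minimal sketch. This is not the EM-Assist implementation: the `Candidate` type, the validity checks (line range inside the host method, not covering nearly the whole method), the `max_fraction` threshold, and ranking by how often the LLM repeats a suggestion are all assumptions chosen for illustration. The abstract itself only states that hallucinations are removed and that ranking relies on static analysis techniques from program slicing.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical LLM suggestion: extract lines start..end (1-indexed, inclusive)."""
    start: int
    end: int


def is_plausible(c, method_length, max_fraction=0.8):
    """Reject suggestions that cannot describe a useful extraction.

    Assumed checks (not taken from the paper):
    - the range must lie inside the host method and be well ordered;
    - the range must not cover more than max_fraction of the method.
    """
    if not (1 <= c.start <= c.end <= method_length):
        return False
    size = c.end - c.start + 1
    return size / method_length <= max_fraction


def filter_and_rank(raw, method_length):
    """Drop implausible candidates, then rank by how often each was suggested."""
    counts = Counter(c for c in raw if is_plausible(c, method_length))
    # most_common() sorts by count, descending; ties keep first-seen order.
    return [c for c, _ in counts.most_common()]


# Suggestions collected from several LLM responses for a 20-line method.
suggestions = [
    Candidate(3, 7), Candidate(3, 7),  # repeated suggestion
    Candidate(12, 40),                 # runs past the end of the method
    Candidate(1, 20),                  # covers the whole method
    Candidate(9, 8),                   # start after end
    Candidate(10, 14),
]
ranked = filter_and_rank(suggestions, method_length=20)
print(ranked)
```

In this sketch, three of the six raw suggestions are discarded before ranking. In a real plugin, the surviving candidates would still need to be validated and applied by the IDE's own refactoring engine, which is the role the abstract assigns to IntelliJ IDEA.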