Moving beyond Deletions: Program Simplification via Diverse Program Transformations (2401.15234v1)
Abstract: To reduce software complexity, developers manually simplify programs (referred to in this paper as developer-induced program simplification) to reduce code size while preserving functionality, but manual simplification is time-consuming and error-prone. To reduce manual effort, rule-based approaches (e.g., refactoring) and deletion-based approaches (e.g., delta debugging) could potentially be applied to automate developer-induced program simplification. However, because there are few studies of how developers simplify programs in open-source software (OSS) projects, it is unclear whether these approaches can be used effectively for developer-induced program simplification. Hence, we present the first study of developer-induced program simplification in OSS projects, focusing on the types of program transformations used, the motivations behind simplifications, and the set of program transformations covered by existing refactoring types. Our study of 382 pull requests from 296 projects reveals gaps in applying existing approaches to automate developer-induced program simplification, and we outline criteria for designing automatic program simplification techniques. Inspired by our study and to reduce the manual effort in developer-induced program simplification, we propose SimpT5, a tool that automatically produces simplified programs (semantically equivalent programs with fewer source lines of code). SimpT5 is trained on our collected dataset of 92,485 simplified programs with two heuristics: (1) simplified line localization, which encodes the lines changed in simplified programs, and (2) checkers that measure the quality of generated programs. Our evaluation shows that SimpT5 is more effective than prior approaches in automating developer-induced program simplification.
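The two training heuristics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that line localization can be approximated by a line-level diff (Python's `difflib.SequenceMatcher`) that marks the original lines deleted or replaced in the simplified version, and it uses a hypothetical `<SIMP>` marker token to tag those lines in the model input. For the checker, it implements only one plausible size check, a source-lines-of-code (SLOC) comparison; the function names are illustrative. It does not check semantic equivalence, which the paper's definition of a simplified program requires.

```python
import difflib
from typing import List


def mark_changed_lines(original: str, simplified: str) -> List[int]:
    """Return 0-based indices of lines in `original` that are deleted or
    replaced in `simplified` (a diff-based stand-in for line localization)."""
    orig_lines = original.splitlines()
    simp_lines = simplified.splitlines()
    matcher = difflib.SequenceMatcher(a=orig_lines, b=simp_lines, autojunk=False)
    changed: List[int] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'replace' and 'delete' spans are the original lines that change.
        if tag in ("replace", "delete"):
            changed.extend(range(i1, i2))
    return changed


def annotate_input(original: str, changed: List[int], marker: str = "<SIMP>") -> str:
    """Prefix each localized line with a hypothetical marker token so a
    seq2seq model can see which lines are expected to change."""
    changed_set = set(changed)
    return "\n".join(
        f"{marker} {line}" if i in changed_set else line
        for i, line in enumerate(original.splitlines())
    )


def sloc(program: str) -> int:
    """Count non-blank lines that are not pure // comments (a simple SLOC proxy)."""
    count = 0
    for line in program.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            count += 1
    return count


def passes_size_check(original: str, candidate: str) -> bool:
    """Hypothetical checker: accept a candidate only if it has strictly fewer SLOC."""
    return sloc(candidate) < sloc(original)


if __name__ == "__main__":
    # A Java snippet treated as plain text; assumes `items` is a Collection<String>.
    original = (
        "List<String> names = new ArrayList<String>();\n"
        "for (String s : items) {\n"
        "    names.add(s);\n"
        "}\n"
        "return names;"
    )
    simplified = "List<String> names = new ArrayList<>(items);\nreturn names;"
    changed = mark_changed_lines(original, simplified)
    print(changed)
    print(annotate_input(original, changed))
    print(sloc(original), sloc(simplified), passes_size_check(original, simplified))
```

In this example the diff marks the first four original lines (the declaration and the copy loop) as changed, while `return names;` stays unmarked. The size check accepts the candidate because it drops from 5 to 2 SLOC. A practical checker would also need to confirm that the candidate compiles and behaves the same as the original, for example by running the project's tests.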