
Unraveling Code Clone Dynamics in Deep Learning Frameworks (2404.17046v1)

Published 25 Apr 2024 in cs.SE and cs.AI

Abstract: Deep Learning (DL) frameworks play a critical role in advancing artificial intelligence, and their rapid growth underscores the need for a comprehensive understanding of software quality and maintainability. DL frameworks, like other systems, are prone to code clones. Code clones are identical or highly similar source code fragments within the same project or even across different projects. Code cloning can have positive and negative implications for software development, influencing maintenance, readability, and bug propagation. In this paper, we aim to address the knowledge gap concerning the evolutionary dimension of code clones in DL frameworks and the extent of code reuse across these frameworks. We empirically analyze code clones in nine popular DL frameworks, i.e., TensorFlow, Paddle, PyTorch, Aesara, Ray, MXNet, Keras, Jax and BentoML, to investigate (1) the characteristics of the long-term code cloning evolution over releases in each framework, (2) the short-term, i.e., within-release, code cloning patterns and their influence on the long-term trends, and (3) the file-level code clones within the DL frameworks. Our findings reveal that DL frameworks adopt four distinct cloning trends and that these trends present some common and distinct characteristics. For instance, bug-fixing activities persistently happen in clones irrespective of the clone evolutionary trend but occur more often in the "Serpentine" trend. Moreover, the within-release investigation demonstrates that short-term code cloning practices impact long-term cloning trends. The cross-framework investigation reveals file-level code clones across the nine studied frameworks that reflect both functional reuse and architectural adaptation. We provide insights that foster robust clone practices and collaborative maintenance in the development of DL frameworks.
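To make the notion of "identical or highly similar source code fragments" concrete, the sketch below groups fragments that are textually identical after light normalization (the simplest, Type-1 form of clone). This is only an illustration under assumed inputs, not the paper's actual detection tooling; empirical clone studies typically rely on dedicated detectors such as NiCad or SourcererCC.

```python
# Minimal sketch of Type-1 (exact) clone grouping via normalization
# and hashing. Illustrative only; not the study's detection pipeline.
import hashlib
from collections import defaultdict


def normalize(fragment: str) -> str:
    """Strip comments, blank lines, and surrounding whitespace so that
    layout-only differences do not hide an exact clone."""
    lines = []
    for line in fragment.splitlines():
        line = line.split("#")[0].strip()  # drop trailing '#' comments
        if line:
            lines.append(line)
    return "\n".join(lines)


def find_clone_groups(fragments: dict[str, str]) -> list[set[str]]:
    """Group fragment names whose normalized text is identical."""
    buckets = defaultdict(set)
    for name, code in fragments.items():
        digest = hashlib.sha256(normalize(code).encode()).hexdigest()
        buckets[digest].add(name)
    return [group for group in buckets.values() if len(group) > 1]


# Hypothetical file names and fragments for illustration.
fragments = {
    "framework_a/ops.py": "def relu(x):\n    return max(x, 0)\n",
    "framework_b/ops.py": "def relu(x):  # rectifier\n    return max(x, 0)\n",
    "framework_a/util.py": "def identity(x):\n    return x\n",
}
print(find_clone_groups(fragments))
```

The two `relu` fragments are reported as one clone group despite differing in comments and whitespace, while `identity` stays unmatched. Detecting the "highly similar" (near-miss) clones the abstract also mentions requires token- or AST-level comparison rather than exact hashing.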

Authors (3)
  1. Maram Assi
  2. Safwat Hassan
  3. Ying Zou
Citations (1)
