Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects (2403.12199v3)
Abstract: The growing popularity of ML and the integration of ML components with other software artifacts have led to the use of continuous integration and delivery (CI/CD) tools, such as Travis CI and GitHub Actions, which enable faster integration and testing for ML projects. These CI/CD configurations and services must be kept in sync with the project throughout its life cycle. Several works have discussed how CI/CD configurations and services change during their use in traditional software systems. However, very little is known about how they change in ML projects. To fill this knowledge gap, this work presents the first empirical analysis of how CI/CD configuration evolves in ML software systems. We manually analyzed 343 commits collected from 508 open-source ML projects to identify common CI/CD configuration change categories in ML projects, and devised a taxonomy of 14 co-changes in CI/CD and ML components. Moreover, we developed a CI/CD configuration change clustering tool that identified frequent CI/CD configuration change patterns in 15,634 commits. Furthermore, we measured the expertise of ML developers who modify CI/CD configurations. Based on this analysis, we found that 61.8% of commits include a change to the build policy, whereas changes related to performance and maintainability are minimal compared to general open-source projects. Additionally, the co-evolution analysis showed that CI/CD configurations were, in many cases, changed unnecessarily because of bad practices such as the direct inclusion of dependencies and a lack of standardized testing frameworks. The change pattern analysis revealed further bad practices, including the use of deprecated settings and reliance on a generic build language. Finally, our developer expertise analysis suggests that experienced developers are more inclined to modify CI/CD configurations.