Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects (2403.12199v3)

Published 18 Mar 2024 in cs.SE

Abstract: The growing popularity of ML and the integration of ML components with other software artifacts has led to the use of continuous integration and delivery (CI/CD) tools, such as Travis CI, GitHub Actions, etc. that enable faster integration and testing for ML projects. Such CI/CD configurations and services require synchronization during the life cycle of the projects. Several works discussed how CI/CD configuration and services change during their usage in traditional software systems. However, there is very limited knowledge of how CI/CD configuration and services change in ML projects. To fill this knowledge gap, this work presents the first empirical analysis of how CI/CD configuration evolves for ML software systems. We manually analyzed 343 commits collected from 508 open-source ML projects to identify common CI/CD configuration change categories in ML projects and devised a taxonomy of 14 co-changes in CI/CD and ML components. Moreover, we developed a CI/CD configuration change clustering tool that identified frequent CI/CD configuration change patterns in 15,634 commits. Furthermore, we measured the expertise of ML developers who modify CI/CD configurations. Based on this analysis, we found that 61.8% of commits include a change to the build policy and minimal changes related to performance and maintainability compared to general open-source projects. Additionally, the co-evolution analysis identified that CI/CD configurations, in many cases, changed unnecessarily due to bad practices such as the direct inclusion of dependencies and a lack of usage of standardized testing frameworks. More practices were found through the change patterns analysis consisting of using deprecated settings and reliance on a generic build language. Finally, our developer's expertise analysis suggests that experienced developers are more inclined to modify CI/CD configurations.

References (64)

Citations (1)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Tweets

https://twitter.com/ComputerPapers/status/1777268531232481372

https://twitter.com/ComputerPapers/status/1770350569707692511

https://twitter.com/ComputerPapers/status/1771123538218389887

Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects (2403.12199v3)

Summary

Related Papers

Tweets