- The paper empirically assesses SE best practices tailored for ML, revealing significant correlations with improved agility, quality, and team effectiveness.
- It employs both linear regression and non-linear AutoML models to quantify how diverse practices impact key outcomes.
- Findings show that larger, experienced teams adopt more practices, yet challenges persist in testing and feature management.
Adoption and Effects of Software Engineering Best Practices in Machine Learning
Introduction and Background
The increasing integration of Machine Learning (ML) components into software calls for robust engineering practices that maintain high quality standards. This paper measures the adoption rate and effects of software engineering best practices tailored to ML applications. Previous studies have identified challenges unique to ML systems, such as data versioning, development scalability, and the intricacies of testing (2007.14130). This work complements them with empirical evidence gathered through a survey of ML practitioners.
Methodology
The research is based on a survey of 313 practitioners. By mining both academic and grey literature, the authors compiled a catalog of 29 best practices, grouped into six categories: Data, Training, Deployment, Coding, Team, and Governance. The practices range from traditional software engineering norms adapted to ML contexts to entirely new practices designed for ML-specific challenges.
Survey Design
The survey asks respondents about their adoption of each practice and about its perceived effects in four areas: agility, software quality, team effectiveness, and traceability. The questionnaire was validated through pilot interviews, and the responses were filtered for completeness and demographic balance.
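Survey answers of this kind are usually encoded as ordinal scores before analysis. The sketch below is a minimal illustration, assuming a hypothetical four-level adoption scale ("Not at all" to "Completely") and hypothetical practice names. It drops any respondent with a missing or unrecognised answer, which is one simple reading of a completeness filter and not necessarily the paper's exact rule.

```python
from typing import Dict, List, Optional

# Hypothetical four-level adoption scale (an assumption for illustration).
ADOPTION_SCALE = {"Not at all": 0, "Partially": 1, "Mostly": 2, "Completely": 3}


def encode_response(answers: Dict[str, Optional[str]]) -> Optional[Dict[str, int]]:
    """Map each practice's answer label to an ordinal score.

    Returns None if any answer is missing or not on the scale."""
    encoded = {}
    for practice, answer in answers.items():
        if answer not in ADOPTION_SCALE:
            return None  # incomplete or unrecognised answer: reject the respondent
        encoded[practice] = ADOPTION_SCALE[answer]
    return encoded


def filter_complete(responses: List[Dict[str, Optional[str]]]) -> List[Dict[str, int]]:
    """Keep only respondents whose answers could all be encoded."""
    encoded = (encode_response(r) for r in responses)
    return [e for e in encoded if e is not None]


# Toy responses; the practice names are hypothetical.
responses = [
    {"peer_review": "Completely", "feature_tests": "Partially"},
    {"peer_review": "Mostly", "feature_tests": None},  # incomplete, dropped
    {"peer_review": "Not at all", "feature_tests": "Mostly"},
]
complete = filter_complete(responses)
print(complete)
```

Rejecting a whole respondent, rather than a single answer, keeps every retained record comparable across all practices. Real studies may instead use a completeness threshold.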



Figure 1: Demographic information describing the survey participants. All plots show the percentage of respondents, grouped by various demographic factors.
Findings on Practice Adoption
Demographics
The respondents are diverse in geography, organization type, team size, and experience. Regions such as Europe and North America show distinct adoption patterns, with respondents from North America reporting higher overall practice adoption.
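To compare groups such as regions, one can compute the share of each answer type within every group, in the spirit of the grouped percentages in the figures. This is a hedged sketch on made-up toy data, not the paper's results. The field name `region` and the answer labels are assumptions.

```python
from collections import Counter, defaultdict


def answer_distribution(rows, group_key):
    """Percentage of each answer type within each demographic group.

    rows: iterable of (demographics, answers) pairs, where demographics is a
    dict and answers is the list of adoption labels given by one respondent."""
    counts = defaultdict(Counter)
    for demographics, answers in rows:
        counts[demographics[group_key]].update(answers)
    result = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        result[group] = {label: 100.0 * n / total for label, n in counter.items()}
    return result


# Made-up toy data, not the paper's results.
rows = [
    ({"region": "Europe"}, ["Completely", "Partially"]),
    ({"region": "Europe"}, ["Not at all", "Completely"]),
    ({"region": "North America"}, ["Mostly", "Completely"]),
]
dist = answer_distribution(rows, "region")
print(dist)
```

Pooling all answers within a group weights each respondent by the number of practices they answered. With a complete-response filter applied beforehand, every respondent contributes equally.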
Practice Adoption Ranking
Overall, practices designed specifically for ML showed higher adoption than traditional ones. Certain traditional practices were nonetheless widely adopted, especially those supporting team collaboration (e.g., using a collaborative development platform). Practices involving testing and explicit feature management had lower adoption, which highlights persistent challenges in these areas.
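A ranking of this kind can be produced by averaging ordinal adoption scores per practice. The sketch assumes the responses are already encoded as integers (0 = not adopted, 3 = fully adopted). The practice names and data are hypothetical, and the paper may aggregate answers differently.

```python
from statistics import mean


def rank_practices(responses):
    """Sort practices by mean ordinal adoption score, most adopted first.

    responses: list of dicts mapping practice name -> score (0..3)."""
    practices = responses[0].keys()
    scores = {p: mean(r[p] for r in responses) for p in practices}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical encoded answers from three respondents.
responses = [
    {"collaborative_platform": 3, "peer_review": 2, "feature_tests": 0},
    {"collaborative_platform": 3, "peer_review": 1, "feature_tests": 1},
    {"collaborative_platform": 2, "peer_review": 3, "feature_tests": 0},
]
ranking = rank_practices(responses)
for practice, score in ranking:
    print(f"{practice}: {score:.2f}")
```

A mean treats the ordinal levels as equally spaced. An alternative that avoids this assumption is to rank by the share of "Mostly" or "Completely" answers.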



Figure 2: Adoption of practices grouped by various demographic factors. All plots show the percentage of answers, grouped by the answer types illustrated in the plot legend.
Effects of Practices
Correlations Among Practices
The study identifies multiple statistically significant correlations between practices across categories, which indicate how certain practices may support others. For instance, peer review correlates positively with overall team collaboration, suggesting that teams tend to adopt related practices together. A correlation alone, however, does not show that adopting one practice causes improvement in another.
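For ordinal survey data, Spearman's rank correlation is a common way to measure the association between two practices, and a permutation test gives a distribution-free p-value. The sketch below assumes this choice of test and uses made-up scores for two hypothetical practices. It is not the paper's exact procedure. Tied values receive average ranks.

```python
import random


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |rho| at least the observed one."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(spearman(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Made-up adoption scores (0..3) for two hypothetical practices.
peer_review = [3, 2, 2, 0, 1, 3, 1, 0]
collaboration = [3, 3, 2, 1, 1, 2, 0, 0]
rho = spearman(peer_review, collaboration)
p = permutation_p_value(peer_review, collaboration)
print(f"rho={rho:.2f}, p={p:.3f}")
```

Fixing the seed makes the p-value reproducible. When many practice pairs are tested, a correction for multiple comparisons is advisable before calling a correlation significant.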
Linear and Non-Linear Analysis
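Using statistical models, the paper examines both linear and non-linear relationships between practice adoption and the perceived effects, fitting linear regression models alongside non-linear models selected with AutoML.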
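The linear side of this analysis can be illustrated with ordinary least squares, regressing a perceived-effect score on practice-adoption scores. The sketch below is a minimal, dependency-free version on assumed toy data generated from a known linear rule. The paper's actual model specification is not reproduced here, and neither are its AutoML-selected non-linear models.

```python
def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations.

    X: list of rows (one per respondent) of practice-adoption scores.
    y: perceived-effect scores (e.g. a hypothetical agility rating).
    Returns [intercept, w_1, ..., w_p]."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(A[0])
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(p)]
    M = [ata[i] + [aty[i]] for i in range(p)]  # augmented matrix
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        rest = sum(M[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (M[i][p] - rest) / M[i][i]
    return beta


def predict(beta, row):
    return beta[0] + sum(w * v for w, v in zip(beta[1:], row))


# Toy data following effect = 1 + 0.5 * practice_a + 0.25 * practice_b.
adoption = [[0, 0], [1, 0], [0, 1], [2, 3], [3, 1]]
agility = [1.0, 1.5, 1.25, 2.75, 2.75]
beta = fit_ols(adoption, agility)
print(beta)
```

Because the toy data are noise-free, the fit recovers the generating coefficients (about 1, 0.5, and 0.25). With real survey data, the coefficients estimate each practice's linear association with the effect, holding the other practices fixed.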
Discussion
The study empirically confirms challenges raised in earlier work, such as versioning and experiment management, and highlights the need for further research on less-adopted practices such as testing. Larger and more experienced teams tend to adopt more practices, which suggests that resources and expertise play an important role.
Shapley values identify the practices that contribute most to each effect, supporting informed decisions about which practices to adopt next when improving ML processes incrementally.
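Shapley values split a model's prediction into additive contributions, one per input feature. The sketch below computes them exactly by enumerating all coalitions. It assumes that a practice "absent" from a coalition is set to a baseline adoption level, an interventional convention chosen here for illustration. It works with any prediction function but is only practical for a handful of practices, since the cost grows as 2^p. The toy model is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value."""
    p = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(p)]
        return model(z)

    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):  # coalition sizes 0..p-1 drawn from the others
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                # weighted marginal contribution of feature i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical linear model: predicted effect from three practice scores.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


x = [3, 1, 2]         # one team's adoption scores
baseline = [1, 1, 0]  # reference adoption level
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

For a linear model the values reduce to w_i(x_i - b_i), about [4, 0, 1] here. In every case they sum to model(x) - model(baseline), which is what makes them usable for attributing an effect to individual practices.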
Conclusion
The paper offers a quantitative basis for assessing practice adoption among ML teams and improving it step by step. It shows which practices matter most for achieving the desired effects and lays a foundation for future work in ML engineering, with possible implications for AI system development more broadly.
Future work involves expanding survey participation, refining practice validation, and extending applicability beyond strictly ML-focused systems to encompass broader AI contexts. The ongoing goal is to facilitate effective and sustainable integration of ML components within robust engineering frameworks.