
Adoption and Effects of Software Engineering Best Practices in Machine Learning

Published 28 Jul 2020 in cs.SE | (2007.14130v2)

Abstract: The increasing reliance on applications with ML components calls for mature engineering techniques that ensure these are built in a robust and future-proof manner. We aim to empirically determine the state of the art in how teams develop, deploy and maintain software with ML components. We mined both academic and grey literature and identified 29 engineering best practices for ML applications. We conducted a survey among 313 practitioners to determine the degree of adoption for these practices and to validate their perceived effects. Using the survey responses, we quantified practice adoption, differentiated along demographic characteristics, such as geography or team size. We also tested correlations and investigated linear and non-linear relationships between practices and their perceived effect using various statistical models. Our findings indicate, for example, that larger teams tend to adopt more practices, and that traditional software engineering practices tend to have lower adoption than ML specific practices. Also, the statistical models can accurately predict perceived effects such as agility, software quality and traceability, from the degree of adoption for specific sets of practices. Combining practice adoption rates with practice importance, as revealed by statistical models, we identify practices that are important but have low adoption, as well as practices that are widely adopted but are less important for the effects we studied. Overall, our survey and the analysis of responses received provide a quantitative basis for assessment and step-wise improvement of practice adoption by ML teams.

Citations (108)

Summary

  • The paper empirically assesses SE best practices tailored for ML, revealing significant correlations with improved agility, quality, and team effectiveness.
  • It employs both linear regression and non-linear AutoML models to quantify how diverse practices impact key outcomes.
  • Findings show that larger, experienced teams adopt more practices, yet challenges persist in testing and feature management.

Introduction and Background

The increasing integration of Machine Learning (ML) components in software development necessitates robust engineering practices to maintain high-quality standards. This paper outlines efforts to measure the adoption rate and effects of software engineering best practices specifically tailored for ML applications. Previous studies have identified various challenges unique to ML systems, such as data versioning, development scalability, and testing intricacies (2007.14130). This work aims to bridge gaps by offering empirical insights into these practices through a comprehensive survey targeted at ML practitioners.

Methodology

The study has two parts. First, the authors mined both academic and grey literature to compile a catalog of 29 best practices, grouped into six categories: Data, Training, Deployment, Coding, Team, and Governance. Second, they surveyed 313 practitioners about those practices. The practices range from traditional software engineering norms adapted to ML contexts to entirely new practices designed specifically for ML challenges.

Survey Design

The survey designed for this study includes questions about the adoption of identified practices and their perceived effects on four key areas: agility, software quality, team effectiveness, and traceability. The process involved validation through pilot interviews and filtering based on completeness and demographic balancing.

Figure 1: Demographic information describing the survey participants. All plots show the percentage of respondents, grouped by various demographic factors.

Findings on Practice Adoption

Demographics

Analysis indicates a diverse representation in terms of geography, organizational types, and team sizes and experiences. Notably, regions such as Europe and North America show distinct adoption patterns, with North America achieving higher overall practice adoption rates.

Practice Adoption Ranking

Although new practices specifically designed for ML showed higher adoption rates, certain traditional practices, especially those concerning team collaboration (e.g., using collaborative development platforms), were widely embraced. Conversely, practices involving testing and explicit feature management exhibited lower adoption, highlighting persistent challenges in these areas.

Figure 2: Adoption of practices grouped by various demographic factors. All plots show the percentage of answers, grouped by the answer types illustrated in the plot legend.
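A ranking like the one described above can be derived from the raw answers in a few lines. The sketch below is illustrative only: the four-point answer scale, the threshold for counting a practice as adopted, the practice names, and the responses are all assumptions, not taken from the survey.

```python
from collections import Counter

# Hypothetical four-point answer scale; the labels are assumed, not taken from the survey.
SCALE = ["not at all", "partially", "mostly", "completely"]
# Assumed threshold: a practice counts as adopted when the answer is one of these.
ADOPTED = {"mostly", "completely"}


def answer_shares(answers):
    """Return the percentage of answers at each scale level."""
    counts = Counter(answers)
    total = len(answers)
    return {level: 100.0 * counts[level] / total for level in SCALE}


def adoption_rate(answers):
    """Return the fraction of answers at or above the assumed adoption threshold."""
    return sum(a in ADOPTED for a in answers) / len(answers)


def rank_practices(responses):
    """Sort practice names from most to least adopted."""
    return sorted(responses, key=lambda p: adoption_rate(responses[p]), reverse=True)


# Made-up responses for three hypothetical practices.
responses = {
    "shared_repository": ["completely", "completely", "mostly", "partially"],
    "automated_tests": ["not at all", "partially", "partially", "mostly"],
    "feature_documentation": ["not at all", "not at all", "not at all", "partially"],
}

ranking = rank_practices(responses)
for name in ranking:
    print(name, adoption_rate(responses[name]), answer_shares(responses[name]))
```

Grouping the same computation by a demographic field (region, team size) would give per-group percentages of the kind plotted in Figure 2.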

Effects of Practices

Correlations Among Practices

The study identifies multiple statistically significant correlations between practices across categories, which gives some insight into how practices support one another. For instance, peer review practices correlate positively with overall team collaboration, suggesting that teams adopting one practice tend to adopt related ones as well.
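As a rough illustration of how a correlation between two practices could be measured, the sketch below computes a Spearman rank correlation between adoption scores for two practices. Spearman is chosen here because it suits ordinal answers; this summary does not state which coefficient the paper uses. The 0 to 3 score encoding, the variable names, and the data are hypothetical.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up adoption scores per respondent (0 = not at all ... 3 = completely).
peer_review = [0, 1, 1, 2, 3, 3, 2, 0]
shared_backlog = [0, 1, 2, 2, 3, 2, 3, 1]

rho = spearman(peer_review, shared_backlog)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient like this only shows that two practices tend to be adopted together; it does not show that one causes the other, and a significance test would still be needed before reporting it.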

Linear and Non-Linear Analysis

Using statistical models, the paper explores both linear and non-linear effects of practice adoption:

  • Linear regression models demonstrate statistical significance in describing effects such as agility, software quality, and traceability from the practices considered.
  • Non-linear models, specifically those optimized using AutoML, capture more complex relationships in which certain practices affect outcomes disproportionately.

Figure 3: Practice adoption and importance, for each effect and practice. The practice importance is the Shapley value extracted from the grid search RF models in the paper's ML regression table.
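To make the notion of a Shapley value concrete, the sketch below computes exact Shapley values for a toy model with three practices. It is not the paper's pipeline: the paper extracts Shapley values from fitted RF models, whereas `predicted_quality` here is a made-up stand-in with one deliberate interaction term, and the practice names are hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values: average each feature's marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    for order in permutations(features):
        present = frozenset()
        for f in order:
            with_f = present | {f}
            totals[f] += value(with_f) - value(present)
            present = with_f
    n_orders = factorial(len(features))
    return {f: t / n_orders for f, t in totals.items()}


def predicted_quality(adopted):
    """Toy stand-in for a fitted model: predicted effect score for a set of adopted practices."""
    score = 1.0  # baseline with no practices adopted
    if "versioning" in adopted:
        score += 2.0
    if "peer_review" in adopted:
        score += 1.0
    if {"versioning", "automated_tests"} <= adopted:
        score += 1.0  # interaction: in this toy model, tests only pay off with versioning
    return score


practices = ["versioning", "peer_review", "automated_tests"]
phi = shapley_values(practices, predicted_quality)
for name, contribution in phi.items():
    print(name, contribution)
```

The interaction bonus is split evenly between the two practices that produce it, and the values sum to the difference between the full-adoption score and the baseline. Enumerating all orderings is only feasible for a handful of features; with 29 practices an approximation method would be needed.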

Discussion

The study empirically validates challenges raised in earlier work, such as versioning and experiment management, and points to the need for further research on less-adopted practices such as testing. Larger and more experienced teams tend to have higher adoption rates, which underscores the importance of resources and expertise.

Shapley values identify which practices contribute most to each effect, which supports informed decisions about what to adopt next when improving ML processes incrementally.
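One simple way to combine adoption with importance, as the abstract describes, is to split practices into four groups. The sketch below uses a median split on each axis; the cut-off rule, the practice names, and the numbers are assumptions for illustration and do not come from the paper.

```python
from statistics import median


def classify_practices(adoption, importance):
    """Label each practice by comparing its adoption and importance with the medians."""
    adoption_cut = median(adoption.values())
    importance_cut = median(importance.values())
    labels = {}
    for name in adoption:
        high_adoption = adoption[name] >= adoption_cut
        high_importance = importance[name] >= importance_cut
        if high_importance and not high_adoption:
            labels[name] = "important, low adoption"
        elif high_adoption and not high_importance:
            labels[name] = "widely adopted, less important"
        elif high_adoption:
            labels[name] = "important, widely adopted"
        else:
            labels[name] = "less important, low adoption"
    return labels


# Made-up adoption rates and importance scores for four hypothetical practices.
adoption = {"A": 0.80, "B": 0.30, "C": 0.70, "D": 0.20}
importance = {"A": 0.10, "B": 0.40, "C": 0.35, "D": 0.05}

labels = classify_practices(adoption, importance)
for name, label in labels.items():
    print(name, label)
```

Practices labeled "important, low adoption" are the natural candidates for a team's next improvement step.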

Conclusion

The paper offers a quantitative basis for assessing and incrementally improving practice adoption among ML teams. It illustrates the importance of specific practices in achieving desired effects and serves as a foundation for future enhancements in ML engineering, with broader implications potentially extending into AI system development.

Future work involves expanding survey participation, refining practice validation, and extending applicability beyond strictly ML-focused systems to encompass broader AI contexts. The ongoing goal is to facilitate effective and sustainable integration of ML components within robust engineering frameworks.
