Quality Issues in Machine Learning Software Systems (2306.15007v2)
Abstract: Context: Demand for employing Machine Learning (ML) to solve complex problems is increasing across many domains. ML models are implemented as software components and deployed in Machine Learning Software Systems (MLSSs). Problem: There is a strong need to ensure the serving quality of MLSSs. Incorrect or poor decisions by such systems can lead to the malfunction of other systems, significant financial losses, or even threats to human life. Quality assurance of MLSSs is considered a challenging task and is currently an active research topic. Objective: This paper investigates the characteristics of real quality issues in MLSSs from the viewpoint of practitioners. This empirical study aims to identify a catalog of quality issues in MLSSs. Method: We conduct a set of interviews with practitioners and experts to gather insights about their experience and practices when dealing with quality issues. We validate the identified quality issues via a survey with ML practitioners. Results: Based on the content of 37 interviews, we identified 18 recurring quality issues and 21 strategies to mitigate them. For each identified issue, we describe the causes and consequences according to the practitioners' experience. Conclusion: We believe the catalog of issues developed in this study will allow the community to develop efficient quality assurance tools for ML models and MLSSs. A replication package of our study is available on our public GitHub repository.