A Software Engineering Perspective on Engineering Machine Learning Systems: State of the Art and Challenges (2012.07919v3)

Published 14 Dec 2020 in cs.SE and cs.LG

Abstract: Context: Advancements in ML lead to a shift from the traditional view of software development, where algorithms are hard-coded by humans, to ML systems materialized through learning from data. Therefore, we need to revisit our ways of developing software systems and consider the particularities required by these new types of systems. Objective: The purpose of this study is to systematically identify, analyze, summarize, and synthesize the current state of software engineering (SE) research for engineering ML systems. Method: I performed a systematic literature review (SLR). I systematically selected a pool of 141 studies from SE venues and then conducted a quantitative and qualitative analysis using the data extracted from these studies. Results: The non-deterministic nature of ML systems complicates all SE aspects of engineering ML systems. Despite increasing interest from 2018 onwards, the results reveal that none of the SE aspects have a mature set of tools and techniques. Testing is by far the most popular area among researchers. Even for testing ML systems, engineers have only some tool prototypes and solution proposals with weak experimental proof. Many of the challenges of ML systems engineering were identified through surveys and interviews. Researchers should conduct experiments and case studies, ideally in industrial environments, to further understand these challenges and propose solutions. Conclusion: The results may benefit (1) practitioners in foreseeing the challenges of ML systems engineering; (2) researchers and academicians in identifying potential research questions; and (3) educators in designing or updating SE courses to cover ML systems engineering.

A Software Engineering Perspective on Engineering Machine Learning Systems: State of the Art and Challenges

The paper "A Software Engineering Perspective on Engineering Machine Learning Systems: State of the Art and Challenges" presents a systematic literature review (SLR) of software engineering (SE) research on engineering ML systems, covering both the state of the art and the open challenges. The review matters because it documents the shift from traditional software systems, where functionality is hand-coded, to systems whose behavior is learned from data.

Methodology and Results

The author analyzed 141 studies selected from SE research venues, combining quantitative and qualitative analysis to identify current trends, tool developments, and the maturity of SE practices for ML system development. Research interest has increased from 2018 onwards, but none of the SE aspects studied (such as requirements engineering, software design, and maintenance) has a mature set of tools and techniques for handling the particularities of ML systems.

Testing is by far the most studied area in SE research for ML systems, yet even there engineers have only tool prototypes and solution proposals backed by weak experimental evidence. More broadly, the non-deterministic nature of ML systems complicates every SE aspect across the development life cycle, from requirements gathering to maintenance.
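To make the testing difficulty concrete, the following is a minimal sketch of one common way to test a non-deterministic component: fix seeds so individual runs are repeatable, and assert on the distribution of results rather than on a single exact value. It is not taken from the paper. The function train_and_score is a hypothetical stand-in for a real training run, and the simulated variation and both thresholds are assumptions chosen for illustration.

```python
import random
import statistics


def train_and_score(seed: int) -> float:
    """Hypothetical stand-in for one training run; returns a test accuracy.

    A real project would train and evaluate a model here. The simulated
    run-to-run variation of +/-0.02 is an assumption for illustration.
    """
    rng = random.Random(seed)  # fixing the seed makes a single run repeatable
    return 0.90 + rng.uniform(-0.02, 0.02)


def check_accuracy(seeds, min_mean=0.85, max_stdev=0.03):
    """Check the distribution of scores across seeds, not one exact value."""
    scores = [train_and_score(seed) for seed in seeds]
    mean = statistics.mean(scores)
    spread = statistics.stdev(scores)
    return mean >= min_mean and spread <= max_stdev, mean, spread


passed, mean, spread = check_accuracy(range(10))
print(f"passed={passed} mean={mean:.3f} stdev={spread:.3f}")
```

An exact-equality assertion on a single score would be brittle under retraining, which is why the check above uses a lower bound on the mean and an upper bound on the spread.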

Challenges Identified

The review identifies several challenges in SE for ML systems:

Non-determinism: Because ML model output is not fully predictable, requirements specification, testing, and quality assurance all become harder.
Testing and Validation: Existing tools are mostly prototypes with weak experimental validation, and no robust suite of testing methodologies exists. The inherent uncertainty of model output makes mismatches between expected and actual results harder to interpret, which compounds the difficulty.
Integration with Traditional SE Practices: Incorporating ML systems into traditional SE processes calls for new processes and frameworks that address ML-specific concerns, such as handling large datasets and updating models.
Data Quality and Management: Feature selection, data preparation, and handling data drift over time remain key bottlenecks for ML system performance.
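As an illustration of the data drift challenge, the sketch below compares a feature's training-time values with live values using the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical distribution functions. This is one common drift check, not a method proposed in the paper. The helper names, the sample data, and the 0.2 alert threshold are assumptions made for the example.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical CDFs
    of the two samples, a value between 0.0 and 1.0.
    """
    ref = sorted(reference)
    cur = sorted(current)
    # The maximum gap is always reached at one of the observed values.
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in ref + cur
    )


def has_drifted(reference, current, threshold=0.2):
    """Flag drift when the KS statistic exceeds an assumed threshold."""
    return ks_statistic(reference, current) > threshold


# Hypothetical feature values: the live data is shifted by 50 units.
training_values = list(range(100))
live_values = [value + 50 for value in training_values]

print(has_drifted(training_values, training_values))
print(has_drifted(training_values, live_values))
```

In practice the threshold would be tuned per feature, or replaced by a significance test, since a fixed cutoff ignores sample size.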

Implications and Future Directions

The review has implications for practitioners, researchers, and educators. For practitioners, understanding these challenges helps in foreseeing and mitigating the risks of adopting ML systems. For researchers, the identified gaps point to open research questions, particularly around SE practices that accommodate how ML systems change with their data. For educators, the results can inform the design or update of SE courses to cover ML systems engineering.

From a theoretical perspective, bridging traditional SE and ML offers fertile ground for new frameworks that handle dynamic requirements, continuous validation of models, and modular integration of ML components into existing software systems.

Looking ahead, experiments and industrial case studies are needed to validate proposed methodologies and tools in real-world settings. Collaboration between academia and industry could deepen understanding of these challenges and support the development of mature, standardized SE practices for ML systems.

Conclusion

The paper shows that SE practices for engineering ML systems are still at an early stage. Despite a growing body of research, tools and processes remain immature. The SLR lays the groundwork for a research agenda aimed at an SE framework that responds to the particular demands of ML systems and eases their integration into existing software systems.

Authors (1)

Görkem Giray (6 papers)

Citations (109)
