Who Needs MLOps: What Data Scientists Seek to Accomplish and How Can MLOps Help? (2103.08942v1)

Published 16 Mar 2021 in cs.SE

Abstract: Following continuous software engineering practices, there has been an increasing interest in rapid deployment of ML features, called MLOps. In this paper, we study the importance of MLOps in the context of data scientists' daily activities, based on a survey where we collected responses from 331 professionals from 63 different countries in ML domain, indicating on what they were working on in the last three months. Based on the results, up to 40% respondents say that they work with both models and infrastructure; the majority of the work revolves around relational and time series data; and the largest categories of problems to be solved are predictive analysis, time series data, and computer vision. The biggest perceived problems revolve around data, although there is some awareness of problems related to deploying models to production and related procedures. To hypothesise, we believe that organisations represented in the survey can be divided to three categories -- (i) figuring out how to best use data; (ii) focusing on building the first models and getting them to production; and (iii) managing several models, their versions and training datasets, as well as retraining and frequent deployment of retrained models. In the results, the majority of respondents are in category (i) or (ii), focusing on data and models; however the benefits of MLOps only emerge in category (iii) when there is a need for frequent retraining and redeployment. Hence, setting up an MLOps pipeline is a natural step to take, when an organization takes the step from ML as a proof-of-concept to ML as a part of nominal activities.

Authors (4)

Sasu Mäkinen
Henrik Skogström
Eero Laaksonen
Tommi Mikkonen

Citations (134)


Summary

The Role and Importance of MLOps in Modern Machine Learning Practices

The paper "Who Needs MLOps: What Data Scientists Seek to Accomplish and How Can MLOps Help?" examines where MLOps, the practice of rapidly and continuously deploying ML features, fits into the daily work of data scientists. The authors base their study on a survey of 331 ML professionals from 63 countries. They describe what practitioners are currently working on and propose a categorization of organizations by the maturity of their ML practice.

Survey Insights and Primary Findings

From the survey, several key insights emerge regarding the nature of current ML projects and challenges faced by professionals in the domain:

Scope of Responsibilities: Up to 40% of respondents work with both models and infrastructure. This suggests that in many organizations the same people build ML models and also manage the infrastructure needed to deploy them.
Data Types and Problem Domains: Most of the work involves relational and time series data. The largest problem categories are predictive analysis, time series, and computer vision.
Challenges: The biggest perceived problems concern data rather than model deployment. Key challenges include messy data, limited access to data, and insufficient amounts of data. Respondents also show some awareness of the problems involved in deploying models to production and the related procedures, but these are a secondary concern compared with data issues.
Short-term Goals: Goals that respondents commonly report for the next three months include developing models for production, deploying those models, optimizing existing models, and validating whether ML is useful in their own operational context.

Categorization by ML Maturity

The authors hypothesize that the organizations represented in the survey fall into three maturity categories:

Data-centric: Organizations still figuring out how best to use their data.
Model-centric: Organizations focusing on building their first models and getting them into production.
Pipeline-centric: Organizations with business-critical models in production that manage several models, their versions, and their training datasets, and that retrain and redeploy models frequently.

Most respondents fall into the data-centric or model-centric categories. The benefits of MLOps, however, emerge mainly in the pipeline-centric category, where frequent retraining and redeployment are needed. The authors therefore conclude that setting up an MLOps pipeline is a natural step when an organization moves from ML as a proof of concept to ML as part of its regular activities.
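The bookkeeping burden of the pipeline-centric category can be made concrete with the following Python sketch. It tracks several model versions and links each one to the dataset version it was trained on. A retrained model is redeployed only if it outperforms the one in production. This is an illustrative assumption, not a design from the paper. `ModelRegistry`, `ModelVersion`, `promote_if_better`, the model name, the dataset labels, and the metric values are all hypothetical. Reducing model quality to a single higher-is-better validation score is a simplification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelVersion:
    """One trained model version, tied to the dataset snapshot it was trained on."""
    model_name: str
    version: int
    dataset_version: str  # hypothetical label of the training data snapshot
    metric: float         # validation score; higher is better (simplifying assumption)


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry for several models and their versions."""
    versions: dict[str, list[ModelVersion]] = field(default_factory=dict)
    deployed: dict[str, ModelVersion] = field(default_factory=dict)

    def register(self, model_name: str, dataset_version: str, metric: float) -> ModelVersion:
        """Record a newly trained model; version numbers start at 1 per model name."""
        history = self.versions.setdefault(model_name, [])
        mv = ModelVersion(model_name, len(history) + 1, dataset_version, metric)
        history.append(mv)
        return mv

    def promote_if_better(self, mv: ModelVersion) -> bool:
        """Deploy mv only if no version is deployed yet or mv scores higher."""
        current = self.deployed.get(mv.model_name)
        if current is None or mv.metric > current.metric:
            self.deployed[mv.model_name] = mv
            return True
        return False


# Usage: a model is trained, then retrained on a newer dataset snapshot.
registry = ModelRegistry()
v1 = registry.register("churn", dataset_version="2021-01", metric=0.81)
print(registry.promote_if_better(v1))  # True: nothing was deployed yet
v2 = registry.register("churn", dataset_version="2021-02", metric=0.79)
print(registry.promote_if_better(v2))  # False: v1 stays in production
```

Each model version carries its dataset version, so a team can trace a production model back to its training data. The paper assigns this kind of bookkeeping (models, their versions, and training datasets) to the pipeline-centric category.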

Implications and Future Directions

The paper draws an analogy between the gradual adoption of MLOps and the earlier adoption of DevOps in conventional software engineering. MLOps extends continuous delivery to the iterative, data-dependent nature of ML applications. Data preparation, model experimentation, and deployment are automated as stages of a single pipeline. As a result, ML components can be retrained and redeployed as part of routine operations rather than through manual hand-offs.
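The following minimal Python sketch shows the pipeline stages named above: data preparation, training, evaluation, and a deployment gate. Every stage is a hypothetical stand-in chosen to keep the example self-contained, not the paper's pipeline:

- `prepare_data` only drops rows with missing labels.
- `train` fits a one-variable least-squares line.
- `run_pipeline` approves deployment only when the new model's validation error is lower than that of the currently deployed model.

Real pipelines would typically rely on dedicated tooling for each stage.

```python
from statistics import mean
from typing import Optional

Row = tuple[float, Optional[float]]  # (feature, label); label may be missing
Model = tuple[float, float]          # (slope, intercept) of y = slope * x + intercept


def prepare_data(raw: list[Row]) -> list[tuple[float, float]]:
    """Data preparation stage: drop rows whose label is missing."""
    return [(x, y) for x, y in raw if y is not None]


def train(rows: list[tuple[float, float]]) -> Model:
    """Training stage: ordinary least squares fit of a line to (x, y) pairs."""
    mx = mean(x for x, _ in rows)
    my = mean(y for _, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return slope, my - slope * mx


def evaluate(model: Model, rows: list[tuple[float, float]]) -> float:
    """Evaluation stage: mean absolute error on held-out rows."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in rows)


def run_pipeline(raw: list[Row], deployed_error: Optional[float]) -> tuple[Model, float, bool]:
    """Run all stages; approve deployment only if the new model beats production."""
    rows = prepare_data(raw)
    split = int(len(rows) * 0.8)  # first 80% for training, the rest for validation
    model = train(rows[:split])
    error = evaluate(model, rows[split:])
    deploy = deployed_error is None or error < deployed_error
    return model, error, deploy


# Toy data: y = 2x + 1, with every fifth label missing.
raw = [(float(x), 2.0 * x + 1.0 if x % 5 else None) for x in range(50)]
model, error, deploy = run_pipeline(raw, deployed_error=None)
print(model, error, deploy)
```

Because each stage is a plain function, a scheduler or CI job could call `run_pipeline` whenever new data arrives. That is the frequent retrain-and-redeploy loop where the paper locates the benefits of MLOps.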

For the ML community, the findings suggest that an organization's approach needs to change as its use of ML matures. It must move from treating data and models as separate concerns to an integrated, pipeline-centric approach. This transition requires organizational commitment, dedicated infrastructure, and collaboration across data science, software engineering, and infrastructure management.

The paper's insights suggest several pathways for future development:

MLOps Tooling: Enhanced tooling that addresses data accessibility, model deployment, and monitoring will be critical to support organizations transitioning to more mature ML operations.
Cross-disciplinary Training: Expanding the skill sets of ML professionals to encompass infrastructure management and deployment will be essential.
Standardization of Practices: There is a potential for formulating standardized MLOps practices across industries to ensure consistency and efficiency in ML deployments.

In conclusion, the benefits of MLOps appear mainly once an organization must retrain and redeploy models frequently, and most surveyed organizations have not yet reached that stage. Organizations moving ML from proof of concept into regular operations should therefore plan to invest in the infrastructure and skills that an MLOps pipeline requires.
