- The paper introduces the 'Three Vs' framework of Velocity, Validation, and Versioning as the key variables behind effective ML operations.
- It leverages semi-structured interviews with 18 MLEs to uncover practical workflows and iterative, data-focused experimentation in production environments.
- The study highlights challenges such as alert fatigue, resource mismanagement, and the need for sophisticated tooling in dynamic ML deployment settings.
Operationalizing Machine Learning: An Interview Study
The paper "Operationalizing Machine Learning: An Interview Study" offers a comprehensive exploration of the practices and challenges inherent in the burgeoning field of Machine Learning Operations (MLOps). Conducted through semi-structured ethnographic interviews with 18 Machine Learning Engineers (MLEs) from various sectors, the paper seeks to delineate the core responsibilities of MLEs and the typical workflows they employ in operationalizing machine learning models.
Key Findings
The paper identifies the primary tasks that occupy MLEs across the production ML lifecycle: data collection and labeling, experimentation, evaluation and deployment, and monitoring and response. Each of these tasks involves a complex interplay of tools and methodologies, highlighting the multifaceted role of the MLE.
One of the paper’s central contributions is the articulation of the "Three Vs" of MLOps: Velocity, Validation, and Versioning, which together determine the success of ML deployments. Velocity captures the need for rapid prototyping and iteration; Validation calls for testing early and proactively so errors are caught before they become expensive; and Versioning covers keeping multiple versions of data and models so teams can quickly roll back, compare, or switch between them when production behavior degrades.
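As a rough illustration of the Versioning idea, the sketch below keeps multiple model versions behind a small registry so that a bad release can be reverted without retraining. The `ModelRegistry` class and its methods are hypothetical names chosen here for illustration, not an API from the paper or from any particular library.

```python
# Hypothetical sketch of versioning for fast rollback; the paper describes
# the practice but does not prescribe this (or any) specific API.

class ModelRegistry:
    def __init__(self):
        self._versions = {}   # version id -> model artifact
        self._history = []    # deployment order, newest last

    def register(self, version_id, model):
        self._versions[version_id] = model

    def deploy(self, version_id):
        self._history.append(version_id)

    def current(self):
        return self._versions[self._history[-1]]

    def rollback(self):
        # Versioning pays off here: reverting is a pointer move, not a retrain.
        self._history.pop()
        return self.current()


registry = ModelRegistry()
registry.register("v1", {"weights": [0.1, 0.2]})
registry.register("v2", {"weights": [0.3, 0.1]})
registry.deploy("v1")
registry.deploy("v2")
registry.rollback()  # a bad v2 release falls back to v1 immediately
```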
Experimental Nature and Best Practices
The paper underscores the experimental nature of ML engineering, suggesting that rapid iteration and collaboration with domain experts often catalyze successful ML experiments. The iterative process typically focuses on data and feature engineering rather than model architecture changes, because data-driven changes tend to yield quicker and often more impactful results. This emphasis on data over models reflects a pragmatic allocation of effort, directing compute and attention toward empirical understanding of the specific problem rather than toward architecture search.
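To make this data-first style concrete, the sketch below holds a simple model fixed and varies only the feature set, comparing validation accuracy across candidates. The use of scikit-learn and the synthetic dataset are assumptions made purely for illustration; the paper does not tie the practice to any particular tooling.

```python
# Hedged sketch: iterate on feature sets while the model stays fixed,
# the data-first experimentation style the interviewees describe.
# scikit-learn and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))                       # six raw features
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0).astype(int)   # label depends on a squared term

# Each "experiment" changes the data representation, not the model.
feature_sets = {
    "raw": X,
    "raw_plus_squares": np.hstack([X, X ** 2]),
    "first_two_only": X[:, :2],
}

for name, features in feature_sets.items():
    X_tr, X_te, y_tr, y_te = train_test_split(features, y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # fixed architecture
    print(f"{name}: {accuracy_score(y_te, model.predict(X_te)):.3f}")
```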
Model Evaluation and Deployment
Operationalizing model evaluation is an area where the paper offers particularly concrete insights. MLEs report significant effort spent keeping validation datasets up to date as data and business requirements shift, an evolution that is essential for tracking both aggregate and subpopulation-level performance given how quickly production conditions change. In addition, a multi-stage deployment strategy, in which models pass through successive layers of validation from dev to canary stages, builds confidence incrementally before a complete rollout.
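A minimal sketch of what such slice-aware gating might look like, assuming hypothetical accuracy floors and slice names, is shown below; the paper reports the practice of checking aggregate and subpopulation performance before promotion, not this particular scheme.

```python
# Hedged sketch: gate promotion toward the canary stage on aggregate AND
# per-subpopulation accuracy. Floors, slice names, and function names are
# hypothetical illustrations, not details from the paper.
from collections import defaultdict

def slice_accuracies(records):
    """records: list of (slice_name, y_true, y_pred) tuples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for slice_name, y_true, y_pred in records:
        totals[slice_name] += 1
        hits[slice_name] += int(y_true == y_pred)
    return {name: hits[name] / totals[name] for name in totals}

def ready_for_canary(records, overall_floor=0.90, slice_floor=0.85):
    overall = sum(int(t == p) for _, t, p in records) / len(records)
    per_slice = slice_accuracies(records)
    # Promote only if the aggregate and every subpopulation clear their floors.
    return overall >= overall_floor and all(a >= slice_floor for a in per_slice.values())

validation_records = [
    ("new_users", 1, 1), ("new_users", 0, 1),
    ("returning_users", 1, 1), ("returning_users", 0, 0),
]
print(ready_for_canary(validation_records))  # False: aggregate 0.75, "new_users" slice 0.5
```

Requiring every slice to clear its own floor is what keeps a strong aggregate score from hiding a regression on a smaller subpopulation.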
Challenges and Anti-Patterns
A notable part of the paper addresses the challenges MLEs face, particularly the "long tail" of ML bugs, alert fatigue, and the tensions between development and production environments. The paper points to the need for more sophisticated tooling that can meaningfully distinguish critical from non-critical alerts, so that engineers are interrupted only when production reliability is genuinely at risk.
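The sketch below shows one way such tooling might triage monitoring signals, paging an engineer only for critical conditions and routing everything else to a dashboard. The metric names and thresholds are hypothetical; the paper motivates the need for this differentiation rather than prescribing a concrete scheme.

```python
# Hedged sketch: route monitoring signals by severity so only genuinely
# critical conditions page an engineer. Metrics and thresholds are
# hypothetical placeholders, not values reported in the study.
from dataclasses import dataclass

@dataclass
class Alert:
    metric: str
    value: float
    severity: str  # "critical" pages someone; "info" only lands on a dashboard

def classify(metric, value):
    if metric == "prediction_error_rate" and value > 0.15:
        return Alert(metric, value, "critical")  # user-visible quality drop
    if metric == "feature_null_fraction" and value > 0.30:
        return Alert(metric, value, "critical")  # likely broken upstream pipeline
    return Alert(metric, value, "info")          # log it, do not page anyone

for metric, value in [("prediction_error_rate", 0.02),
                      ("feature_null_fraction", 0.45)]:
    alert = classify(metric, value)
    print(alert.severity, alert.metric, alert.value)
```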
In the anti-patterns section, the paper warns against behaviors such as running experiments simply to keep all available computational resources busy, referred to as "keeping GPUs warm," rather than strategically prioritizing the experiments most likely to pay off. This habit increases cognitive load and crowds out high-value experimentation.
Implications and Future Directions
The findings of this paper have several theoretical and practical implications. They highlight the need for standardized processes and frameworks in MLOps that balance Velocity, Validation, and Versioning effectively. There is also a clear call for new tools that reduce the friction of model monitoring and validation, especially given the continually shifting data that characterizes production environments.
Future research could focus on methodologies and tools that reduce the reliance on manual intervention in the monitoring and validation stages, thereby increasing robustness and efficiency. Additionally, given the challenges around experience-based "tribal knowledge" and undocumented processes, effort could be directed toward automated documentation tools that preserve and disseminate critical operational knowledge.
Overall, this paper provides a detailed landscape of MLOps practice, underlining both the foundational techniques that drive success and the nuanced challenges companies face. It stands as a rigorous resource for researchers and practitioners alike who seek to navigate and optimize the complex domain of machine learning in production.