Machine Learning Operations (MLOps): Overview, Definition, and Architecture (2205.02302v3)

Published 4 May 2022 in cs.LG

Abstract: The final goal of all industrial ML projects is to develop ML products and rapidly bring them into production. However, it is highly challenging to automate and operationalize ML products and thus many ML endeavors fail to deliver on their expectations. The paradigm of Machine Learning Operations (MLOps) addresses this issue. MLOps includes several aspects, such as best practices, sets of concepts, and development culture. However, MLOps is still a vague term and its consequences for researchers and professionals are ambiguous. To address this gap, we conduct mixed-method research, including a literature review, a tool review, and expert interviews. As a result of these investigations, we provide an aggregated overview of the necessary principles, components, and roles, as well as the associated architecture and workflows. Furthermore, we furnish a definition of MLOps and highlight open challenges in the field. Finally, this work provides guidance for ML researchers and practitioners who want to automate and operate their ML products with a designated set of technologies.

Authors

Dominik Kreuzberger
Niklas Kühl
Sebastian Hirschl

Summary

The paper introduces a unified MLOps framework by consolidating key principles from literature reviews, tool evaluations, and expert interviews.
It defines an end-to-end architecture that integrates CI/CD automation, workflow orchestration, reproducibility, and role-based collaborations for robust ML deployment.
It highlights challenges in scaling ML operations and advocates for a cultural shift toward product-centric, multidisciplinary practices.

An Academic Examination of Machine Learning Operations (MLOps): Overview, Definition, and Architecture

This paper by Kreuzberger et al. examines Machine Learning Operations (MLOps), an emerging engineering practice that addresses the challenges of deploying and maintaining ML products in industrial settings. The authors set out a definition, an architecture, and the components needed to implement MLOps, drawing on a literature review, a tool review, and expert interviews.

The paper's primary objective is to delineate MLOps from the overlapping fields of ML, DevOps, and data engineering, and to present it as a cohesive framework for bringing ML systems into production. This matters because many ML projects fail to progress from concept to production, a barrier MLOps aims to overcome.

Methodology and Research Framework

To connect academic theory with practical insight, the research uses a mixed-method approach: a systematic literature review, a tool review covering both open-source and commercial MLOps solutions, and semi-structured interviews with industry practitioners about MLOps challenges and practices.

The paper identifies nine principles deemed essential for effective MLOps: CI/CD automation, workflow orchestration, reproducibility, versioning, collaboration, continuous ML training and evaluation, ML metadata tracking/logging, continuous monitoring, and feedback loops. Each principle is linked to the technical components that implement it within an MLOps framework, such as CI/CD systems, source code repositories, workflow orchestration tools, and feature stores, among others.
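
To make a few of these principles concrete, the following is a minimal Python sketch, not the paper's implementation. It illustrates versioning (content hashes), ML metadata tracking (an in-memory log), continuous monitoring, and a feedback loop that triggers retraining. The toy threshold "model", the `MetadataStore` class, the function names, and the `min_accuracy` value of 0.8 are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(obj) -> str:
    """Versioning: derive a short, deterministic id from the content itself."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def evaluate(data, threshold) -> float:
    """Accuracy of a toy classifier that predicts True when x >= threshold."""
    correct = sum((x >= threshold) == label for x, label in data)
    return correct / len(data)


def train(data) -> float:
    """Toy training: pick the observed x value that maximizes accuracy."""
    candidates = sorted({x for x, _ in data})
    return max(candidates, key=lambda t: evaluate(data, t))


@dataclass
class MetadataStore:
    """Metadata tracking: one record per training run (hypothetical store)."""
    runs: list = field(default_factory=list)

    def log(self, **record) -> None:
        self.runs.append(record)


def training_run(store, data) -> float:
    """Train, then log data version, model version, and accuracy."""
    threshold = train(data)
    store.log(
        data_version=fingerprint(data),
        model_version=fingerprint({"threshold": threshold}),
        accuracy=evaluate(data, threshold),
    )
    return threshold


def monitor_and_retrain(store, threshold, live_data, min_accuracy=0.8) -> float:
    """Monitoring plus feedback loop: retrain when live accuracy degrades."""
    if evaluate(live_data, threshold) < min_accuracy:
        return training_run(store, live_data)
    return threshold


store = MetadataStore()
initial_data = [(0.1, False), (0.2, False), (0.6, True), (0.9, True)]
threshold = training_run(store, initial_data)

# Simulated drift: the decision boundary has moved in the live data.
live_data = [(0.6, False), (0.7, False), (0.8, True), (0.95, True)]
new_threshold = monitor_and_retrain(store, threshold, live_data)
```

In a real system each piece would be backed by one of the components named above (for example, a source code repository for code versions and a dedicated store for run metadata); the sketch keeps everything in memory so the control flow stays visible.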

Architectural and Role-Based Insights

A notable contribution of this paper is its detailed outline of an end-to-end MLOps architecture. This architecture serves as a blueprint for turning ML prototypes into robust operational systems and emphasizes integration across technical roles, including data scientists, data engineers, DevOps engineers, and MLOps engineers. The distribution of responsibilities across these roles underscores that MLOps is a collaborative paradigm requiring interdisciplinary cooperation.

Furthermore, by delineating specific workflows and interactions necessary for successful MLOps adoption, the authors provide an exhaustive operational procedure that can be adapted by organizations to fit their particular context and resources, thus fostering increased consistency and reliability in ML system deployment.
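
As a rough illustration of what workflow orchestration means in practice, the sketch below runs a set of pipeline steps in dependency order using Python's standard `graphlib` module (Python 3.9+). The step names (`ingest`, `features`, `train`, `evaluate`, `deploy`), their toy bodies, and the `run_pipeline` helper are assumptions made for this example; they are not taken from the paper's architecture.

```python
from graphlib import TopologicalSorter


def run_pipeline(steps, dependencies):
    """Run each step once, after every step it depends on.

    steps: maps a step name to a function taking the results so far.
    dependencies: maps a step name to the set of steps it depends on.
    Returns the execution order and the result of every step.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        results[name] = steps[name](results)
    return order, results


# Hypothetical steps; each one reads the outputs of its predecessors.
steps = {
    "ingest": lambda r: [1, 2, 3, 4],
    "features": lambda r: [x * 2 for x in r["ingest"]],
    "train": lambda r: sum(r["features"]) / len(r["features"]),
    "evaluate": lambda r: r["train"] > 4,
    "deploy": lambda r: "deployed" if r["evaluate"] else "rejected",
}

# Each key lists the steps that must finish before it can start.
dependencies = {
    "ingest": set(),
    "features": {"ingest"},
    "train": {"features"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

order, results = run_pipeline(steps, dependencies)
```

Declaring dependencies separately from the step bodies is the main design point: the orchestrator, not the individual step, decides when each task runs, which is what lets dedicated orchestration tools add scheduling, retries, and logging around the same graph.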

Challenges and Future Directions

The paper identifies several impediments to widespread MLOps adoption, framing them as organizational, ML system-related, and operational in nature. These challenges include a predominant focus on model-centric rather than product-centric ML development, a deficiency in skilled professionals equipped to handle MLOps-specific roles, and the complexities inherent in maintaining infrastructure capable of reliably supporting continuous model updates.

The authors argue for a cultural shift toward data-centric and product-centric AI development, and encourage training ML professionals in areas beyond model building so that they gain the full range of skills needed to engineer and sustain production-ready ML systems.

Conclusion

This work is a substantive step toward unifying diverse practices under the MLOps domain in order to streamline the automation and operationalization of ML systems. Besides capturing the current landscape, the paper provides a basis for further MLOps research and practice, with the aim of easing the transition of ML models from concept to production and making real-world ML deployments more robust. It can serve both as a starting point for future research and as an operational guide for practitioners seeking to advance ML deployments within their organizations.
