Deployment Challenges of Machine Learning Solutions in Software Companies
- The paper highlights the critical role of stakeholder collaboration in assessing ML project feasibility and resource allocation.
- The execution phase emphasizes iterative system refinement, addressing scalability, performance, and data quality challenges.
- The operation phase underscores continuous system monitoring and robust design to counter data disruptions and maintenance issues.
Machine learning (ML) has increasingly been adopted by software companies aiming to enhance their products with features such as personalized user experiences, improved search functionality, content recommendations, and automation. However, while the technical aspects of ML deployment are well researched, integrating these systems into a complex industrial environment also demands attention to human and organizational factors that are often overlooked. This paper provides insights into the challenges Atlassian faced in deploying ML solutions, highlighting stakeholder roles and the hurdles encountered throughout the ideation, execution, and operation phases.
Ideation Phase: Evaluating Impact and Feasibility
During the ideation phase, product teams assess the value and feasibility of potential ML applications. Key considerations include estimating user impact and evaluating technical feasibility. Product managers play a crucial role in determining whether to allocate resources for the proposed ML system, evaluating both the potential benefits and practicality. This involves estimating the user base that would benefit from the system and comparing its value to existing functionalities.
Technical feasibility is assessed by ML practitioners, who must identify appropriate optimization metrics, confirm data availability and infrastructure capabilities, and estimate expected model performance. Legal frameworks and privacy concerns further complicate these assessments: regulations such as GDPR constrain how user data may be collected and used. Balancing such privacy constraints against model effectiveness remains an ongoing challenge.
Execution Phase: System Development
The execution phase primarily involves ML practitioners developing the system, often in collaboration with external service providers due to internal skill shortages. ML practitioners must navigate complexities beyond algorithm selection, including data preprocessing, feature engineering, and scalability.
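To make the data preprocessing and feature engineering work concrete, here is a minimal sketch of turning raw user events into per-user aggregate features. The event schema, field names, and features are illustrative assumptions, not taken from the paper:

```python
from statistics import median

def engineer_features(events):
    """Toy feature-engineering step: aggregate raw user events into
    per-user features. All field and feature names are illustrative."""
    by_user = {}
    for event in events:
        by_user.setdefault(event["user"], []).append(event["duration_ms"])
    return {
        user: {
            "n_events": len(durations),                # activity volume
            "median_duration_ms": median(durations),   # robust central tendency
        }
        for user, durations in by_user.items()
    }

events = [
    {"user": "a", "duration_ms": 120},
    {"user": "a", "duration_ms": 80},
    {"user": "b", "duration_ms": 300},
]
features = engineer_features(events)
```

In practice this step also has to scale to far larger event volumes, which is where the scalability concerns mentioned above enter.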
Deploying prototypes to end users may reveal unforeseen issues, such as inadequate performance or scalability limitations. These challenges require iterative refinement and adaptation of the system to maintain robustness across diverse user interactions.
Operation Phase: System Maintenance
The operation phase necessitates continuous monitoring and maintenance of the ML system, accounting for the dynamic nature of data inputs. Stability of data sources is crucial; any disruption can significantly degrade system performance. Establishing procedures to monitor performance metrics and implementing failover strategies, such as feature flags, helps mitigate operational issues.
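One way a feature-flag failover can work is sketched below: the ML path is served only while its flag is on and the model meets a latency budget, otherwise the system trips the flag and falls back to a non-ML default. The flag store, thresholds, and return values are hypothetical, not drawn from the paper:

```python
# Hypothetical in-process flag store; real systems use a flag service.
FEATURE_FLAGS = {"ml_recommendations": True}

def recommend(user_id, model_score, latency_ms, *, max_latency_ms=200):
    """Serve ML recommendations while the flag is on and the model is
    healthy; otherwise fall back to a static default and trip the flag."""
    if not FEATURE_FLAGS["ml_recommendations"]:
        return "popular_items"  # flag already off: serve the fallback
    if model_score is None or latency_ms > max_latency_ms:
        FEATURE_FLAGS["ml_recommendations"] = False  # degrade gracefully
        return "popular_items"
    return "ml_ranked_items"
```

Tripping the flag on a bad response means one failing dependency degrades the feature rather than the whole product, and operators can re-enable it after the disruption is resolved.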
Organizational policies may dictate that the team responsible for system development also oversees its ongoing maintenance. This makes it worthwhile to build stable systems with provisions for missing or noisy data, avoiding a continuous maintenance burden. Proper design should address edge cases that, although individually rare, surface frequently in large-scale systems.
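A provision for missing or noisy inputs can be as simple as a sanitization step in front of inference, so a single malformed record cannot break the system. The feature names and default values here are illustrative assumptions:

```python
import math

# Illustrative per-feature fallback values (e.g. population priors).
DEFAULTS = {"session_length": 0.0, "click_rate": 0.05}

def sanitize(features):
    """Replace missing or non-finite feature values with safe defaults,
    so rare bad records (NaN, None, absent keys) degrade gracefully."""
    clean = {}
    for name, default in DEFAULTS.items():
        value = features.get(name, default)
        if not isinstance(value, (int, float)) or not math.isfinite(value):
            value = default  # noisy or missing: fall back to the prior
        clean[name] = float(value)
    return clean
```

At large scale, inputs that are "rare" per request arrive constantly in aggregate, so pushing this handling into the design up front is cheaper than firefighting it in operation.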
Conclusions
The successful deployment and operation of ML systems in software companies require synergistic collaboration among various stakeholders, including ML practitioners, product managers, data scientists, designers, and operations teams. The ML practitioner's role extends beyond model development to encompass decisions impacting scalability, privacy, and long-term system maintenance. Addressing these challenges necessitates an integrated approach where technical expertise is complemented by organizational understanding and strategic planning. The lessons learned from successful implementations can guide future efforts in optimizing ML deployments within industrial environments.