
Collaboration Challenges in Building ML-Enabled Systems: Communication, Documentation, Engineering, and Process (2110.10234v4)

Published 19 Oct 2021 in cs.SE and cs.LG

Abstract: The introduction of ML components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces additional challenges with its exploratory model development process, additional skills and knowledge needed, difficulties testing ML systems, need for continuous evolution and monitoring, and non-traditional quality requirements such as fairness and explainability. Through interviews with 45 practitioners from 28 organizations, we identified key collaboration challenges that teams face when building and deploying ML systems into production. We report on common collaboration points in the development of production ML systems for requirements, data, and integration, as well as corresponding team patterns and challenges. We find that most of these challenges center around communication, documentation, engineering, and process and collect recommendations to address these challenges.

Collaboration Challenges in Building ML-Enabled Systems: A Detailed Examination

The paper "Collaboration Challenges in Building ML-Enabled Systems: Communication, Documentation, Engineering, and Process" investigates how teams collaborate when developing and deploying ML systems, focusing on the interplay between data scientists and software engineers. Its findings are derived from interviews with 45 practitioners from 28 organizations; they highlight key collaboration challenges and propose strategies to address them.

Key Findings

The paper identifies three predominant collaboration points linked to specific challenges: requirements gathering and system planning, training data management, and integration of ML components with overall system architecture.

  1. Requirements and Planning: The paper distinguishes two development trajectories, model-first and product-first, which shape how requirements are collected and how work is planned. Model-first teams start from what the ML model can do and build product features around those capabilities, which can result in sub-optimal product design. Product-first teams derive model requirements from broader system goals, yet often face setbacks due to unrealistic expectations or a lack of ML literacy among system designers.
  2. Training Data: The provision and management of training data stand out as a central challenge. Organizations typically structure data provisioning in three ways: provided data, external data, and in-house data, each posing unique collaboration dynamics. Practitioners often struggle with data quality and availability issues, inadequate data documentation, and the challenge of evolving data requirements. This point is compounded by difficulties in articulating data quality expectations and accessing domain expertise crucial for data interpretation and cleaning.
  3. Integration and Deployment: Integration is often made difficult by unclear team responsibilities and uneven engineering capabilities across teams. Responsibilities for model deployment and operations often do not align with team expertise, which worsens integration problems, particularly when data scientists work in isolation from the larger engineering process. The paper also notes challenges in keeping code quality, documentation standards, and the versioning of models and data aligned across teams.

Implications and Recommendations

The paper underscores the need for structured interdisciplinary collaboration to mitigate these challenges. It advocates for enhancing communication channels across roles, fostering mutual understanding of responsibilities, and promoting comprehensive documentation practices. The complexity introduced by ML components demands deliberate and informed project planning that anticipates potential misalignments and incorporates mechanisms for ongoing model and system evaluation.

In practical terms, the paper suggests integrating ML training into broader organizational training programs to support cross-disciplinary literacy and employing established documentation frameworks like model cards or FactSheets to facilitate clearer communication of model characteristics and performance expectations. Establishing formalized processes around data and model management, such as contracts specifying data quality and quantity expectations, can also benefit team alignment.
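As an illustration of what a contract specifying data quality and quantity expectations might look like in code, here is a minimal Python sketch. It is not taken from the paper: the names `DataContract` and `check_contract`, the fields, and the thresholds are all hypothetical, chosen only to show how a data-providing team and a modeling team could make their expectations explicit and checkable.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical agreement between a data provider and a modeling team."""
    required_columns: tuple[str, ...]  # columns the modeling team depends on
    min_rows: int                      # quantity expectation
    max_missing_fraction: float        # quality expectation, applied per column


def check_contract(rows: list[dict], contract: DataContract) -> list[str]:
    """Return human-readable violations; an empty list means the batch passes."""
    violations = []

    # Quantity check: the batch must contain enough rows.
    if len(rows) < contract.min_rows:
        violations.append(
            f"expected at least {contract.min_rows} rows, got {len(rows)}"
        )

    # Quality check: each required column may have only a limited
    # fraction of missing (absent or None) values.
    for col in contract.required_columns:
        missing = sum(1 for row in rows if row.get(col) is None)
        fraction = missing / len(rows) if rows else 1.0
        if fraction > contract.max_missing_fraction:
            violations.append(
                f"column '{col}': {fraction:.0%} missing exceeds "
                f"{contract.max_missing_fraction:.0%}"
            )
    return violations


if __name__ == "__main__":
    contract = DataContract(
        required_columns=("age", "label"),
        min_rows=3,
        max_missing_fraction=0.25,
    )
    batch = [
        {"age": 31, "label": 0},
        {"age": None, "label": 1},
        {"age": 45, "label": 1},
        {"age": 52, "label": 0},
    ]
    for problem in check_contract(batch[:2], contract):
        print(problem)
```

A check like this could run whenever a new data batch is handed over, so that violations surface as an explicit list both teams can discuss rather than as unexplained drops in model quality later on.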

Future Directions

This research sets the stage for further exploration into integrated process life cycles that cohesively merge software engineering and data science disciplines, potentially adopting hybrid methodologies that draw from both traditional and emerging frameworks in AI engineering. Additionally, increasing focus on developing tools and practices that enhance the visibility and transparency of ML processes may yield new insights into optimal collaboration structures.

In summary, the paper provides a comprehensive look at the collaborative challenges in ML-enabled system development and offers actionable insights for improving cross-functional collaboration, ultimately contributing to more robust and reliable ML system delivery.

Authors (4)
  1. Nadia Nahar (9 papers)
  2. Shurui Zhou (12 papers)
  3. Grace Lewis (5 papers)
  4. Christian Kästner (43 papers)
Citations (109)