
Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines (2111.04131v2)

Published 7 Nov 2021 in cs.LG and cs.PF

Abstract: Input pipelines, which ingest and transform input data, are an essential part of training Machine Learning (ML) models. However, it is challenging to implement efficient input pipelines, as it requires reasoning about parallelism, asynchrony, and variability in fine-grained profiling information. Our analysis of over two million ML jobs in Google datacenters reveals that a significant fraction of model training jobs could benefit from faster input data pipelines. At the same time, our analysis indicates that most jobs do not saturate host hardware, pointing in the direction of software-based bottlenecks. Motivated by these findings, we propose Plumber, a tool for finding bottlenecks in ML input pipelines. Plumber uses an extensible and interpretable operational analysis analytical model to automatically tune parallelism, prefetching, and caching under host resource constraints. Across five representative ML pipelines, Plumber obtains speedups of up to 47x for misconfigured pipelines. By automating caching, Plumber obtains end-to-end speedups of over 50% compared to state-of-the-art tuners.
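
The abstract refers to tuning parallelism, prefetching, and caching in ML input pipelines (Plumber targets tf.data). The sketch below is not Plumber's API; it is a minimal, hypothetical tf.data pipeline that simply exposes the three knobs (map parallelism, prefetch buffer size, and cache placement) that a tuner like Plumber would set under host resource constraints.

```python
# Minimal tf.data input-pipeline sketch (illustrative only, not Plumber itself).
# It exposes the knobs the paper's tuner adjusts: map parallelism,
# prefetch buffer size, and whether to cache decoded records.
import tensorflow as tf

def build_pipeline(file_pattern, batch_size,
                   map_parallelism=tf.data.AUTOTUNE,   # parallelism knob
                   prefetch_buffer=tf.data.AUTOTUNE,   # prefetching knob
                   cache_decoded=False):               # caching knob
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    ds = files.interleave(
        tf.data.TFRecordDataset,
        cycle_length=4,
        num_parallel_calls=map_parallelism)

    def decode(record):
        # Placeholder parse step; a real pipeline would call
        # tf.io.parse_single_example / image decoding here.
        return tf.strings.length(record)

    ds = ds.map(decode, num_parallel_calls=map_parallelism)
    if cache_decoded:
        # Caching trades host memory/disk for repeated-epoch speed;
        # Plumber decides this automatically from its resource estimates.
        ds = ds.cache()
    ds = ds.batch(batch_size)
    return ds.prefetch(prefetch_buffer)

# Example (hypothetical paths/values):
# ds = build_pipeline("/data/train-*.tfrecord", batch_size=256, cache_decoded=True)
```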

Authors (5)
  1. Michael Kuchnik (8 papers)
  2. Ana Klimovic (24 papers)
  3. Jiri Simsa (6 papers)
  4. Virginia Smith (68 papers)
  5. George Amvrosiadis (4 papers)
Citations (25)
