Demystifying the MLPerf Benchmark Suite (1908.09207v1)

Published 24 Aug 2019 in cs.LG and stat.ML

Abstract: MLPerf, an emerging machine learning benchmark suite, strives to cover a broad range of machine learning applications. We present a study of its characteristics and of how the MLPerf benchmarks differ from earlier deep learning benchmarks such as DAWNBench and DeepBench. We find that application benchmarks such as MLPerf (although rich in kernels) exhibit different features compared to kernel benchmarks such as DeepBench. The MLPerf benchmark suite contains a diverse set of models, which allows it to unveil a variety of bottlenecks in the system. Based on our findings, a dedicated low-latency interconnect between GPUs in multi-GPU systems is required for optimal distributed deep learning training. We also observe variation in scaling efficiency across the MLPerf models; the variation exhibited by the different models highlights the importance of smart scheduling strategies for multi-GPU training. Another observation is that CPU utilization increases with the number of GPUs used for training. Corroborating prior work, we also observe and quantify the improvements possible through compiler optimizations, mixed-precision training, and the use of Tensor Cores.
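
The abstract's mention of mixed-precision training and Tensor Cores refers to a standard optimization the paper quantifies. As a minimal, illustrative sketch (not code from the paper), this is how mixed precision is commonly enabled in PyTorch via `torch.cuda.amp`; the model, data, and hyperparameters here are placeholders:

```python
# Illustrative sketch: mixed-precision training in PyTorch.
# On Volta-class and newer GPUs, FP16 matmuls inside autocast
# regions can be dispatched onto Tensor Cores.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)
).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
# GradScaler rescales the loss to avoid FP16 gradient underflow.
scaler = torch.cuda.amp.GradScaler(enabled=(device.type == "cuda"))

for step in range(10):
    # Dummy batch; a real benchmark run would stream actual data here.
    x = torch.randn(64, 1024, device=device)
    y = torch.randint(0, 10, (64,), device=device)

    optimizer.zero_grad(set_to_none=True)
    # Eligible ops (e.g. matmuls) run in FP16 inside autocast;
    # numerically sensitive ops stay in FP32.
    with torch.cuda.amp.autocast(enabled=(device.type == "cuda")):
        loss = loss_fn(model(x), y)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

The same pattern generalizes to the multi-GPU setting the abstract discusses, where the gradient all-reduce traffic is what makes a dedicated low-latency GPU interconnect matter.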

Authors (7)
  1. Snehil Verma (1 paper)
  2. Qinzhe Wu (3 papers)
  3. Bagus Hanindhito (4 papers)
  4. Gunjan Jha (1 paper)
  5. Eugene B. John (4 papers)
  6. Ramesh Radhakrishnan (3 papers)
  7. Lizy K. John (15 papers)
Citations (8)
