MLPerf Inference Benchmark (1911.02549v2)

Published 6 Nov 2019 in cs.LG, cs.PF, and stat.ML

Abstract: Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability.

Citations (450)

Summary

  • The paper introduces the MLPerf Inference Benchmark, a standardized suite aggregating over 600 reproducible performance measurements from 14 organizations.
  • The paper employs scenario-based evaluations across three ML tasks (image classification, object detection, and machine translation) using reference models such as ResNet-50 and MobileNet to mimic varied real-world applications.
  • The paper’s flexible design with open and closed divisions encourages innovation while ensuring strict comparability in ML system assessments.

An Insightful Overview of the MLPerf Inference Benchmark Paper

The MLPerf Inference Benchmark paper describes a significant advance in benchmarking ML inference systems for an expanding landscape of ML applications. As models increasingly run on a multitude of hardware and software configurations, standardized performance evaluation becomes paramount. This paper introduces the MLPerf Inference suite, a collaborative endeavor involving more than 30 organizations, which seeks to establish representative and reproducible benchmarking methodologies.

The ML inference ecosystem spans applications from embedded devices to data centers, with systems that differ by orders of magnitude in performance and power consumption. A striking observation from the paper is the submission of over 600 reproducible inference-performance measurements by 14 organizations, underscoring MLPerf Inference's adaptability and robustness.

Key Contributions

  • Workload Diversity and Reproducibility: The paper presents three ML tasks (image classification, object detection, and machine translation), selected for their maturity and broad applicability, and covers them with five reference models, including ResNet-50 and MobileNet, offering quantifiable and reproducible metrics for architectural comparisons.
  • Scenario-Based Evaluation: The benchmark accommodates distinct real-world application scenarios: single-stream, multistream, server, and offline. These scenarios reflect operational environments ranging from smartphones to data-center services, ensuring a comprehensive performance assessment; the load patterns they impose are sketched after this list.
  • Robust Quality Metrics: The benchmark sets stringent quality targets (e.g., maintaining model accuracy within 1% of the reference), which keeps comparisons meaningful and ensures that reported performance gains do not come at the cost of degraded model quality; a simple form of this check appears after this list.
  • Flexibility and Innovation in Benchmarking: The benchmark features open and closed divisions. The closed division aims for strict comparability of systems, while the open division encourages innovation by allowing broader model and scenario modifications, providing a lens into system capabilities beyond standard tasks.
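
To make the scenario distinction concrete, below is a minimal Python sketch of how the single-stream and offline load patterns differ. It is illustrative only: run_inference, the 5 ms model latency, the sample count, and the 90th-percentile metric are assumptions for the example, and the sketch does not use MLPerf's actual LoadGen API.

```python
import time

def run_inference(sample):
    # Stand-in for the system under test; assumed to take ~5 ms per query.
    time.sleep(0.005)
    return "label"

def single_stream(samples, percentile=90):
    # Single-stream: issue one query at a time, wait for the result,
    # and report a tail-latency metric (here, the 90th-percentile latency).
    latencies = []
    for s in samples:
        start = time.perf_counter()
        run_inference(s)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    idx = max(int(len(latencies) * percentile / 100) - 1, 0)
    return latencies[idx]

def offline(samples):
    # Offline: all queries are available up front; the metric is throughput.
    start = time.perf_counter()
    for s in samples:
        run_inference(s)
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed

samples = list(range(64))
print(f"single-stream 90th-percentile latency: {single_stream(samples):.4f} s")
print(f"offline throughput: {offline(samples):.1f} samples/s")
```

The server and multistream scenarios add constraints on top of these patterns: queries arriving on a Poisson schedule with a per-query latency bound, and fixed-size query batches issued at a fixed interval, respectively.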
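The closed-division quality rule can likewise be expressed as a simple check: a submission's accuracy must stay within a fixed fraction of the reference model's accuracy. This is a sketch under that assumption; the specific numbers below are illustrative, not official MLPerf targets.

```python
def meets_quality_target(achieved_accuracy, reference_accuracy, tolerance=0.01):
    # True if the submission retains at least (1 - tolerance) of the
    # reference model's accuracy, i.e., stays within 1% by default.
    return achieved_accuracy >= (1.0 - tolerance) * reference_accuracy

# Illustrative values: a reference top-1 accuracy of 76.46% and a
# quantized submission achieving 75.90% top-1.
print(meets_quality_target(0.7590, 0.7646))  # True: within the 1% band
```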

Implications and Future Outlook

The paper's delineation of MLPerf Inference offers critical insights into both theoretical and practical examination of ML system capabilities. By establishing architecturally neutral standards, the benchmark provides an invaluable tool for academia and industry alike to push the limits of inference hardware and software.

Looking ahead, integrating emerging models such as Transformers for NLP, or new applications such as real-time speech recognition, could further bolster the benchmark's relevance. Continued expansion could facilitate an even more granular understanding and optimization of ML inference across domains.

Conclusion

The MLPerf Inference Benchmark paper embodies a rigorous, well-conceived approach to standardizing ML inference evaluations. It acknowledges the complexity of the current ML landscape while providing a structured foundation to assess and compare burgeoning ML technologies. Not only does it set a benchmark in the literal sense, but it also paves the way for enhanced innovations and research initiatives moving forward. As the field evolves, MLPerf Inference's adaptability will play a crucial role in guiding the trajectory of ML system development and deployment.
