- The paper’s main contribution is a dual-faceted benchmark suite that combines microbenchmarks and macrobenchmarks for comprehensive evaluation of intelligence processors.
- It employs an industrial-level software stack and standardized metrics such as operations per joule and operations per second to deliver credible, fair, and portable assessments.
- Evaluations reveal that specialized accelerators outperform traditional CPUs and GPUs in energy efficiency and performance, guiding future IP design optimizations.
An Examination of BenchIP: Benchmarking Intelligence Processors
The paper "BenchIP: Benchmarking Intelligence Processors" addresses the critical need for a standardized benchmarking suite tailored for the evaluation of intelligence processors (IPs), such as those used for deep learning tasks. These IPs have emerged due to the insufficiencies of general-purpose hardware like CPUs and GPUs in providing the desired performance and energy efficiency for modern deep learning architectures. The paper introduces BenchIP, a novel benchmark suite accompanied by a comprehensive benchmarking methodology aimed at providing credible, portable, and fair evaluations tailored to the current landscape of AI hardware.
Key Contributions and Structure
The core contribution of BenchIP lies in its dual-faceted benchmarking approach. The suite comprises two complementary sets of benchmarks:
- Microbenchmarks: These consist of 12 single-layer networks, designed for bottleneck analysis and system optimization. They cover layers widely prevalent in neural networks, such as convolutional (Conv.) and pooling layers and activation layers like the Rectified Linear Unit (ReLU) and Sigmoid, among others. These microbenchmarks allow for granular testing and are instrumental in stress-testing architectural designs; a minimal sketch of such a single-layer benchmark follows this list.
- Macrobenchmarks: This set consists of 11 complete neural networks extracted from real-world applications, representing diverse AI tasks such as image classification, object detection, and natural language processing. Notable networks like AlexNet, VGG, and ResNet feature prominently, offering a comprehensive performance evaluation tool for entire neural processing workflows rather than isolated operations.
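To make the microbenchmark idea concrete, below is a minimal, framework-agnostic sketch in Python/NumPy that times a single ReLU layer and reports throughput. The layer shape, repetition count, and per-element op count are illustrative assumptions, not the paper's actual configurations.

```python
# A single-layer microbenchmark sketch in the spirit of BenchIP:
# time repeated forward passes of one layer and report operations/second.
import time
import numpy as np

def relu(x):
    """The single layer under test: elementwise ReLU."""
    return np.maximum(x, 0.0)

def run_microbenchmark(layer_fn, input_shape, ops_per_element, repeats=100):
    x = np.random.randn(*input_shape).astype(np.float32)
    layer_fn(x)                        # warm-up pass
    start = time.perf_counter()
    for _ in range(repeats):
        layer_fn(x)
    elapsed = time.perf_counter() - start
    total_ops = ops_per_element * x.size * repeats
    return total_ops / elapsed         # operations per second

# Assume ~1 op (a comparison) per element for ReLU; the shape is illustrative.
gops = run_microbenchmark(relu, (64, 256, 56, 56), ops_per_element=1) / 1e9
print(f"ReLU microbenchmark: {gops:.2f} GOPS")
```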
The benchmarking methodology extends beyond existing standards by incorporating an industrial-level software stack and an interface that facilitates integration with various hardware architectures. This framework ensures cross-platform portability, leveraging a standardized high-level programming model compatible with common AI frameworks such as Caffe.
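To make the portability claim concrete, here is a hedged sketch of how a macrobenchmark might be invoked through Caffe's standard Python interface, which the BenchIP software stack targets. The model file names are hypothetical placeholders; the paper's actual harness and file layout may differ.

```python
# A minimal inference pass through Caffe's Python API (pycaffe).
# File paths below are assumed placeholders, not BenchIP's real layout.
import caffe

caffe.set_mode_cpu()                          # or caffe.set_mode_gpu()
net = caffe.Net('alexnet_deploy.prototxt',    # network topology (assumed path)
                'alexnet_trained.caffemodel', # learned weights (assumed path)
                caffe.TEST)
outputs = net.forward()                       # one full forward pass
print({name: blob.shape for name, blob in outputs.items()})
```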
Methodological Details
To achieve credibility, BenchIP replicates real-world application scenarios by running benchmarks with learned weights and predefined configurations. To ensure fairness, it employs performance, energy, area, and accuracy as core evaluation metrics, reflecting the diverse needs and trade-offs involved in IP design. These metrics are further aggregated into synthesized metrics such as operations per joule (energy efficiency) and operations per second (performance), permitting nuanced comparisons across hardware architectures; a small worked example follows.
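The aggregation itself is simple arithmetic. The sketch below computes both synthesized metrics from measured quantities; the op count, runtime, and energy figures are invented for illustration and do not come from the paper.

```python
# Synthesized metrics in the style of BenchIP: operations per second
# (performance) and operations per joule (energy efficiency).
def synthesized_metrics(total_ops, elapsed_s, energy_j):
    ops_per_second = total_ops / elapsed_s   # performance
    ops_per_joule = total_ops / energy_j     # energy efficiency
    return ops_per_second, ops_per_joule

# Hypothetical run: a 1.4 GOP network inference in 20 ms consuming 0.5 J.
perf, eff = synthesized_metrics(total_ops=1.4e9, elapsed_s=0.020, energy_j=0.5)
print(f"{perf / 1e9:.1f} GOPS, {eff / 1e9:.2f} GOP/J")
```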
Results and Analysis
By applying BenchIP, the paper evaluates eight types of hardware platforms, including standard CPUs and GPUs, as well as specialized ASIC accelerators (ACC-1 and ACC-2), derived from established architectures like DianNao and Cambricon. The results reveal several insights:
- Customized accelerators exhibit significantly higher energy efficiency and performance compared to traditional CPUs and GPUs. This confirms their potential for broader adoption across various deployment scales from embedded systems to server environments.
- Embedded GPUs achieve high energy efficiency while sustaining competitive performance, which is particularly relevant in power-constrained scenarios.
- The benchmark results underscore the advantage of tailored IP solutions in effectively balancing energy, performance, and area considerations, making AI compute platforms more adaptable to varied task requirements.
Implications and Future Outlooks
The research carries significant implications for both theoretical and practical work in AI hardware development. BenchIP not only provides a robust comparative framework for existing and emerging IPs but also assists architectural optimization by identifying design inefficiencies. Moreover, the paper's detailed discussion of benchmark diversity and representativeness underscores the need for continuous adaptation to keep pace with fast-evolving neural network models and application requirements.
In conclusion, BenchIP stands as a pivotal development in the field of AI hardware evaluation, offering an adaptable, detailed, and multifaceted benchmarking tool. As AI models and their demands further evolve, ongoing expansion and refinement of benchmark suites like BenchIP will be crucial to effectively gauge IP performance and guide innovative processor designs.