
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks

Published 5 Dec 2017 in cs.NE and cs.AR (arXiv:1712.01507v2)

Abstract: Fully realizing the potential of acceleration for Deep Neural Networks (DNNs) requires understanding and leveraging algorithmic properties. This paper builds upon the algorithmic insight that bitwidth of operations in DNNs can be reduced without compromising their classification accuracy. However, to prevent accuracy loss, the bitwidth varies significantly across DNNs and it may even be adjusted for each layer. Thus, a fixed-bitwidth accelerator would either offer limited benefits to accommodate the worst-case bitwidth requirements, or lead to a degradation in final accuracy. To alleviate these deficiencies, this work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. We explore this dimension by designing Bit Fusion, a bit-flexible accelerator, that constitutes an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers. This flexibility in the architecture enables minimizing the computation and the communication at the finest granularity possible with no loss in accuracy. We evaluate the benefits of BitFusion using eight real-world feed-forward and recurrent DNNs. The proposed microarchitecture is implemented in Verilog and synthesized in 45 nm technology. Using the synthesis results and cycle accurate simulation, we compare the benefits of Bit Fusion to two state-of-the-art DNN accelerators, Eyeriss and Stripes. In the same area, frequency, and process technology, BitFusion offers 3.9x speedup and 5.1x energy savings over Eyeriss. Compared to Stripes, BitFusion provides 2.6x speedup and 3.9x energy reduction at 45 nm node when BitFusion area and frequency are set to those of Stripes. Scaling to GPU technology node of 16 nm, BitFusion almost matches the performance of a 250-Watt Titan Xp, which uses 8-bit vector instructions, while BitFusion merely consumes 895 milliwatts of power.

Citations (460)

Summary

  • The paper introduces dynamic bit-level fusion, enabling processing elements to be composed to match DNN layer bitwidths without sacrificing accuracy.
  • It details a bit-flexible microarchitecture using BitBricks that spatially fuse into Fused-PEs, optimizing parallel multiply-add operations in DNNs.
  • The evaluation demonstrates significant performance and energy efficiency gains over state-of-the-art designs, matching high-end GPUs with lower power consumption.

BitFusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks

The paper presents BitFusion, a novel bit-level dynamically composable architecture aimed at enhancing the efficiency of deep neural network (DNN) accelerators. The proposed architecture leverages the property that DNNs can maintain accuracy even with reduced bitwidth operations, allowing significant reductions in computation and communication costs.
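The algorithmic property BitFusion builds on can be illustrated with a simple uniform symmetric quantizer. This is a generic sketch for intuition only; the paper evaluates pre-quantized benchmark DNNs rather than prescribing this particular scheme:

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of an array to `bits`-bit signed integers.
    Illustrative sketch; not the quantization procedure from the paper."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 1 for 2 bits
    scale = np.max(np.abs(x)) / qmax    # map the largest magnitude to qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q.astype(np.int32), scale

weights = np.array([0.8, -0.3, 0.05, -0.6])
q8, s8 = quantize(weights, 8)   # fine-grained: small rounding error
q2, s2 = quantize(weights, 2)   # coarse: only values in {-2, -1, 0, 1}
# Many DNN layers tolerate such coarse representations with little or no
# accuracy loss, and the tolerable bitwidth varies per network and per layer,
# which is what motivates a bit-flexible accelerator.
```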

Core Contributions

  1. Dynamic Bit-Level Fusion: BitFusion introduces bit-level fusion, dynamically composing processing elements to match the bitwidth requirements of individual DNN layers. This dynamic composability is achieved without any loss of accuracy, allowing the architecture to operate at minimal computation and communication granularity.
  2. Bit-Flexible Microarchitecture: The architecture comprises an array of bit-level processing elements, known as BitBricks, which can be spatially fused to create Fused Processing Engines (Fused-PEs). These engines adapt to varying bitwidths required by DNN operations, significantly enhancing parallelism, particularly for multiply-add operations common in DNNs.
  3. Instruction Set Architecture (ISA): BitFusion features a novel ISA that enables hardware-software co-design, expressing bit-flexible operations at the granularity of instruction blocks. This block structure amortizes instruction-handling overhead while supporting diverse DNN models and quantization levels.
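The fusion idea behind BitBricks and Fused-PEs can be sketched in a few lines: each BitBrick computes a 2-bit partial product, and a Fused-PE sums shifted partial products to realize a wider multiply. The function names and the unsigned-only arithmetic here are simplifications for illustration; the actual design also handles signed operands and independently varying operand widths:

```python
def bitbrick_mul(x, y):
    """A single BitBrick: multiply two 2-bit unsigned operands (0..3)."""
    assert 0 <= x < 4 and 0 <= y < 4
    return x * y

def fused_mul(a, b, bits):
    """Fuse 2-bit BitBricks into one `bits`-wide unsigned multiply by
    shift-adding their partial products (simplified Fused-PE sketch)."""
    n = bits // 2  # number of 2-bit slices per operand
    a_slices = [(a >> (2 * i)) & 0b11 for i in range(n)]
    b_slices = [(b >> (2 * j)) & 0b11 for j in range(n)]
    acc = 0
    for i, ai in enumerate(a_slices):
        for j, bj in enumerate(b_slices):
            # Each slice pair occupies one BitBrick; its partial product
            # is weighted by the slices' bit positions.
            acc += bitbrick_mul(ai, bj) << (2 * (i + j))
    return acc

# A 4-bit multiply uses 4 BitBricks; an 8-bit multiply uses 16. The same
# BitBrick array thus trades width for parallelism: at 2-bit precision,
# every BitBrick performs an independent multiply.
assert fused_mul(13, 11, bits=4) == 13 * 11
```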

Evaluation Insights

The paper evaluates BitFusion using eight diverse feed-forward and recurrent DNN models; the microarchitecture is implemented in Verilog and synthesized in 45 nm technology. When compared to state-of-the-art accelerators such as Eyeriss and Stripes, BitFusion demonstrates substantial improvements:

  • Performance and Energy Efficiency: BitFusion achieves an average 3.9x speedup and 5.1x energy savings over Eyeriss. It also outperforms Stripes, delivering a 2.6x speedup and 3.9x energy reduction at the same 45 nm fabrication node.
  • Scalability: Scaled to the 16 nm GPU technology node, BitFusion nearly matches the performance of a 250-Watt Titan Xp (which uses 8-bit vector instructions) while consuming only 895 milliwatts.

Theoretical and Practical Implications

The introduction of bit-level dynamic composability could redefine design strategies for future hardware accelerators. By matching operational precision to each layer’s requirements, architectures like BitFusion can achieve significant optimizations without sacrificing accuracy. This approach facilitates the development of more energy-efficient and higher-performance DNN accelerators applicable in power-constrained environments, such as mobile and embedded systems.

Potential Future Developments

Future research could extend bit-level fusion to additional model families and quantization techniques. Synthesizing BitFusion at other technology nodes and evaluating it on larger, more complex networks would further establish its scalability and versatility.

In conclusion, the BitFusion architecture represents a significant advancement in the design of efficient hardware accelerators for DNNs. By focusing on bit-level dynamism and composability, it addresses the constraints of fixed-bitwidth designs and opens avenues for more adaptable and resource-efficient computation.
