- The paper introduces a strategy that reduces both activation and weight precision while widening network filters to preserve or improve accuracy.
- It demonstrates that 4-bit activations and 2-bit weights, combined with wider layers, match or exceed the accuracy of full-precision baselines on architectures such as ResNet-34 and AlexNet.
- The approach lowers memory and compute costs, making it well suited to resource-constrained deployments and to hardware accelerators built around low-precision integer arithmetic.
An Analysis of WRPN: Wide Reduced-Precision Networks
The paper "WRPN: Wide Reduced-Precision Networks" presents a significant contribution to the field of deep learning by addressing the computational and memory overhead associated with training and deploying deep neural networks (DNNs). The authors propose a novel strategy termed Wide Reduced-Precision Networks (WRPN), which aims to optimize both execution efficiency and model accuracy by reducing the numeric precision of both activations and model parameters, while simultaneously increasing the width of filter maps in neural network layers. This strategic alteration allows the network to maintain, or even surpass, the accuracy of full-precision counterparts while significantly reducing computational resource demands.
Key Contributions and Methodology
The authors introduce WRPN in response to prior approaches that reduced the precision of network weights but often sacrificed accuracy, particularly once activations were also quantized. WRPN instead reduces the precision of both activations and weights and compensates for the resulting loss in representational capacity by increasing the number of filter maps per layer. Trained this way, reduced-precision networks can narrow or close the accuracy gap with full-precision baselines while offering substantial computational advantages.
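To make the mechanism concrete, the sketch below shows the two ingredients in PyTorch: a uniform k-bit quantizer with a straight-through estimator applied to clamped weights and activations, and a convolution whose filter count is scaled by a widening factor. The function and class names (`quantize_weights`, `WideQuantConv2d`, `width_mult`) and the exact quantization levels are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuantizeSTE(torch.autograd.Function):
    """Uniform quantizer with a straight-through estimator (STE) backward pass."""

    @staticmethod
    def forward(ctx, x, levels):
        # Map x in [0, 1] onto levels + 1 evenly spaced values.
        return torch.round(x * levels) / levels

    @staticmethod
    def backward(ctx, grad_output):
        # STE: pass gradients through the non-differentiable rounding unchanged.
        return grad_output, None


def quantize_weights(w, k_bits=2):
    # Assumed scheme: clamp weights to [-1, 1]; one bit encodes the sign,
    # the remaining k_bits - 1 bits encode the magnitude.
    w = torch.clamp(w, -1.0, 1.0)
    levels = 2 ** (k_bits - 1) - 1
    return QuantizeSTE.apply(w.abs(), levels) * torch.sign(w)


def quantize_activations(a, k_bits=4):
    # Assumed scheme: clip activations to [0, 1] (as after a bounded ReLU),
    # then quantize uniformly to k_bits.
    a = torch.clamp(a, 0.0, 1.0)
    levels = 2 ** k_bits - 1
    return QuantizeSTE.apply(a, levels)


class WideQuantConv2d(nn.Conv2d):
    """Convolution with width_mult-times more filters and quantized weights/inputs.

    Assumes the preceding layer is widened by the same factor; the network's
    first and last layers would normally keep full width and precision.
    """

    def __init__(self, in_ch, out_ch, kernel_size,
                 width_mult=2, w_bits=2, a_bits=4, **kwargs):
        super().__init__(int(in_ch * width_mult), int(out_ch * width_mult),
                         kernel_size, **kwargs)
        self.w_bits, self.a_bits = w_bits, a_bits

    def forward(self, x):
        xq = quantize_activations(x, self.a_bits)
        wq = quantize_weights(self.weight, self.w_bits)
        return F.conv2d(xq, wq, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

Under these assumptions, a layer built as `WideQuantConv2d(64, 64, 3, padding=1)` carries 128 input and 128 output filter maps and runs its forward pass with 2-bit weights and 4-bit inputs, while gradients flow through the rounding via the straight-through estimator.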
The WRPN approach was validated empirically on prominent DNN architectures, including AlexNet, ResNet-34, and batch-normalized Inception, using the ILSVRC-12 dataset. The paper reports that 4-bit activations and 2-bit weights, in suitably widened networks, match or exceed the accuracy of the full-precision baselines. With this strategy, binary networks achieved state-of-the-art top-1 accuracy: 69.85% for ResNet-34 with 2x widening and 48.04% for AlexNet with 1.3x widening.
Practical and Theoretical Implications
The practical implications of WRPN are most relevant where resource constraints dominate, such as embedded systems and edge computing. By reducing both computational and memory requirements, WRPN can make sophisticated DNNs deployable on devices with limited resources. Its hardware-friendly quantization scheme is also well matched to accelerators such as FPGAs and ASICs, which favor fixed-point and integer arithmetic. The authors' empirical evaluation, covering a Titan X GPU, an Arria-10 FPGA, and ASIC implementations, found that reduced-precision operations deliver substantial efficiency gains, with the FPGA and ASIC significantly outperforming the GPU at very low precisions.
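The arithmetic behind this claim can be sketched with a simple cost model. Under the common heuristic that fixed-point multiplier cost scales roughly with the product of the operand bit widths, and that a conv layer's weights and multiply-accumulates grow roughly quadratically with the widening factor, a 2x-wide network with 2-bit weights and 4-bit activations still comes out well ahead of a 32-bit baseline. The numbers below are a back-of-the-envelope estimate under those assumptions, not measurements from the paper.

```python
def relative_cost(width_mult, w_bits, a_bits, base_bits=32):
    """Rough per-layer cost of a widened, quantized conv layer vs. an FP32 baseline.

    Assumptions (not from the paper): weights and MACs scale with width_mult**2,
    and multiplier cost scales with the product of operand bit widths.
    """
    scale = width_mult ** 2
    weight_storage = scale * w_bits / base_bits
    multiplier_cost = scale * (w_bits * a_bits) / (base_bits * base_bits)
    return weight_storage, multiplier_cost


storage, compute = relative_cost(width_mult=2, w_bits=2, a_bits=4)
print(f"weight storage vs. FP32 baseline: {storage:.3f}")   # 0.250
print(f"multiplier cost vs. FP32 baseline: {compute:.3f}")  # 0.031
```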
From a theoretical perspective, WRPN challenges the assumed trade-off between numeric precision and accuracy by showing how a strategic design choice (increasing layer width) can offset the accuracy loss typically associated with aggressive quantization. This result could inspire further research into architectures that capitalize on hardware capabilities while conserving energy and computational resources.
Future Directions in Low-Precision AI
Looking forward, WRPN paves the way for hybrid-precision strategies in which networks adjust precision dynamically based on the computational budget or task complexity. Extending reduced precision to the remaining full-precision components, such as batch-normalization layers, could further balance efficiency and accuracy. As hardware gains native support for low-precision arithmetic, there is room for hardware-software co-design that manages these precision trade-offs more seamlessly within AI systems.
In summary, the WRPN framework represents a notable advancement in the optimization of DNNs through innovative use of reduced precision and strategic architectural adjustments, providing a holistic approach to maintaining model efficacy while maximizing computational efficiency.