Overview of "Forward and Backward Information Retention for Accurate Binary Neural Networks"
The paper "Forward and Backward Information Retention for Accurate Binary Neural Networks," authored by Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, and Jingkuan Song, presents an innovative approach to enhancing the performance of binary neural networks (BNNs) by reducing information loss in both forward and backward propagation. The authors introduce the Information Retention Network (IR-Net), which leverages two key techniques: Libra Parameter Binarization (Libra-PB) and the Error Decay Estimator (EDE). These methods together aim to address the accuracy drop that often occurs with binarized versions of deep neural networks (DNNs) compared to their full-precision counterparts.
Binary Neural Networks: Challenges and Innovations
Binary Neural Networks are particularly attractive due to their low storage requirements and efficient inference, which relies on bitwise operations. Despite this efficiency, they typically suffer from a significant performance gap relative to full-precision models, because quantizing weights and activations to a single bit discards information. Traditional quantization methods generally minimize the quantization error alone, without adequately addressing the information lost in forward and backward propagation throughout the training process.
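As a point of reference, the sketch below shows the naive 1-bit quantization that BNNs build on: every weight and activation is collapsed to +1 or -1 so that multiply-accumulates can be replaced by XNOR and popcount. This is an illustrative PyTorch snippet, not code from the paper.

```python
import torch

def binarize(x: torch.Tensor) -> torch.Tensor:
    """Naive 1-bit quantization: collapse each value to +1 or -1.

    Once the signs are packed into bit vectors, a dot product becomes an
    XNOR followed by a popcount, which is what makes BNN inference cheap.
    The cost is the information discarded by keeping only one bit per value.
    """
    # Map non-negative values to +1 and negative values to -1.
    return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))
```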
Libra Parameter Binarization (Libra-PB)
Libra Parameter Binarization operates in the forward propagation of BNNs and is designed to minimize both quantization error and information loss by reshaping the weight distribution before binarization. The weights are balanced to a zero mean and standardized, which maximizes the information entropy of the resulting binary values; with maximal entropy, the binarized weights and activations retain as much information as possible, reducing the potential performance drop. In addition, an integer power-of-two scaling factor, applied as a bit shift, enhances representation capability without incurring additional floating-point computational cost.
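The following PyTorch sketch illustrates the forward transformation described above: balance (zero mean), standardize, binarize with the sign function, and rescale by an integer power of two. It is a simplified reading of the paper's description, not the authors' reference implementation, and the epsilon guard is an added detail.

```python
import torch

def libra_pb_binarize(w: torch.Tensor) -> torch.Tensor:
    """Sketch of Libra Parameter Binarization (forward pass for weights)."""
    # Balance: a zero-mean distribution maximizes the entropy of the signs.
    w_bal = w - w.mean()
    # Standardize so the distribution's scale is comparable across layers
    # (the small epsilon is an added numerical guard, not from the paper).
    w_std = w_bal / (w_bal.std() + 1e-12)
    # Integer bit-shift scale: rounding log2 of the mean magnitude keeps the
    # scale a power of two, so it can be applied with a shift at inference.
    s = torch.round(torch.log2(w_std.abs().mean()))
    # Binarize and rescale.
    return torch.sign(w_std) * torch.pow(2.0, s)
```

In a real layer this function would replace the full-precision weights only in the forward pass, with gradients flowing to the latent full-precision weights as is standard for BNN training.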
Error Decay Estimator (EDE)
To reduce information loss in backward propagation, the Error Decay Estimator replaces the gradient of the non-differentiable sign function with a progressively sharpened approximation. Unlike typical estimators, which favor either strong update ability or low gradient error but not both, EDE adapts its shape to the training stage: early in training it keeps the approximate derivative close to one, preserving the ability to update all weights, and it gradually sharpens toward the sign function so that the gradient error decays by the end of training. This balances update capability and gradient accuracy throughout training and thereby retains the information needed for accurate optimization.
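Below is a sketch of this backward approximation, assuming a tanh-based surrogate g(x) = k * tanh(t * x) whose slope parameter t grows over training, in line with the paper's description; the schedule endpoints are illustrative defaults rather than the paper's exact values.

```python
import math
import torch

def ede_grad(x: torch.Tensor, epoch: int, total_epochs: int,
             t_min: float = 0.1, t_max: float = 10.0) -> torch.Tensor:
    """Derivative of the EDE surrogate k * tanh(t * x), used in place of sign's.

    Early in training t is small and k = 1/t, so the derivative stays close
    to 1 everywhere (strong update ability); late in training t is large and
    k = 1, so the derivative becomes a sharp, low-error approximation of the
    sign function's behavior.
    """
    progress = epoch / max(total_epochs, 1)
    # t rises geometrically from t_min to t_max over the course of training.
    t = t_min * (10.0 ** (progress * math.log10(t_max / t_min)))
    k = max(1.0 / t, 1.0)
    # d/dx [k * tanh(t * x)] = k * t * (1 - tanh(t * x)^2)
    return k * t * (1.0 - torch.tanh(t * x) ** 2)
```

In a training loop this factor would multiply the incoming gradient at each binarization point, taking the role usually played by the straight-through estimator.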
Experimental Validation and Implications
The efficacy of IR-Net is validated across several network structures, including ResNet-20, VGG-Small, ResNet-18, and ResNet-34, on the CIFAR-10 and ImageNet datasets. The results show that IR-Net consistently outperforms existing state-of-the-art quantization approaches and significantly narrows the gap between binary and full-precision models. This supports the practicality of BNNs in settings that demand low computational overhead while tolerating only a modest loss of accuracy.
Future Perspectives
The advancements demonstrated by IR-Net underscore its potential applications in areas demanding efficient neural networks, particularly on resource-constrained devices such as mobile phones and IoT devices. Future exploration could involve refining the techniques of Libra-PB and EDE to further enhance efficiency, or extending them to more complex neural network architectures beyond standard CNNs. Additionally, integrating these approaches with other neural network optimization techniques could unlock even greater performance levels, paving the way for broader adoption of BNNs in practical applications.
In summary, the introduction of IR-Net presents a substantial step forward in the field of model binarization, promising more accurate and resource-efficient neural networks. The methodologies outlined within this paper not only advance current scientific understanding but also expand practical horizons for deploying machine learning models in real-world scenarios.