
Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm (1808.00278v5)

Published 1 Aug 2018 in cs.CV

Abstract: In this work, we study the 1-bit convolutional neural networks (CNNs), of which both the weights and activations are binary. While being efficient, the classification accuracy of the current 1-bit CNNs is much worse compared to their counterpart real-valued CNN models on the large-scale dataset, like ImageNet. To minimize the performance gap between the 1-bit and real-valued CNN models, we propose a novel model, dubbed Bi-Real net, which connects the real activations (after the 1-bit convolution and/or BatchNorm layer, before the sign function) to activations of the consecutive block, through an identity shortcut. Consequently, compared to the standard 1-bit CNN, the representational capability of the Bi-Real net is significantly enhanced and the additional cost on computation is negligible. Moreover, we develop a specific training algorithm including three technical novelties for 1-bit CNNs. Firstly, we derive a tight approximation to the derivative of the non-differentiable sign function with respect to activation. Secondly, we propose a magnitude-aware gradient with respect to the weight for updating the weight parameters. Thirdly, we pre-train the real-valued CNN model with a clip function, rather than the ReLU function, to better initialize the Bi-Real net. Experiments on ImageNet show that the Bi-Real net with the proposed training algorithm achieves 56.4% and 62.2% top-1 accuracy with 18 layers and 34 layers, respectively. Compared to the state-of-the-arts (e.g., XNOR Net), Bi-Real net achieves up to 10% higher top-1 accuracy with more memory saving and lower computational cost.

Keywords: binary neural network, 1-bit CNNs, 1-layer-per-block

Citations (518)

Summary

  • The paper presents a novel Bi-Real Net that enhances 1-bit CNNs by preserving real-valued activations via identity shortcuts.
  • It introduces advanced training techniques including a tight derivative approximation, magnitude-aware gradients, and a new initialization strategy.
  • Experimental results on ImageNet show up to a 10% improvement in top-1 accuracy, highlighting its practical viability for efficient deep learning.

Enhancing 1-bit CNNs with Bi-Real Net

The paper "Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm" presents a detailed investigation into closing the performance gap between 1-bit CNNs and their real-valued counterparts, particularly on large-scale datasets like ImageNet. Through a novel architecture called Bi-Real Net, the authors enhance the representational power of 1-bit convolutional neural networks (CNNs) while keeping the additional computational overhead minimal.

Key Contributions

The paper introduces Bi-Real Net, a model architecture that preserves real-valued activations between binary convolution layers via identity shortcuts. This significantly enhances the network's representational capability compared to standard 1-bit CNNs: the output of each block is a real-valued sum rather than a purely binary activation. The extra cost is a single element-wise addition per block, which is negligible relative to the convolutions themselves.
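The block structure can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 1-bit convolution is replaced here by a binary matrix product as a stand-in (the paper uses binary 3x3 convolutions followed by BatchNorm), but it shows how the identity shortcut keeps the block output real-valued even though the convolution only ever sees {-1, +1} inputs.

```python
import numpy as np

def sign(x):
    # Binarize to {-1, +1}; sign(0) is mapped to +1 by convention.
    return np.where(x >= 0, 1.0, -1.0)

def bi_real_block(real_activations, binary_weights):
    """One Bi-Real block (sketch): sign -> 1-bit conv stand-in -> shortcut add.

    `binary_weights` is assumed to already contain {-1, +1} entries; the
    matrix product stands in for the paper's binary convolution + BatchNorm.
    """
    binarized = sign(real_activations)      # 1-bit activations
    conv_out = binarized @ binary_weights   # 1-bit "convolution"
    # Identity shortcut: adding the real-valued input keeps the block's
    # output real-valued, enlarging the set of representable values.
    return conv_out + real_activations

x = np.array([[0.3, -0.7]])
w = np.array([[1.0, -1.0],
              [1.0,  1.0]])
out = bi_real_block(x, w)   # real-valued output, not restricted to {-1, +1}
```

Without the shortcut, every block output would be an integer combination of binary products; with it, the fine-grained real values of the previous block survive into the next one.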

Additionally, a specialized training algorithm is developed, incorporating three technical innovations:

  1. Derivative Approximation: A tight approximation to the derivative of the non-differentiable sign function is derived, using a piecewise polynomial that tracks the sign function more closely than the usual straight-through estimator, resulting in improved gradient flow during training.
  2. Magnitude-aware Gradient: The authors propose a magnitude-aware gradient mechanism to update weight parameters, addressing the inherent difficulty in changing the sign of binary weights with standard gradient descent.
  3. Initialization Strategy: A novel initialization approach involving pre-training with a clip function instead of ReLU leads to better network initialization, optimizing the training convergence for 1-bit CNNs.
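Two of these ideas can be sketched concretely. Below is a NumPy sketch of the piecewise polynomial surrogate for the sign function and its triangle-shaped derivative (nonzero only on [-1, 1]), together with the clip activation used for pre-training; the exact polynomial here follows the paper's ApproxSign formulation as I understand it, so treat the coefficients as an illustration rather than a verified reimplementation.

```python
import numpy as np

def approx_sign(x):
    """Piecewise polynomial surrogate for sign(x), used in the backward pass."""
    return np.where(x < -1, -1.0,
           np.where(x < 0, 2 * x + x**2,
           np.where(x < 1, 2 * x - x**2, 1.0)))

def approx_sign_grad(x):
    """Derivative of approx_sign: a triangle-shaped window on [-1, 1].

    This is tighter than the straight-through estimator's rectangular
    window 1_{|x| <= 1}, since it peaks at 0 and tapers to 0 at +/-1.
    """
    return np.where((x >= -1) & (x < 0), 2 + 2 * x,
           np.where((x >= 0) & (x < 1), 2 - 2 * x, 0.0))

def clip_activation(x):
    # Pre-training activation: clip(x, -1, 1) instead of ReLU, so the
    # real-valued network's activations already resemble binary ones.
    return np.clip(x, -1.0, 1.0)
```

The magnitude-aware gradient (item 2) additionally scales weight updates by the magnitude of the real-valued weights, so that weights far from zero can still flip sign during training; it is omitted here because it depends on details of the weight-binarization scheme.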

Experimental Results

The experiments conducted on ImageNet demonstrate that the Bi-Real Net achieves notable improvements in accuracy over existing methods:

  • A top-1 accuracy of 56.4% and 62.2% for 18-layer and 34-layer models, respectively.
  • Compared to XNOR-Net, Bi-Real Net shows up to a 10% increase in top-1 accuracy.
  • The representational capability is significantly enhanced, evidenced by improved accuracy over baseline models.

Implications and Future Work

The practical implications of this research are significant, particularly for deploying deep learning models on resource-constrained devices. The enhanced representational capability of Bi-Real Net makes it a viable candidate for real-world applications that require lightweight models without a large sacrifice in accuracy.

The authors suggest that future work could explore advanced integer programming techniques, such as Lp-Box ADMM, to push the boundaries of 1-bit CNNs further. This research opens a path for continued advancements in optimizing binary neural networks, potentially influencing a broader scope of computational optimization tasks in machine learning.

By focusing on both theoretical improvement and practical efficiency, Bi-Real Net makes an impactful contribution to the ongoing development of efficient deep learning architectures.