
Improving Accuracy of Binary Neural Networks using Unbalanced Activation Distribution (2012.00938v2)

Published 2 Dec 2020 in cs.LG and cs.CV

Abstract: Binarization of neural network models is considered as one of the promising methods to deploy deep neural network models on resource-constrained environments such as mobile devices. However, Binary Neural Networks (BNNs) tend to suffer from severe accuracy degradation compared to the full-precision counterpart model. Several techniques were proposed to improve the accuracy of BNNs. One of the approaches is to balance the distribution of binary activations so that the amount of information in the binary activations becomes maximum. Based on extensive analysis, in stark contrast to previous work, we argue that unbalanced activation distribution can actually improve the accuracy of BNNs. We also show that adjusting the threshold values of binary activation functions results in the unbalanced distribution of the binary activation, which increases the accuracy of BNN models. Experimental results show that the accuracy of previous BNN models (e.g. XNOR-Net and Bi-Real-Net) can be improved by simply shifting the threshold values of binary activation functions without requiring any other modification.

Unbalanced Activation Distribution for Improved Binary Neural Network Accuracy

This paper addresses a key concern in deploying Deep Neural Networks (DNNs) in resource-constrained environments by improving the accuracy of Binary Neural Networks (BNNs). BNNs are attractive because of their reduced memory and computational requirements relative to full-precision models, but they suffer significant accuracy degradation. In contrast to prior work, the authors argue that an unbalanced distribution of binary activations can actually improve BNN accuracy.

Key Contributions and Results

  1. Unbalanced Distribution in Activations: The authors suggest that unbalanced activation distribution, as opposed to the traditionally sought balance, results in improved performance for BNNs. This insight stems from the observation that widely used activation functions like ReLU inherently produce skewed output distributions, leading to better model performance in conventional full-precision networks.
  2. Threshold Shifting in BNNs: To achieve the desired imbalance in binary activations, the paper proposes adjusting the threshold values of the binary activation functions. This adjustment shifts the distribution of binary activations and significantly improves accuracy without requiring any complex modification to the network architecture (a minimal sketch follows this list).
  3. Experimental Validation: Extensive experimentation on models such as XNOR-Net and Bi-Real-Net across various datasets (including CIFAR-10 and ImageNet) demonstrates the viability of this approach. For instance, shifting thresholds in XNOR-Net resulted in a top-1 accuracy improvement of 3.0% on the ImageNet dataset.
  4. Comparison with Trainable Thresholds: Previous methods suggested making the binary activation thresholds trainable. This work clarifies that such approaches have limited effect because a learnable threshold offers no benefit beyond the bias term of the Batch Normalization (BN) layers: the BN bias can absorb the same shift, so explicitly training thresholds adds nothing on its own (the equivalence is written out after this list).
  5. Impacts of Additional Activation Functions: The role of additional activation functions (like PReLU) is also analyzed. It is shown that these layers inherently disrupt the balance of distributions, providing an implicit threshold-shifting effect, thus contributing to accuracy improvements in BNN models.
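
The threshold shift described in point 2 is simple enough to sketch. Below is a minimal PyTorch sketch, not the authors' released code; the class name BinaryActivation and its threshold argument are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BinaryActivation(nn.Module):
    """Sign-style binary activation with a shiftable threshold.

    Moving the threshold away from zero skews the fraction of +1 and -1
    outputs, i.e. produces the unbalanced distribution the paper advocates.
    """

    def __init__(self, threshold: float = 0.0):
        super().__init__()
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shifted = x - self.threshold
        # Forward pass: hard sign in {-1, +1}.
        binary = torch.where(shifted >= 0,
                             torch.ones_like(shifted),
                             -torch.ones_like(shifted))
        # Backward pass: clipped straight-through estimator, a common
        # surrogate gradient for the sign function in BNN training.
        surrogate = shifted.clamp(-1.0, 1.0)
        return (binary - surrogate).detach() + surrogate
```

Applied to an existing BNN such as XNOR-Net, the recipe amounts to replacing each zero-threshold sign activation with a shifted one; per the abstract, no other modification is required.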

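The redundancy noted in point 4 can be written out directly. Assuming the usual arrangement of a Batch Normalization layer (scale $\gamma$, bias $\beta$, statistics $\mu$, $\sigma$) feeding a sign activation with threshold $t$:

$$\operatorname{sign}\!\left(\gamma\,\frac{x-\mu}{\sigma} + \beta - t\right) = \operatorname{sign}\!\left(\gamma\,\frac{x-\mu}{\sigma} + \beta'\right), \qquad \beta' = \beta - t,$$

so a trainable threshold is indistinguishable from a retuned BN bias, which is why learning $t$ explicitly adds little on its own.
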
Theoretical and Practical Implications

The paper's findings suggest a shift in how binary neural networks are approached. From a theoretical standpoint, it challenges the conventional notion that balancing binary activations to maximize their information content is what drives accuracy. Instead, it argues that intentional asymmetry in the activation distribution can be deliberately harnessed for better performance.
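
To make the information-capacity notion concrete: if a binary activation equals $+1$ with probability $p$, its entropy

$$H(p) = -p\log_2 p - (1-p)\log_2(1-p)$$

is maximized at $p = 0.5$, which is why prior work targeted balanced (zero-mean) binary activations; the paper's analysis indicates that maximizing this entropy does not by itself yield higher BNN accuracy.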

Practically, the method provides a simple and computationally inexpensive way to improve BNN accuracy, facilitating deployment in edge-computing scenarios where resources are limited. Because it requires no extra resources or architectural changes, it can be applied directly to existing models.

Speculation on Future Directions

Emerging trends in the AI domain suggest a growing interest in network quantization techniques, and this paper adds a critical piece to that body of knowledge. Future research could explore the interaction between this threshold-shifting method and other model optimization strategies, such as mixed-precision training and adaptive quantization levels.

Additionally, understanding the detailed mechanics of activation distribution's impact on gradient flow and model convergence might unleash further potential in lightweight neural network designs. Exploring the interplay between unbalanced activations and different types of neural architectures, including transformers and graph networks, could open new directions for BNN applicability.

In summary, this research advances the conversation on efficient neural network deployment by embracing unorthodox activation distribution strategies, offering promising pathways for future development in this vital area.

Authors (4)
  1. Hyungjun Kim (18 papers)
  2. Jihoon Park (15 papers)
  3. Changhun Lee (9 papers)
  4. Jae-Joon Kim (15 papers)
Citations (27)