ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions (2003.03488v2)

Published 7 Mar 2020 in cs.CV, cs.LG, and eess.IV

Abstract: In this paper, we propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost. We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts, bypassing all the intermediate convolutional layers including the downsampling layers. This baseline network strikes a good trade-off between accuracy and efficiency, achieving superior performance than most of existing binary networks at approximately half of the computational cost. Through extensive experiments and analysis, we observed that the performance of binary networks is sensitive to activation distribution variations. Based on this important observation, we propose to generalize the traditional Sign and PReLU functions, denoted as RSign and RPReLU for the respective generalized functions, to enable explicit learning of the distribution reshape and shift at near-zero extra cost. Lastly, we adopt a distributional loss to further enforce the binary network to learn similar output distributions as those of a real-valued network. We show that after incorporating all these ideas, the proposed ReActNet outperforms all the state-of-the-arts by a large margin. Specifically, it outperforms Real-to-Binary Net and MeliusNet29 by 4.0% and 3.6% respectively for the top-1 accuracy and also reduces the gap to its real-valued counterpart to within 3.0% top-1 accuracy on ImageNet dataset. Code and models are available at: https://github.com/liuzechun/ReActNet.

Citations (314)

Summary

  • The paper introduces a strong 1-bit CNN baseline built by binarizing MobileNetV1 with parameter-free shortcuts, reaching 61.1% top-1 accuracy at only 87M operations.
  • The study proposes generalized activation functions (RSign and RPReLU) whose learnable parameters shift and reshape activation distributions, substantially boosting binary-network accuracy.
  • A distributional loss aligns the binary network's outputs with those of a real-valued network, letting ReActNet narrow the top-1 accuracy gap on ImageNet to within 3.0%.

An Expert Overview of "ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions"

Binary Neural Networks (BNNs), particularly 1-bit Convolutional Neural Networks (1-bit CNNs), have been recognized for their potential in deploying deep learning models on resource-limited devices due to significant reductions in memory and computational demands. However, a persistent challenge facing BNNs is the substantial accuracy gap compared to their real-valued counterparts. The paper "ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions" proposes novel methodologies to enhance BNNs' accuracy while maintaining their computational efficiency, contributing significant methodological and practical advancements in the area.

Key Contributions and Findings

  1. Baseline Network Design: The paper introduces a new baseline model by binarizing a MobileNetV1 architecture and adding parameter-free shortcuts that bypass the binarized convolutions, including the downsampling layers, so that a real-valued feature path is preserved through the network. Notably, this baseline achieves superior performance to most existing BNNs at approximately half their computational cost, reaching a top-1 accuracy of 61.1% on ImageNet at just 87M operations (a minimal sketch of such a block appears after this list).
  2. Generalized Activation Functions: A pivotal aspect of this paper is its focus on activation distributions. The authors propose ReActNet, which includes generalized activation functions called ReAct-Sign (RSign) and ReAct-PReLU (RPReLU). These functions carry learnable parameters that let the network explicitly shift and reshape activation distributions, which proves crucial for improving BNN accuracy; this adaptive behavior raises baseline accuracy substantially at near-zero extra cost (see the second sketch after this list).
  3. Distributional Loss: Another contribution is a distributional loss that encourages the binary network's output distribution to match that of a real-valued network, further refining performance on top of the baseline improvements (a minimal loss sketch also follows this list).
  4. Empirical Results: Incorporating all proposed enhancements, ReActNet surpasses state-of-the-art BNN models, achieving a top-1 accuracy of 69.4% on ImageNet. This narrows the gap to its real-valued counterpart to within 3.0% top-1 accuracy while retaining a substantial computational-efficiency advantage.
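
The sketch below illustrates item 1: a minimal PyTorch rendering (not the authors' released code) of a 1-bit convolution whose real-valued input bypasses it through a parameter-free identity shortcut. The straight-through estimators and the assumption of equal input/output channels are simplifications; ReActNet's actual blocks also handle channel expansion and downsampling, which are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryActivation(torch.autograd.Function):
    """Sign binarization with a clipped straight-through estimator for the gradient."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients only where |x| <= 1 (clipped straight-through estimator).
        return grad_output * (x.abs() <= 1).float()

class BinaryConvBlock(nn.Module):
    """1-bit 3x3 convolution whose real-valued input is carried around it by an identity shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        xb = BinaryActivation.apply(x)                 # binarize activations
        w = self.conv.weight
        wb = w + (torch.sign(w) - w).detach()          # binarize weights via a straight-through trick
        out = self.bn(F.conv2d(xb, wb, padding=1))
        return out + x                                 # parameter-free shortcut keeps a real-valued path
```

Because the shortcut is a plain addition, it contributes no parameters and negligible compute, which is what lets the baseline keep roughly half the cost of prior binary networks.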
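Next, a hedged sketch of the generalized activations in item 2. The per-channel parameter shapes and initializations below are assumptions rather than details taken from the paper or its released code; the intent is only to show how learnable shifts let the network reshape activation distributions before and after binarization.

```python
import torch
import torch.nn as nn

class RSign(nn.Module):
    """Sign with a learnable per-channel threshold: inputs are shifted by alpha before binarization."""
    def __init__(self, channels):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(1, channels, 1, 1))  # assumed init at zero

    def forward(self, x):
        shifted = x - self.alpha
        # Straight-through estimator: sign in the forward pass, identity gradient in the backward pass.
        return shifted + (torch.sign(shifted) - shifted).detach()

class RPReLU(nn.Module):
    """PReLU with learnable per-channel input shift (gamma) and output shift (zeta)."""
    def __init__(self, channels):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.zeta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.full((1, channels, 1, 1), 0.25))  # assumed negative-slope init

    def forward(self, x):
        shifted = x - self.gamma
        return torch.where(shifted > 0, shifted, self.beta * shifted) + self.zeta
```

Each parameter is a single scalar per channel, which is why the paper can describe the added learning capacity as coming at near-zero extra cost.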
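Finally, a minimal sketch in the spirit of the distributional loss in item 3: the binary network (student) is trained to match the softmax output distribution of a real-valued network (teacher) via KL divergence. The temperature parameter and its scaling are assumptions, not values taken from the paper.

```python
import torch.nn.functional as F

def distributional_loss(binary_logits, real_logits, temperature=1.0):
    """KL divergence between the real-valued teacher's and the binary student's output distributions."""
    student_log_probs = F.log_softmax(binary_logits / temperature, dim=1)
    teacher_probs = F.softmax(real_logits / temperature, dim=1).detach()  # teacher is not updated
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2
```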

Theoretical and Practical Implications

Theoretically, this paper underscores the crucial role of activation distributions in BNNs, an aspect that prior work, focused primarily on quantization schemes or architectural modifications, had left relatively underexplored. The introduction of RSign and RPReLU marks a shift towards more nuanced, learnable control over activation behavior, a mechanism that could plausibly carry over to other low-precision settings.

Practically, the enhancements proposed make BNNs more viable for real-world applications where computational resources are limited. The reduced gap in performance between binary and real-valued networks paves the way for deploying binary networks in edge devices, IoT systems, and other low-power environments, expanding the practical use cases for BNNs.

Future Directions

The promising results showcased by ReActNet suggest several potential avenues for further exploration:

  • Hardware Optimization: Implementing these networks on specialized hardware (e.g., FPGAs, dedicated chips) could amplify the computational advantages while fully leveraging the efficient architecture proposed.
  • Generalization Across Architectures: While the paper centers on MobileNetV1, future work could explore how these techniques generalize to other compact neural networks or domain-specific architectures.
  • Further Reducing Real-Value Dependency: Investigating methods to eliminate remaining real-valued operations without sacrificing accuracy could further streamline BNNs for ultra-efficient deployment.

In conclusion, this paper makes significant strides in closing the performance gap for binary neural networks, identifies key weaknesses in current approaches, and introduces innovative techniques with broad applicability in efficient AI deployment. As the field progresses, the methodologies and insights presented in this paper will likely serve as crucial underpinnings for future research and development in BNNs.