
daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices (1908.05858v1)

Published 16 Aug 2019 in cs.CV, cs.MM, and eess.IV

Abstract: It is always well believed that Binary Neural Networks (BNNs) could drastically accelerate the inference efficiency by replacing the arithmetic operations in float-valued Deep Neural Networks (DNNs) with bit-wise operations. Nevertheless, there has not been open-source implementation in support of this idea on low-end ARM devices (e.g., mobile phones and embedded devices). In this work, we propose daBNN --- a super fast inference framework that implements BNNs on ARM devices. Several speed-up and memory refinement strategies for bit-packing, binarized convolution, and memory layout are uniquely devised to enhance inference efficiency. Compared to the recent open-source BNN inference framework, BMXNet, our daBNN is $7\times$$\sim$$23\times$ faster on a single binary convolution, and about $6\times$ faster on Bi-Real Net 18 (a BNN variant of ResNet-18). The daBNN is a BSD-licensed inference framework, and its source code, sample projects and pre-trained models are available on-line: https://github.com/JDAI-CV/dabnn.

daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM Devices

The paper, titled "daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices," introduces a highly optimized inference framework specifically designed to implement Binary Neural Networks (BNNs) on ARM devices. This research addresses the limitations of executing DNNs on low-end devices, such as mobile phones, due to their constrained memory and computational capabilities. BNNs offer a potential solution by quantizing weights and activations to binary values, thus facilitating efficient inference through bit-wise operations.
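To make the bit-wise formulation concrete, the following sketch (illustrative only; the function name and packing convention are assumptions, not code from daBNN) shows how the dot product of two {-1, +1} vectors collapses to XOR and popcount once their signs are packed into 64-bit words:

```cpp
#include <bit>      // std::popcount (C++20)
#include <cstddef>
#include <cstdint>

// Dot product of two {-1, +1} vectors whose signs are packed into 64-bit
// words (bit i set means element i is -1). Matching bits contribute +1,
// differing bits contribute -1, so dot = n_bits - 2 * popcount(a XOR b).
int binary_dot(const std::uint64_t* a, const std::uint64_t* b,
               std::size_t n_words, std::size_t n_bits) {
    std::size_t diff = 0;
    for (std::size_t i = 0; i < n_words; ++i)
        diff += std::popcount(a[i] ^ b[i]);  // count positions that differ
    return static_cast<int>(n_bits) - 2 * static_cast<int>(diff);
}
```

daBNN performs this same accumulation with ARM NEON vector registers and hardware popcount instructions, which is where the speed-up over floating-point multiply-accumulate originates.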

Key Contributions

The primary contribution of this research is the development of the daBNN framework, which is substantially faster than existing open-source alternatives. Notable innovations include:

  1. Bit-Packing Optimization: The authors present an enhanced bit-packing scheme that uses SIMD instructions to aggregate many elements at once, packing sign bits roughly four times faster than naive sequential approaches (a sketch of the idea follows this list).
  2. Binary Direct Convolution: This method addresses the inefficiencies of the binary matrix multiplication (BGEMM) formulation commonly used in BNN inference. By reordering the calculation, daBNN reduces the overhead of the additional instructions BGEMM incurs, achieving better performance on both ARMv8 and ARMv7 architectures (a simplified sketch also follows this list).
  3. Memory Layout Refinement: A new NC₁HWC₂ memory layout exploits spatial redundancy in convolution operations, decreasing memory accesses by approximately two-thirds compared to conventional layouts.
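The first two contributions can be sketched in scalar C++. The bit-packing step of item 1 might look like the loop below; daBNN's real kernel replaces the per-element loop with NEON instructions that gather many sign bits per instruction (the function name is illustrative, not daBNN's API):

```cpp
#include <cstddef>
#include <cstdint>

// Pack the sign bits of 64 float activations into one 64-bit word:
// bit i is set when x[i] is negative, i.e. binarizes to -1.
// This per-element loop is only for illustration; daBNN's kernel uses
// NEON SIMD instructions to gather many sign bits per instruction,
// which is where the reported ~4x packing speed-up comes from.
std::uint64_t pack_signs_64(const float* x) {
    std::uint64_t word = 0;
    for (std::size_t i = 0; i < 64; ++i) {
        if (x[i] < 0.0f) word |= (std::uint64_t{1} << i);
    }
    return word;
}
```

Item 2, binary direct convolution, then computes each output value by XOR-ing and popcounting the packed filter directly against the packed input window, avoiding a GEMM-style im2col reshuffle in between. The layout and names below are hypothetical and do not mirror daBNN's internals:

```cpp
#include <bit>
#include <cstddef>
#include <cstdint>

// One output value of a stride-1, no-padding binary direct convolution.
// Layouts and names are hypothetical: input is packed as [H][W][C_words],
// the filter as [K][K][C_words], with C_words 64-bit words per pixel.
int binary_conv_at(const std::uint64_t* input, const std::uint64_t* filter,
                   std::size_t W, std::size_t C_words, std::size_t K,
                   std::size_t out_y, std::size_t out_x,
                   std::size_t window_bits /* K * K * true channel count */) {
    std::size_t diff = 0;
    for (std::size_t ky = 0; ky < K; ++ky) {
        for (std::size_t kx = 0; kx < K; ++kx) {
            const std::uint64_t* in_px =
                input + ((out_y + ky) * W + (out_x + kx)) * C_words;
            const std::uint64_t* f_px = filter + (ky * K + kx) * C_words;
            for (std::size_t c = 0; c < C_words; ++c)
                diff += std::popcount(in_px[c] ^ f_px[c]);  // XOR + popcount
        }
    }
    // Same +1/-1 dot-product identity as in the earlier sketch.
    return static_cast<int>(window_bits) - 2 * static_cast<int>(diff);
}
```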

Performance Evaluation

Empirical evaluations demonstrate that daBNN achieves substantial improvements in inference speed, specifically being 7× to 23× faster than BMXNet on single binary convolution tasks and roughly 6× faster on Bi-Real Net 18. Compared to TensorFlow Lite, daBNN shows an 8× to 10× performance increase in single binary convolution and offers a 3× improvement with Bi-Real Net 18. Such results underline the efficiency of the framework in practical deployment scenarios.

Implications and Future Work

The implications of this research are twofold:

  • Practical Deployability: daBNN provides an open-source, BSD-licensed solution for deploying binary networks on ARM devices, encouraging broader adoption and experimentation in industry environments where computational efficiency is crucial.
  • Research Opportunities: The framework's availability facilitates the exploration and design of novel BNN structures, offering insight into more computationally efficient architectures.

Looking forward, the authors express interest in extending daBNN's architecture support to x86 and RISC-V platforms, and they anticipate collaborating with research teams to design and refine new BNN structures.

This research contributes a significant advancement in executing neural networks efficiently on constrained hardware, offering both a practical tool for deployment and a foundation for continued innovation in BNN design. The availability of daBNN as an open-source project further cements its potential influence, encouraging developers and researchers alike to leverage and build upon this framework in advancing neural network applications on low-end devices.

Authors (5)
  1. Jianhao Zhang (31 papers)
  2. Yingwei Pan (77 papers)
  3. Ting Yao (127 papers)
  4. He Zhao (117 papers)
  5. Tao Mei (209 papers)
Citations (62)