Ternary Weight Networks (1605.04711v3)

Published 16 May 2016 in cs.CV

Abstract: We present memory- and computation-efficient ternary weight networks (TWNs), with weights constrained to +1, 0, and -1. The Euclidean distance between the full (float or double) precision weights and the ternary weights, together with a scaling factor, is minimized during training. In addition, a threshold-based ternary function is optimized to obtain an approximate solution that is fast and easy to compute. TWNs have shown better expressive ability than their binary precision counterparts. Meanwhile, TWNs achieve up to a 16$\times$ model compression rate and need fewer multiplications than their float32 precision counterparts. Extensive experiments on the MNIST, CIFAR-10, and ImageNet datasets show that TWNs achieve much better results than Binary-Weight-Networks (BWNs), and that the classification performance on MNIST and CIFAR-10 is very close to that of full precision networks. We also verify our method on an object detection task and show that TWNs significantly outperform BWNs by more than 10\% mAP on the PASCAL VOC dataset. The PyTorch version of the source code is available at: https://github.com/Thinklab-SJTU/twns.

Authors (5)
  1. Fengfu Li (5 papers)
  2. Bin Liu (441 papers)
  3. Xiaoxing Wang (11 papers)
  4. Bo Zhang (633 papers)
  5. Junchi Yan (241 papers)
Citations (503)

Summary

Ternary Weight Networks: An Insightful Overview

The paper "Ternary Weight Networks" by Fengfu Li et al. addresses the significant challenges faced in deploying deep neural networks (DNNs) on devices with limited computational and storage resources. It introduces Ternary Weight Networks (TWNs), a quantization approach that restricts network weights to three discrete values: +1, 0, and -1. This approach strikes a balance between conventional floating-point precision networks and binary precision networks.

Methodology and Contributions

The authors present a method in which the Euclidean distance between the full precision weights and the scaled ternary weights is minimized during training. A threshold-based ternary function then yields an approximate solution to this problem that is fast and easy to compute.
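As a concrete illustration, the sketch below shows a threshold-based ternarization in PyTorch following the approximation described in the paper (a threshold of roughly 0.7 times the mean absolute weight, and a scaling factor equal to the mean magnitude of the weights above the threshold). The function name and tensor handling here are illustrative and not taken from the released code.

```python
import torch

def ternarize(w: torch.Tensor):
    """Threshold-based ternarization sketch (names are illustrative).

    delta ~ 0.7 * E[|w|] approximates the optimal threshold; alpha is the
    mean magnitude of the weights above the threshold and serves as the
    per-layer scaling factor.
    """
    delta = 0.7 * w.abs().mean()                                  # approximate threshold
    mask = (w.abs() > delta).float()                              # 1 where the ternary weight is non-zero
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)    # scaling factor
    w_ternary = torch.sign(w) * mask                              # values in {-1, 0, +1}
    return alpha, w_ternary
```

In a forward pass, alpha * w_ternary would then stand in for the full precision weights, while the full precision copy is kept for gradient updates.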

Key contributions of the paper include:

  1. Introduction of a ternary weight quantization scheme to reduce storage and computational cost in DNNs.
  2. Development of an approximated solution using a threshold-based approach for ternary weight calculation.
  3. Extensive experimentation showing that TWNs outperform binary weight networks (BWNs) in various benchmarks.

Experimental Results

Experiments on MNIST, CIFAR-10, and ImageNet demonstrate the efficacy of TWNs. The results show a model compression rate of up to 16x relative to float32 precision networks, with accuracy on MNIST and CIFAR-10 very close to that of the full precision networks. On ImageNet, a performance gap to full precision remains, but TWNs still outperform binary precision networks.
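The 16x figure follows directly from the storage cost per weight, assuming weights dominate model size and ignoring the per-layer scaling factors and any packing overhead:

```python
# Back-of-the-envelope compression estimate: weights only, per-layer
# scaling factors and packing overhead ignored.
float32_bits = 32
ternary_bits = 2                      # {-1, 0, +1} stored in 2 bits
print(float32_bits / ternary_bits)    # 16.0 -> the "up to 16x" compression rate
```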

Further tests on object detection tasks using the PASCAL VOC dataset show that TWNs significantly exceed the performance of BWNs by over 10% in mean average precision (mAP).

Technical Insights

TWNs use a weight representation that requires only 2 bits per weight, compared with the 1-bit representation of BWNs, yet gain substantially more expressive power: for a 3x3 convolutional filter, the ternary template space is roughly 38x larger than the binary one. The threshold-based ternary function provides a practical and efficient way to compute near-optimal ternary weight configurations.
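The 38x figure can be checked directly: a 3x3 filter has 9 weights, so a ternary filter admits 3^9 possible patterns versus 2^9 for a binary one.

```python
# Filter-pattern ("template") space for a single 3x3 kernel.
ternary_templates = 3 ** 9   # 19683 distinct ternary filters
binary_templates = 2 ** 9    # 512 distinct binary filters
print(ternary_templates / binary_templates)  # ~38.4, the 38x figure cited above
```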

Implications and Future Directions

The development of TWNs provides a pathway for deploying sophisticated DNN models on edge devices where resource constraints are a critical consideration. The balance achieved by TWNs between computational efficiency and model accuracy is advantageous for real-world applications, including mobile and embedded device environments.

Theoretically, TWNs may inspire further research into alternative quantization methods that expand upon the ternary weight model or explore hybrid quantization schemes to further enhance the balance of efficiency and performance. Future work may also delve into optimizing hardware designs specifically tailored for ternary networks to exploit their computation-saving features further.

Conclusion

This research contributes significantly to the ongoing effort to optimize deep learning models for environments with limited resources. The paper provides a clear demonstration of how strategic weight quantization, specifically through ternary networks, can dramatically decrease computational demands while preserving model integrity, illustrating a promising approach in the evolution of neural network deployments across diverse platforms.
