Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks (1709.06262v2)

Published 19 Sep 2017 in cs.CV

Abstract: A low precision deep neural network training technique for producing sparse, ternary neural networks is presented. The technique incorporates hardware implementation costs during training to achieve significant model compression for inference. Training involves three stages: network training using L2 regularization and a quantization threshold regularizer, quantization pruning, and finally retraining. Resulting networks achieve improved accuracy, reduced memory footprint and reduced computational complexity compared with conventional methods on the MNIST and CIFAR10 datasets. Our networks are up to 98% sparse and 5 and 11 times smaller than equivalent binary and ternary models, translating to significant resource and speed benefits for hardware implementations.
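The core idea of the quantization-pruning stage is mapping full-precision weights to the ternary set {-1, 0, +1}, with a threshold deciding which weights are zeroed out (and hence pruned). The sketch below is a minimal illustration of such threshold-based ternarization, not the authors' implementation; the threshold heuristic and the sparsity measure are assumptions made for the example.

```python
import numpy as np

def ternarize(w, delta):
    """Map full-precision weights to {-1, 0, +1}.
    Weights with |w| <= delta become zero (inducing sparsity);
    larger weights keep only their sign."""
    q = np.zeros_like(w)
    q[w > delta] = 1.0
    q[w < -delta] = -1.0
    return q

def sparsity(q):
    """Fraction of zero-valued quantized weights."""
    return float(np.mean(q == 0.0))

# Illustrative threshold choice: a fraction of the mean absolute weight.
# This heuristic is hypothetical and not the paper's exact rule.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=1000)
delta = 0.7 * np.mean(np.abs(w))
q = ternarize(w, delta)
print(f"sparsity: {sparsity(q):.2%}")
```

In the paper's pipeline, a regularizer applied during the first training stage pushes weights toward values that survive this thresholding, so that the subsequent quantization-pruning and retraining stages yield highly sparse ternary networks.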

Authors (5)
  1. Julian Faraone (4 papers)
  2. Nicholas Fraser (11 papers)
  3. Giulio Gambardella (12 papers)
  4. Michaela Blott (31 papers)
  5. Philip H. W. Leong (12 papers)
Citations (12)
