
Instant Quantization of Neural Networks using Monte Carlo Methods (1905.12253v2)

Published 29 May 2019 in cs.LG and stat.ML

Abstract: Low bit-width integer weights and activations are important for efficient inference, particularly for lowering power consumption. We propose Monte Carlo methods to quantize the weights and activations of pre-trained neural networks without any re-training. By performing importance sampling we obtain quantized low bit-width integer values from full-precision weights and activations. The precision, sparsity, and complexity are easily configurable by the amount of sampling performed. Our approach, called Monte Carlo Quantization (MCQ), is linear in both time and space, and the resulting quantized, sparse networks show minimal accuracy loss compared to the original full-precision networks. Our method either outperforms or achieves competitive results on multiple benchmarks compared to previous quantization methods that do require additional training.
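The abstract only sketches the idea; as a rough illustration (not the paper's exact procedure), the snippet below quantizes a weight tensor by importance sampling indices with probability proportional to |w| and using the per-weight hit counts, with signs restored, as integer values. The function name, the single global scale factor, and the `num_samples_per_weight` knob are assumptions made for this sketch.

```python
import numpy as np

def mc_quantize(weights, num_samples_per_weight=1.0, seed=0):
    """Illustrative importance-sampling quantizer (a sketch, not the paper's exact algorithm).

    Samples weight indices with probability proportional to |w| and uses the
    hit counts (with the original signs restored) as integer weight values.
    """
    rng = np.random.default_rng(seed)
    flat = weights.ravel()
    mags = np.abs(flat)
    total = mags.sum()
    if total == 0.0:
        return np.zeros(weights.shape, dtype=np.int32), 1.0
    probs = mags / total

    # The number of samples controls precision and sparsity:
    # more samples -> finer integer levels and fewer zeroed weights.
    n_samples = max(1, int(num_samples_per_weight * flat.size))
    idx = rng.choice(flat.size, size=n_samples, p=probs)
    counts = np.bincount(idx, minlength=flat.size)

    # Integer weights carry the original sign; one scale restores the magnitude.
    q = (np.sign(flat) * counts).astype(np.int32)
    scale = total / n_samples  # so that scale * q approximates the original weights
    return q.reshape(weights.shape), scale

# Usage: quantize a random layer and inspect levels and approximation error.
w = np.random.randn(256, 128).astype(np.float32)
q, scale = mc_quantize(w, num_samples_per_weight=4.0)
print("unique integer levels:", np.unique(q).size)
print("mean abs error:", np.mean(np.abs(w - scale * q)))
```

Raising `num_samples_per_weight` trades sparsity for accuracy, which mirrors the configurability the abstract describes; the actual MCQ method in the paper may differ in how samples are allocated and scaled.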

Authors (3)
  1. Matthijs Van Keirsbilck (7 papers)
  2. Alexander Keller (38 papers)
  3. Gonçalo Mordido (15 papers)
Citations (9)
