AdaptivFloat: A Floating-point based Data Type for Resilient Deep Learning Inference (1909.13271v3)

Published 29 Sep 2019 in cs.LG, cs.AR, and stat.ML

Abstract: Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes as their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point inspired number representation format for deep learning that dynamically maximizes and optimally clips its available dynamic range, at a layer granularity, in order to create faithful encoding of neural network parameters. AdaptivFloat consistently produces higher inference accuracies compared to block floating-point, uniform, IEEE-like float or posit encodings at very low precision ($\leq$ 8-bit) across a diverse set of state-of-the-art neural network topologies. And notably, AdaptivFloat is seen surpassing baseline FP32 performance by up to +0.3 in BLEU score and -0.75 in word error rate at weight bit widths that are $\leq$ 8-bit. Experimental results on a deep neural network (DNN) hardware accelerator, exploiting AdaptivFloat logic in its computational datapath, demonstrate per-operation energy and area that is 0.9$\times$ and 1.14$\times$, respectively, that of equivalent bit width integer-based accelerator variants.
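
As a concrete illustration of the per-layer adaptation the abstract describes, the sketch below quantizes a tensor to an AdaptivFloat-style n-bit format: the exponent bias is chosen from the tensor's maximum absolute value, so the representable range slides to match each layer's distribution, with clipping at the top of the range and a reserved zero for underflowing values. This is a minimal sketch under stated assumptions, not the authors' reference implementation; the function name, the bias-selection rule, and the flush-to-zero threshold are illustrative choices.

```python
import numpy as np

def adaptivfloat_quantize(x, n_bits=8, n_exp=3):
    """Quantize a tensor to an AdaptivFloat-style format (hypothetical sketch).

    Layout assumed: 1 sign bit, n_exp exponent bits, and
    n_bits - 1 - n_exp mantissa bits. The exponent bias is set per
    tensor so the format's range tracks max(|x|), mirroring the
    layer-granularity adaptation described in the abstract.
    """
    n_mant = n_bits - 1 - n_exp
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros_like(x, dtype=float)

    # Slide the exponent window so its top bin covers the tensor's
    # largest magnitude; anything above the top code is clipped.
    exp_max = int(np.floor(np.log2(max_abs)))
    exp_min = exp_max - (2 ** n_exp - 1)

    max_mant = 2.0 - 2.0 ** (-n_mant)    # largest 1.m mantissa value
    max_val = max_mant * 2.0 ** exp_max  # top representable magnitude
    min_normal = 2.0 ** exp_min          # smallest nonzero magnitude

    sign = np.sign(x)
    mag = np.abs(x).astype(float)

    # Values below half the smallest code are flushed to a reserved zero.
    underflow = mag < min_normal / 2
    mag = np.clip(mag, min_normal, max_val)

    # Round the mantissa at each value's own exponent scale.
    exp = np.floor(np.log2(mag))
    scale = 2.0 ** (exp - n_mant)        # mantissa LSB at this exponent
    q = np.minimum(np.round(mag / scale) * scale, max_val)
    q[underflow] = 0.0
    return sign * q

# Example: quantize one layer's weights to 8-bit AdaptivFloat-style values.
w = np.random.randn(256, 256).astype(np.float32) * 0.05
w_q = adaptivfloat_quantize(w, n_bits=8, n_exp=3)
```

Because the bias is recomputed per tensor, a layer with small-magnitude weights and a layer with large-magnitude weights each get the full 2^n_exp exponent bins over their own range, which is what lets the format hold accuracy at word sizes where fixed-point ranges collapse.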

Authors (8)
  1. Thierry Tambe (11 papers)
  2. En-Yu Yang (4 papers)
  3. Zishen Wan (33 papers)
  4. Yuntian Deng (44 papers)
  5. Vijay Janapa Reddi (78 papers)
  6. Alexander Rush (11 papers)
  7. David Brooks (204 papers)
  8. Gu-Yeon Wei (54 papers)
Citations (20)