Accuracy Booster: Enabling 4-bit Fixed-point Arithmetic for DNN Training (2211.10737v4)
Abstract: The unprecedented demand for computing resources to train DNN models has led to a search for minimal numerical encodings. Recent state-of-the-art (SOTA) proposals advocate multi-level scaled narrow-bitwidth numerical formats. In this paper, we show that single-level scaling is sufficient to maintain training accuracy while maximizing arithmetic density. We identify a previously proposed single-level scaled format for 8-bit training, Hybrid Block Floating Point (HBFP), as the optimal candidate for further minimization. We perform a full-scale exploration of the HBFP design space with mathematical tools to study the interplay among its parameters and identify opportunities for even smaller encodings across layers and epochs. Based on our findings, we propose Accuracy Booster, a mixed-mantissa HBFP technique that uses 4-bit mantissas for over 99% of all arithmetic operations in training and 6-bit mantissas only in the last epoch and in the first and last layers. We show that Accuracy Booster increases arithmetic density by at least 2.3x over all other SOTA formats while achieving state-of-the-art accuracies in 4-bit training.
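The abstract's central idea is single-level scaled block floating point: tensors are split into fixed-size blocks, each block shares one power-of-two exponent, and individual values keep only narrow signed mantissas (4 or 6 bits in Accuracy Booster). The sketch below emulates this numerically in NumPy to make the format concrete; the function name `bfp_quantize`, the block size of 64, and the rounding/clipping choices are illustrative assumptions, not the paper's implementation, which the abstract does not specify.

```python
import numpy as np

def bfp_quantize(x, mantissa_bits=4, block_size=64):
    """Sketch of single-level block floating-point quantization: each block of
    `block_size` values shares one power-of-two exponent, and each value keeps
    a signed `mantissa_bits`-bit fixed-point mantissa relative to it."""
    flat = x.astype(np.float32).ravel()
    pad = (-flat.size) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)

    # Shared exponent per block, derived from the largest magnitude in the block.
    max_abs = np.max(np.abs(blocks), axis=1, keepdims=True)
    shared_exp = np.floor(np.log2(np.where(max_abs > 0, max_abs, 1.0))) + 1

    # Fixed-point step so the largest value lands near the top of the signed range.
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))
    qmin, qmax = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mantissas = np.clip(np.round(blocks / step), qmin, qmax)

    # Dequantize back to float so the rest of an FP32 pipeline can consume it.
    return (mantissas * step).ravel()[: x.size].reshape(x.shape)

# Illustrative use of a mixed-mantissa schedule in the spirit of Accuracy Booster:
# 4-bit mantissas for the bulk of training, 6-bit for the first/last layers
# and the last epoch (layer/epoch selection not shown here).
w4 = bfp_quantize(np.random.randn(128, 128), mantissa_bits=4)
w6 = bfp_quantize(np.random.randn(128, 128), mantissa_bits=6)
```

In HBFP-style training, this block quantization would apply only to the tensors feeding dot-product operations (GEMMs and convolutions), with accumulations and other operations kept in higher-precision floating point.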
- Simla Burcu Harma
- Ayan Chakraborty
- Babak Falsafi
- Martin Jaggi
- Yunho Oh
- Nicholas Sperry