Standalone 16-bit Training: Missing Study for Hardware-Limited Deep Learning Practitioners (2305.10947v4)

Published 18 May 2023 in cs.LG, cs.AI, cs.CV, and cs.PF

Abstract: With the increasing complexity of machine learning models, managing computational resources like memory and processing power has become a critical concern. Mixed precision techniques, which leverage different numerical precisions during model training and inference to optimize resource usage, have been widely adopted. However, access to hardware that supports lower precision formats (e.g., FP8 or FP4) remains limited, especially for practitioners with hardware constraints. For many with limited resources, the available options are restricted to using 32-bit, 16-bit, or a combination of the two. While it is commonly believed that 16-bit precision can achieve results comparable to full (32-bit) precision, this study is the first to systematically validate this assumption through both rigorous theoretical analysis and extensive empirical evaluation. Our theoretical formalization of floating-point errors and classification tolerance provides new insights into the conditions under which 16-bit precision can approximate 32-bit results. This study fills a critical gap, proving for the first time that standalone 16-bit precision neural networks match 32-bit and mixed-precision in accuracy while boosting computational speed. Given the widespread availability of 16-bit support across GPUs, these findings are especially valuable for helping machine learning practitioners with limited hardware resources make informed decisions.
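To make the idea of floating-point error versus classification tolerance concrete, here is a minimal sketch. It is not the paper's formalization or experimental setup: the toy linear classifier, the helper names `round_to`, `logits`, and `compare`, and the random data are all assumptions made for illustration. It emulates 16-bit and 32-bit arithmetic by rounding every intermediate result with Python's `struct` module, then compares predictions. The check it illustrates is a simple triangle-inequality argument: if the gap between the top two 32-bit logits exceeds twice the largest per-logit rounding error, the 16-bit prediction cannot differ.

```python
import random
import struct


def round_to(fmt, x):
    """Round a Python float to an IEEE 754 format ('e' = binary16, 'f' = binary32)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def logits(weights, bias, x, fmt):
    """Toy linear layer; every intermediate value is rounded to precision `fmt`.

    This only approximates real low-precision hardware arithmetic.
    """
    out = []
    for row, b in zip(weights, bias):
        acc = round_to(fmt, b)
        for w, xi in zip(row, x):
            prod = round_to(fmt, round_to(fmt, w) * round_to(fmt, xi))
            acc = round_to(fmt, acc + prod)
        out.append(acc)
    return out


def compare(num_samples=200, dim=16, classes=4, seed=0):
    """Return (margin, error, pred32, pred16) for random inputs (hypothetical data)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(classes)]
    bias = [rng.uniform(-1, 1) for _ in range(classes)]
    results = []
    for _ in range(num_samples):
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        l32 = logits(weights, bias, x, "f")
        l16 = logits(weights, bias, x, "e")
        # Largest disagreement between 16-bit and 32-bit logits.
        err = max(abs(a - b) for a, b in zip(l16, l32))
        # Gap between the best and second-best 32-bit logits.
        top = sorted(l32, reverse=True)
        margin = top[0] - top[1]
        results.append((margin, err, l32.index(max(l32)), l16.index(max(l16))))
    return results


if __name__ == "__main__":
    results = compare()
    agree = sum(p32 == p16 for _, _, p32, p16 in results)
    safe = sum(margin > 2 * err for margin, err, _, _ in results)
    print(f"predictions agree on {agree}/{len(results)} samples")
    print(f"margin exceeds 2x rounding error on {safe}/{len(results)} samples")
```

Samples whose margin exceeds twice the rounding error are guaranteed to agree; the remaining samples may or may not agree, which is where any accuracy difference between precisions would have to come from in this toy setting.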

Authors

Juyoung Yun
Byungkon Kang
Francois Rameau
Zhoulai Fu
Sol Choi
