Acceptability of TF32 Tensor Core Precision for NNP-based Molecular Dynamics

Determine whether using TF32 reduced-precision arithmetic on NVIDIA Tensor Cores introduces excessive numerical noise that compromises the physical accuracy and stability of neural network potential evaluations in molecular dynamics simulations, and, if necessary, characterize acceptable error bounds and mitigation strategies for such computations.

Background

Neural network potentials for biomolecular simulations can be implemented with architectures favoring matrix multiplications to leverage GPU tensor cores for speed. However, tensor cores use TF32 precision rather than FP32, potentially introducing additional numerical noise. The suitability of TF32 for physically accurate molecular dynamics remains uncertain, motivating an explicit assessment of its impact and potential error-control techniques.

References

So far, tensor cores arithmetics is based on a reduced floating point format (TF32) compared to FP32 of CUDA cores. It is unclear if the level of noise introduced is excessive for physics.

— Machine Learning Potentials: A Roadmap Toward Next-Generation Biomolecular Simulations (2408.12625 - Fabritiis, 17 Aug 2024) in Co-evolution of models, software, and hardware

Acceptability of TF32 Tensor Core Precision for NNP-based Molecular Dynamics

Background

References

Related Problems