FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs (2412.20796v2)

Published 30 Dec 2024 in cs.DC and cs.LG

Abstract: Graph neural network universal interatomic potentials (GNN-UIPs) have demonstrated remarkable generalization and transfer capabilities in material discovery and property prediction. These models can accelerate molecular dynamics (MD) simulation by several orders of magnitude while maintaining ab initio accuracy, making them a promising new paradigm in material simulations. One notable example is the Crystal Hamiltonian Graph Neural Network (CHGNet), pretrained on the energies, forces, stresses, and magnetic moments from the MPtrj dataset, representing a state-of-the-art GNN-UIP model for charge-informed MD simulations. However, training the CHGNet model is time-consuming (8.3 days on one A100 GPU) for three reasons: (i) multi-layer propagation is required to reach more distant atom information, (ii) second-order derivative calculations are required to update the weights, and (iii) the reference CHGNet implementation does not fully leverage the available computational capabilities. This paper introduces FastCHGNet, an optimized CHGNet, with three contributions: first, we design innovative Force/Stress Readout modules to decompose Force/Stress prediction; second, we adopt extensive optimizations such as kernel fusion and redundancy bypass to fully exploit GPU computational power; finally, we extend CHGNet to support multiple GPUs and propose a load-balancing technique to enhance GPU utilization. Numerical results show that FastCHGNet reduces the memory footprint by a factor of 3.59. The final training time of FastCHGNet can be decreased to 1.53 hours on 32 GPUs without sacrificing model accuracy.

Summary

  • The paper introduces a novel decoupling of force and stress prediction that eliminates expensive second-order derivative computations.
  • It implements advanced GPU optimization techniques, including kernel fusion and load-balancing, to efficiently harness 32 GPUs.
  • The study achieves a roughly 130x reduction in training time, from 8.3 days on one A100 GPU to about 1.5 hours on 32 GPUs, substantially shortening the development cycle for universal interatomic potentials.

An Overview of FastCHGNet: Accelerating Universal Interatomic Potential Training

The paper "FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs" presents an optimized approach to efficiently train Graph Neural Network Universal Interatomic Potentials (GNN-UIPs), particularly focusing on the Crystal Hamiltonian Graph Neural Network (CHGNet). This research is pivotal in advancing molecular dynamics (MD) simulations where GNN-UIPs have demonstrated potential in accurately modeling material properties across diverse systems without further density functional theory (DFT) calculations.

Key Contributions and Methodological Innovations

The authors identify several inefficiencies in the standard CHGNet training process and introduce FastCHGNet to address these. Key optimizations include:

  1. Force/Stress Prediction Decoupling: FastCHGNet replaces the derivative-based force and stress predictions, whose training losses require second-order derivatives during weight updates, with dedicated Force/Stress Readout modules that predict these properties directly from the learned representations. This significantly decreases computational overhead while maintaining the model's accuracy (a minimal sketch follows this list).
  2. GPU Utilization Enhancements: The paper details a host of optimizations aimed at maximizing GPU efficiency. These include kernel fusion, redundancy elimination, and parallelizing the basis computation, thereby reducing unnecessary computations and improving memory usage.
  3. Multi-GPU Scalability: FastCHGNet scales across multiple GPUs and uses a load-balancing strategy to keep the per-GPU workload even despite the large variation in graph sizes across training structures (an illustrative balancing heuristic is sketched after this list).
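
The overview does not reproduce the readout modules themselves, so the following is a minimal PyTorch-style sketch of what "predicting forces and stresses directly" can look like. The class names, layer sizes, activation, and pooling choice are illustrative assumptions, not the authors' exact architecture; the point is that the heads consume node features and emit forces/stresses without differentiating the energy.

```python
# Minimal sketch of direct Force/Stress readout heads (illustrative only).
# Assumptions: a PyTorch setting where the GNN backbone produces per-atom
# features `node_feats` of shape (num_atoms, hidden_dim); module names,
# sizes, and pooling are hypothetical, not FastCHGNet's exact design.
import torch
import torch.nn as nn


class ForceReadout(nn.Module):
    """Predicts per-atom forces directly from node features, avoiding
    forces = -dE/dr and hence the second-order derivatives that would
    otherwise appear when backpropagating the force loss."""

    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, 3),  # one 3-vector per atom
        )

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        return self.mlp(node_feats)  # (num_atoms, 3)


class StressReadout(nn.Module):
    """Predicts a per-structure stress tensor (6 Voigt components) by
    pooling node features, instead of differentiating the energy with
    respect to the strain."""

    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, 6),
        )

    def forward(self, node_feats: torch.Tensor,
                batch_index: torch.Tensor, num_graphs: int) -> torch.Tensor:
        # Mean-pool the atoms belonging to the same structure.
        pooled = torch.zeros(num_graphs, node_feats.size(1),
                             device=node_feats.device)
        pooled.index_add_(0, batch_index, node_feats)
        counts = torch.bincount(batch_index, minlength=num_graphs).clamp(min=1)
        pooled = pooled / counts.unsqueeze(1).to(pooled.dtype)
        return self.mlp(pooled)  # (num_graphs, 6)
```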
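
The load-balancing idea in point 3 can likewise be illustrated with a generic greedy (longest-processing-time) heuristic that assigns structures to GPU ranks by an approximate per-structure cost. This is an assumed stand-in for exposition, not necessarily the scheme used in FastCHGNet; using the atom count as the cost proxy is also an assumption.

```python
# Illustrative greedy load balancing: assign structures to GPU ranks so that
# each rank's total work (approximated here by atom count) stays even.
# Generic heuristic for exposition; the paper's actual strategy may differ.
import heapq
from typing import Sequence


def balance_batches(atom_counts: Sequence[int], num_ranks: int) -> list[list[int]]:
    """Return, for each rank, the list of structure indices assigned to it."""
    # Min-heap of (current load, rank id); always hand the next-largest
    # structure to the least-loaded rank (LPT scheduling).
    heap = [(0, rank) for rank in range(num_ranks)]
    heapq.heapify(heap)
    assignment: list[list[int]] = [[] for _ in range(num_ranks)]
    for idx in sorted(range(len(atom_counts)), key=lambda i: -atom_counts[i]):
        load, rank = heapq.heappop(heap)
        assignment[rank].append(idx)
        heapq.heappush(heap, (load + atom_counts[idx], rank))
    return assignment


# Example: 8 structures of very different sizes spread over 4 GPUs.
if __name__ == "__main__":
    sizes = [120, 8, 64, 200, 16, 96, 32, 48]
    for rank, idxs in enumerate(balance_batches(sizes, 4)):
        print(rank, idxs, sum(sizes[i] for i in idxs))
```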

Numerical Results and Performance

The authors report substantial improvements in training time. FastCHGNet achieves a roughly 130x speedup, reducing CHGNet training from 8.3 days on a single A100 GPU to 1.53 hours on 32 GPUs, alongside a 3.59x reduction in memory footprint, without compromising model accuracy. This improvement is significant for iterative model development and for deploying universal potentials in real-world applications.
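
The headline figure follows directly from the reported numbers; note that it compares one A100 against 32 GPUs, so it reflects both the algorithmic changes and the added hardware:

$$
8.3~\text{days} \times 24~\tfrac{\text{h}}{\text{day}} = 199.2~\text{h}, \qquad \frac{199.2~\text{h}}{1.53~\text{h}} \approx 130\times
$$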

Implications and Future Prospects

The optimizations introduced in FastCHGNet set a precedent for scaling complex GNN models in materials science applications. Decoupling property predictions from derivative calculations could encourage similar methodologies in other domains of computational modeling. FastCHGNet exemplifies how system-level engineering and algorithmic innovation can be combined to improve computational efficiency, opening the door to more extensive and frequent simulations in materials discovery.

Looking forward, the research could inspire further exploration into lightweight neural architectures for computational chemistry. There is potential for cross-pollination with fields like quantum chemistry and condensed matter physics, where high-dimensional potential energy surfaces are pivotal. Additionally, efforts could be made to extend quantization and model compression techniques to further expedite training and inference, enhancing the applicability of universal interatomic potentials in real-time applications.
