DFT-FE 1.0: A massively parallel hybrid CPU-GPU density functional theory code using finite-element discretization (2203.07820v2)

Published 15 Mar 2022 in physics.comp-ph

Abstract: We present DFT-FE 1.0, building on DFT-FE 0.6 [Comput. Phys. Commun. 246, 106853 (2020)], to conduct fast and accurate large-scale density functional theory (DFT) calculations (reaching $\sim 100{,}000$ electrons) on both many-core CPU and hybrid CPU-GPU computing architectures. This work involves improvements in the real-space formulation -- via an improved treatment of the electrostatic interactions that substantially enhances the computational efficiency -- as well as high-performance computing aspects, including the GPU acceleration of all the key compute kernels in DFT-FE. We demonstrate the accuracy by comparing the ground-state energies, ionic forces and cell stresses on a wide range of benchmark systems against those obtained from widely used DFT codes. Further, we demonstrate the numerical efficiency of our implementation, which yields $\sim 20\times$ CPU-GPU speed-up by using GPU acceleration on hybrid CPU-GPU nodes. Notably, owing to the parallel scaling of the GPU implementation, we obtain wall-times of $80$-$140$ seconds for full ground-state calculations, with stringent accuracy, on benchmark systems containing $\sim 6{,}000$-$15{,}000$ electrons.
