Effects of Easy Hybrid Parallelization with CUDA for Numerical-Atomic-Orbital Density Functional Theory Calculation
Abstract: We modified a MPI-friendly density functional theory (DFT) source code within hybrid parallelization including CUDA. Our objective is to find out how simple conversions within the hybrid parallelization with mid-range GPUs affect DFT code not originally suitable to CUDA. We settled several rules of hybrid parallelization for numerical-atomic-orbital (NAO) DFT codes. The test was performed on a magnetite material system with OpenMX code by utilizing a hardware system containing 2 Xeon E5606 CPUs and 2 Quadro 4000 GPUs. 3-way hybrid routines obtained a speedup of 7.55 while 2-way hybrid speedup by 10.94. GPUs with CUDA complement the efficiency of OpenMP and compensate CPUs' excessive competition within MPI.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.