TTDFT: A GPU accelerated Tucker tensor DFT code for large-scale Kohn-Sham DFT calculations (2110.15853v1)
Abstract: We present the Tucker tensor DFT (TTDFT) code, which uses a tensor-structured algorithm with graphics processing unit (GPU) acceleration to perform ground-state DFT calculations on large-scale systems. The Tucker tensor DFT algorithm uses a localized Tucker tensor basis computed from an additive separable approximation to the Kohn-Sham Hamiltonian. The discrete Kohn-Sham problem is solved using the Chebyshev filtering subspace iteration method, which relies on matrix-matrix multiplications of a sparse symmetric Hamiltonian matrix and a dense wavefunction matrix, expressed in the localized Tucker tensor basis. These matrix-matrix multiplications, which constitute the most computationally intensive step of the solution procedure, are GPU accelerated, providing a ~8-fold GPU-CPU speedup for these operations on the largest systems studied. The computational performance of the TTDFT code is demonstrated through benchmark studies on aluminum nanoparticles and silicon quantum dots with system sizes of up to ~7,000 atoms.
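To illustrate why the Hamiltonian-times-wavefunction products dominate the cost, the following is a minimal NumPy sketch of a generic Chebyshev filtering subspace iteration. It is not the TTDFT implementation: the function names (`chebyshev_filter`, `rayleigh_ritz`, `chfsi`, `toy_hamiltonian`), the choice of spectral bounds, and the small dense tridiagonal test matrix are all assumptions made for illustration. In a production code the Hamiltonian would be stored in a sparse format and the `H @ X` products would be the operations offloaded to the GPU.

```python
import numpy as np


def chebyshev_filter(H, X, degree, a, b):
    """Apply a degree-`degree` Chebyshev polynomial of H to the block X.

    The interval [a, b] (the unwanted part of the spectrum) is mapped to
    [-1, 1], where the polynomial stays bounded; components with
    eigenvalues below `a` are amplified. Requires degree >= 1.
    """
    e = 0.5 * (b - a)  # half-width of the damped interval
    c = 0.5 * (b + a)  # center of the damped interval
    Y = (H @ X - c * X) / e  # T_1 applied to X
    for _ in range(2, degree + 1):
        # Three-term recurrence T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t).
        # Each step costs one Hamiltonian-times-block product.
        Y_new = 2.0 * (H @ Y - c * Y) / e - X
        X, Y = Y, Y_new
    return Y


def rayleigh_ritz(H, X):
    """Orthonormalize X and diagonalize H projected onto its span."""
    Q, _ = np.linalg.qr(X)
    w, V = np.linalg.eigh(Q.T @ (H @ Q))
    return w, Q @ V  # Ritz values (ascending) and Ritz vectors


def chfsi(H, n_vectors, degree=10, n_iter=30, seed=0):
    """Approximate the lowest eigenpairs of a symmetric matrix H."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    # For symmetric H the matrix 1-norm bounds the largest eigenvalue.
    b = np.linalg.norm(H, 1)
    w, X = rayleigh_ritz(H, rng.standard_normal((n, n_vectors)))
    for _ in range(n_iter):
        a = w[-1]  # largest Ritz value: lower edge of the damped interval
        X = chebyshev_filter(H, X, degree, a, b)
        w, X = rayleigh_ritz(H, X)
    return w, X


def toy_hamiltonian(n):
    """Hypothetical test matrix: symmetric tridiagonal, stored densely."""
    H = np.diag(np.arange(n, dtype=float))
    off = 0.3 * np.ones(n - 1)
    return H + np.diag(off, 1) + np.diag(off, -1)
```

A possible use is `w, X = chfsi(toy_hamiltonian(100), n_vectors=6)`, keeping the lowest four Ritz values and treating the remaining two vectors as guard vectors, since the highest states in the block converge slowly. Each outer iteration performs `degree` products of H with the full block, which is why this step is the natural target for GPU acceleration.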