Kernel Fusion in Atomistic Spin Dynamics Simulations on Nvidia GPUs using Tensor Core (2308.07487v1)

Published 14 Aug 2023 in physics.comp-ph

Abstract: In atomistic spin dynamics simulations, the time cost of constructing the space- and time-displaced pair correlation function in real space increases quadratically with the number of spins $N$, leading to significant computational effort. The GEMM subroutine can be adopted to accelerate the calculation of the dynamical spin-spin correlation function, but simulating large spin systems ($>40000$ spins) on CPUs remains computationally expensive. In this work, we perform the simulation on the graphics processing unit (GPU), a hardware solution widely used as an accelerator for scientific computing and deep learning. We show that GPUs can accelerate the simulation up to 25-fold compared to multi-core CPUs when the GEMM subroutine is used on both. To hide memory latency, we fuse the element-wise operation into the GEMM kernel using $\mathtt{CUTLASS}$, which improves performance by 26% to 33% compared to an implementation based on $\mathtt{cuBLAS}$. Furthermore, we perform the on-the-fly calculation in the epilogue of the GEMM subroutine to avoid saving intermediate results in global memory, which makes large-scale atomistic spin dynamics simulation feasible and affordable.
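The fusion idea in the abstract can be illustrated with a minimal CPU-side sketch in plain Python. It is not the paper's CUTLASS kernel, and the abstract does not say which element-wise operation is fused; here that step is assumed to be a reduction of each pair product $S_i(0)\cdot S_j(t)$ into a separation bin. The names `correlation_unfused`, `correlation_fused`, `bin_of`, and `tile` are hypothetical and chosen only for illustration.

```python
def correlation_unfused(s0, st, bin_of, n_bins):
    """Two passes: materialize the full N x N product, then reduce it."""
    # Pass 1 (the GEMM): P[i][j] = s0[i] . st[j], kept as an N x N intermediate.
    p = [[sum(a * b for a, b in zip(si, sj)) for sj in st] for si in s0]
    # Pass 2 (element-wise step): re-read every P[i][j] and add it to its bin.
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for i, row in enumerate(p):
        for j, value in enumerate(row):
            b = bin_of(i, j)
            sums[b] += value
            counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def correlation_fused(s0, st, bin_of, n_bins, tile=2):
    """One pass: each output element is binned as soon as it is computed."""
    n = len(s0)
    sums, counts = [0.0] * n_bins, [0] * n_bins
    # Walk the output in tiles, loosely mimicking how a GEMM kernel
    # assigns blocks of the result matrix to groups of threads.
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for i in range(i0, min(i0 + tile, n)):
                for j in range(j0, min(j0 + tile, n)):
                    # Main loop: accumulate the dot product locally.
                    acc = sum(a * b for a, b in zip(s0[i], st[j]))
                    # "Epilogue": consume the accumulator immediately,
                    # so the N x N matrix is never stored.
                    b = bin_of(i, j)
                    sums[b] += acc
                    counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

Both functions take two lists of 3-component spins (at time 0 and at time $t$) and a callable `bin_of(i, j)` that returns a bin index below `n_bins`. They return the same binned averages; the difference is that the fused version needs only $O(\text{n\_bins})$ extra storage instead of $O(N^2)$, which is the memory saving the abstract attributes to doing the calculation in the GEMM epilogue.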