Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
167 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

CROFT: A scalable three-dimensional parallel Fast Fourier Transform (FFT) implementation for High Performance Clusters (2002.04896v2)

Published 12 Feb 2020 in cs.DC and cs.PF

Abstract: The FFT of three-dimensional (3D) input data is an important computational kernel of numerical simulations and is widely used in High Performance Computing (HPC) codes running on a large number of processors. Performance of many scientific applications such as Molecular Dynamic simulations depends on the underlying 3D parallel FFT library being used. In this paper, we present C-DACs three-dimensional Fast Fourier Transform (CROFT) library which implements three-dimensional parallel FFT using pencil decomposition. To exploit the hyperthreading capabilities of processor cores without affecting performance, CROFT is designed to use multithreading along with MPI. CROFT implementation has an innovative feature of overlapping compute and memory-I/O with MPI communication using multithreading. As opposed to other 3D FFT implementations, CROFT uses only two threads where one thread is dedicated for communication so that it can be effectively overlapped with computations. Thus, depending on the number of processes used, CROFT achieves performance improvement of about 51% to 42% as compared to FFTW3 library.

Summary

We haven't generated a summary for this paper yet.