Fusion of FFT and GEMM kernels

Investigate and develop effective kernel fusion techniques that integrate the Fast Fourier Transform (FFT) and General Matrix Multiplication (GEMM) operations despite their mismatched data access patterns and memory layouts, enabling efficient end-to-end execution in workflows such as Fourier Neural Operators.

Background

The paper studies the common FFT → GEMM → iFFT motif found in scientific computing and Fourier Neural Operators (FNO), noting that existing implementations typically call cuFFT and cuBLAS separately, incurring redundant memory transfers and kernel launches. Although kernel fusion is widely used in modern deep learning systems, prior efforts have focused on FFT–convolution and FFT–stencil pipelines where dataflow alignment is more natural.
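To make the motif concrete, here is a minimal NumPy sketch of the unfused FFT → GEMM → iFFT pipeline as it appears in an FNO spectral layer. All shapes, names, and the mode-truncation parameter are illustrative assumptions, not taken from the paper; on a GPU, steps 1 and 3 would typically be separate cuFFT calls and step 2 a batched cuBLAS GEMM, which is exactly the sequence TurboFNO aims to fuse.

```python
import numpy as np

# Illustrative shapes (assumed, not from the paper): batch B, input/output
# channels C_in/C_out, N spatial points, K retained low-frequency modes.
B, C_in, C_out, N, K = 8, 4, 6, 64, 16
rng = np.random.default_rng(0)
x = rng.standard_normal((B, C_in, N))
# Learned spectral weights: one (C_out, C_in) mixing matrix per retained mode.
W = rng.standard_normal((K, C_out, C_in)) + 1j * rng.standard_normal((K, C_out, C_in))

# 1) FFT along the spatial dimension (cuFFT on GPU).
Xf = np.fft.rfft(x, axis=-1)                      # shape (B, C_in, N//2 + 1)

# 2) GEMM: mix channels mode-by-mode on the truncated spectrum
#    (a batched cuBLAS GEMM on GPU).
Yf = np.zeros((B, C_out, N // 2 + 1), dtype=complex)
Yf[..., :K] = np.einsum('koi,bik->bok', W, Xf[..., :K])

# 3) iFFT back to the spatial domain (cuFFT on GPU).
y = np.fft.irfft(Yf, n=N, axis=-1)                # shape (B, C_out, N)
```

Run separately, each stage writes its full intermediate result to memory before the next stage reads it back; this round-tripping is the redundant traffic that fusion seeks to eliminate.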

The authors emphasize that fusing FFT and GEMM is fundamentally challenging because the two algorithms exhibit mismatched data access patterns and memory layouts. They propose TurboFNO, a fused FFT-GEMM-iFFT kernel tailored to FNO, but explicitly recognize that the general problem of fusing FFT and GEMM remains an open research area requiring further techniques.

References

In contrast, fusing FFT and GEMM presents unique challenges due to their mismatched data access patterns and memory layouts, and remains an open area of research.

TurboFNO: High-Performance Fourier Neural Operator with Fused FFT-GEMM-iFFT on GPU (arXiv:2504.11681, Wu et al., 16 Apr 2025), Section 1, Introduction