
Towards a Higher Roofline for Matrix-Vector Multiplication in Matrix-Free HOSFEM (2504.07042v2)

Published 9 Apr 2025 in cs.PF and cs.MS

Abstract: The high-order/spectral finite element method (HOSFEM) is a widely used numerical method for solving PDEs, and its performance relies primarily on axhelm, a matrix-free kernel for element-local matrix-vector multiplications. In axhelm, geometric factors account for over half of the memory accesses but contribute minimally to the computational workload. This imbalance significantly constrains the performance roofline, so further optimization of tensor contraction, the core computation in axhelm, yields only minimal improvements. To overcome this bottleneck, we propose a low-cost on-the-fly recalculation of geometric factors for trilinear elements, thereby unlocking substantial potential for optimizing tensor contraction. The proposed approach is implemented in Nekbone, a standard HOSFEM benchmark. With optimizations such as merging scalar factors, partial recalculation, Tensor Core acceleration, and constant memory utilization, performance reaches 85%-100% of the higher roofline. The optimized kernels achieve speedups of 1.74x-4.10x on NVIDIA A100 and 1.99x-3.77x on DCU K100. This leads to a 1.12x-1.40x speedup for Nekbone.
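To make the idea of on-the-fly recalculation concrete, the following is a minimal sketch in plain Python of how geometric factors at a point of a trilinear hexahedral element can be rebuilt from its 8 vertex coordinates alone, instead of being loaded from memory. It is not the paper's kernel. The vertex ordering, the function names (`jacobian`, `adjugate_and_det`, `geometric_factors`), the quadrature weight parameter `w`, and the specific form G = w |J| J^{-1} J^{-T} (the standard factor for a Laplacian-type operator) are all assumptions made for illustration.

```python
# Reference-space signs (r_i, s_i, t_i) of the 8 hexahedron vertices.
# The lexicographic ordering here is an assumption of this sketch.
SIGNS = [(r, s, t) for t in (-1, 1) for s in (-1, 1) for r in (-1, 1)]


def jacobian(verts, r, s, t):
    """Return J[a][b] = d x_a / d xi_b of the trilinear map at (r, s, t).

    The map is x(r,s,t) = sum_i N_i(r,s,t) * verts[i] with
    N_i = (1 + r*r_i) * (1 + s*s_i) * (1 + t*t_i) / 8.
    """
    J = [[0.0] * 3 for _ in range(3)]
    for (ri, si, ti), x in zip(SIGNS, verts):
        fr, fs, ft = 1 + r * ri, 1 + s * si, 1 + t * ti
        # Partial derivatives of N_i with respect to r, s, t.
        dN = (ri * fs * ft / 8.0, fr * si * ft / 8.0, fr * fs * ti / 8.0)
        for a in range(3):
            for b in range(3):
                J[a][b] += x[a] * dN[b]
    return J


def adjugate_and_det(J):
    """Return (adj(J), det(J)) for a 3x3 matrix, so that J^{-1} = adj / det."""
    (a, b, c), (d, e, f), (g, h, i) = J
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    det = a * adj[0][0] + b * adj[1][0] + c * adj[2][0]
    return adj, det


def geometric_factors(verts, r, s, t, w=1.0):
    """Recompute G = w * |J| * J^{-1} J^{-T} and w * |J| at one point.

    Using J^{-1} = adj / det gives G = w * adj * adj^T / det, which
    avoids forming the inverse explicitly.
    """
    adj, det = adjugate_and_det(jacobian(verts, r, s, t))
    G = [
        [w * sum(adj[a][k] * adj[b][k] for k in range(3)) / det
         for b in range(3)]
        for a in range(3)
    ]
    return G, w * det


if __name__ == "__main__":
    # Hypothetical axis-aligned box with half-widths 2, 1, 0.5.
    box = [(2.0 * r, 1.0 * s, 0.5 * t) for (r, s, t) in SIGNS]
    G, wdet = geometric_factors(box, 0.3, -0.2, 0.7)
    print(G, wdet)
```

For an axis-aligned box the Jacobian is constant and diagonal, so the result does not depend on the evaluation point; for a general trilinear element it varies from node to node. In this sketch only the 24 vertex coordinates need to be read per element, while `G` is symmetric and so has six unique entries per node that would otherwise be stored and loaded.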
