Code Generation for Near-Roofline Finite Element Actions on GPUs from Symbolic Variational Forms
Abstract: We present a novel parallelization strategy for evaluating Finite Element Method (FEM) variational forms on GPUs, focusing on those that are expressible through the Unified Form Language (UFL) on simplex meshes. We base our approach on code transformations, wherein we construct a space of scheduling candidates and rank them via a heuristic cost model to effectively handle the large diversity of computational workloads that can be expressed in this way. We present a design of a search space to which the cost model is applied, along with an associated pruning strategy to limit the number of configurations that need to be empirically evaluated. The goal of our design is to strike a balance between the device's latency-hiding capabilities and the amount of state space, a key factor in attaining near-roofline performance. To make our work widely available, we have prototyped our parallelization strategy within the \textsc{Firedrake} framework, a UFL-based FEM solver. We evaluate the performance of our parallelization scheme on two generations of Nvidia GPUs, specifically the Titan V (Volta architecture) and Tesla K40c (Kepler architecture), across a range of operators commonly used in applications, including fluid dynamics, wave propagation, and structural mechanics, in 2D and 3D geometries. Our results demonstrate that our proposed algorithm achieves more than $50\%$ roofline performance in $65\%$ of the test cases on both devices.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.