Multilevel Interior Penalty Methods on GPUs (2405.18982v2)
Abstract: We present a matrix-free multigrid method for high-order discontinuous Galerkin (DG) finite element methods with GPU acceleration. A performance analysis compares various data and compute layouts. Smoother implementations are optimized through localization and fast diagonalization techniques. By leveraging conflict-free access patterns in shared memory, an arithmetic throughput of up to 39% of peak performance is achieved on Nvidia A100 GPUs. Experimental results confirm the effectiveness of mixed-precision approaches and MPI parallelization in accelerating the algorithms. Furthermore, solver efficiency and robustness are assessed in both two and three dimensions, with applications to Poisson problems.
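The abstract names fast diagonalization as one ingredient of the smoother. The sketch below illustrates the general idea on a much simpler model than the paper's: a 2D finite-difference Laplacian on an n x n tensor-product grid with homogeneous Dirichlet boundaries, where the 1D eigenpairs are known in closed form. This is an assumption made for illustration only. The DG interior penalty setting would instead need the 1D eigenpairs computed numerically, and the function names here are hypothetical, not taken from the paper or any library.

```python
import math

def sine_eigensystem(n):
    """Closed-form eigenpairs of the n x n matrix A = tridiag(-1, 2, -1)."""
    lam = [2.0 - 2.0 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]
    scale = math.sqrt(2.0 / (n + 1))
    # S[j][k] is component j of eigenvector k; S is orthogonal.
    S = [[scale * math.sin((j + 1) * (k + 1) * math.pi / (n + 1))
          for k in range(n)] for j in range(n)]
    return lam, S

def matmul(A, B):
    """Dense matrix product for lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def fast_diagonalization_solve(F):
    """Solve A U + U A = F, the matrix form of the 2D tensor-product system."""
    n = len(F)
    lam, S = sine_eigensystem(n)
    St = transpose(S)
    # Step 1: transform the right-hand side into the 1D eigenbasis.
    G = matmul(matmul(St, F), S)
    # Step 2: the operator is diagonal there, with entries lam[i] + lam[j].
    G = [[G[i][j] / (lam[i] + lam[j]) for j in range(n)] for i in range(n)]
    # Step 3: transform back to nodal values.
    return matmul(matmul(S, G), St)

def apply_laplacian(U):
    """Apply the 5-point stencil (A U + U A) with zero boundary values."""
    n = len(U)
    def u(i, j):
        return U[i][j] if 0 <= i < n and 0 <= j < n else 0.0
    return [[4.0 * u(i, j) - u(i - 1, j) - u(i + 1, j) - u(i, j - 1) - u(i, j + 1)
             for j in range(n)] for i in range(n)]

if __name__ == "__main__":
    n = 6
    F = [[1.0] * n for _ in range(n)]
    U = fast_diagonalization_solve(F)
    R = apply_laplacian(U)
    residual = max(abs(R[i][j] - F[i][j]) for i in range(n) for j in range(n))
    print("max residual:", residual)
```

The point of the technique is that the inverse of the 2D operator is applied using only small 1D matrices and a pointwise division, so the full 2D matrix is never assembled. The dense `matmul` helper is written for readability, not speed.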