
Multilevel Interior Penalty Methods on GPUs (2405.18982v2)

Published 29 May 2024 in math.NA and cs.NA

Abstract: We present a matrix-free multigrid method for high-order discontinuous Galerkin (DG) finite element methods with GPU acceleration. A performance analysis is conducted, comparing various data and compute layouts. Smoother implementations are optimized through localization and fast diagonalization techniques. Leveraging conflict-free access patterns in shared memory, an arithmetic throughput of up to 39% of the peak performance on Nvidia A100 GPUs is achieved. Experimental results affirm the effectiveness of mixed-precision approaches and MPI parallelization in accelerating algorithms. Furthermore, an assessment of solver efficiency and robustness is provided across both two and three dimensions, with applications to Poisson problems.
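
For readers unfamiliar with the fast diagonalization technique named in the abstract, the following is a textbook sketch in two dimensions. It assumes a separable local operator on a Cartesian cell or patch. The symbols $K$, $M$, $S$ and $\Lambda$ are introduced here for illustration and are not taken from the paper, whose exact formulation may differ.

```latex
% Assumption: the local operator has tensor-product form, with a symmetric
% 1D stiffness-like matrix K and a symmetric positive definite 1D mass matrix M.
A = K \otimes M + M \otimes K

% Solve the 1D generalized eigenproblem once, with M-orthonormal eigenvectors:
K S = M S \Lambda, \qquad S^{\top} M S = I
\quad\Longrightarrow\quad S^{\top} K S = \Lambda

% Congruence with S \otimes S diagonalizes A:
(S \otimes S)^{\top} A \,(S \otimes S) = \Lambda \otimes I + I \otimes \Lambda

% Hence the inverse is a product of 1D transforms and a diagonal scaling:
A^{-1} = (S \otimes S)\,(\Lambda \otimes I + I \otimes \Lambda)^{-1}\,(S \otimes S)^{\top}
```

Applied with sum factorization, this inverse costs $O(n^{d+1})$ operations per cell for $n$ unknowns per direction in $d$ dimensions, compared with $O(n^{2d})$ for a dense local inverse.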
