Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
110 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Accelerating High-Order Stencils on GPUs (2009.04619v2)

Published 10 Sep 2020 in cs.DC

Abstract: Stencil computations are widely used in HPC applications. Today, many HPC platforms use GPUs as accelerators. As a result, understanding how to perform stencil computations fast on GPUs is important. While implementation strategies for low-order stencils on GPUs have been well-studied in the literature, not all of proposed enhancements work well for high-order stencils, such as those used for seismic modeling. Furthermore, coping with boundary conditions often requires different computational logic, which complicates efficient exploitation of the thread-level parallelism on GPUs. In this paper, we study high-order stencils and their unique characteristics on GPUs. We manually crafted a collection of implementations of a 25-point seismic modeling stencil in CUDA and related boundary conditions. We evaluate their code shapes, memory hierarchy usage, data-fetching patterns, and other performance attributes. We conducted an empirical evaluation of these stencils using several mature and emerging tools and discuss our quantitative findings. Among our implementations, we achieve twice the performance of a proprietary code developed in C and mapped to GPUs using OpenACC. Additionally, several of our implementations have excellent performance portability.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Ryuichi Sai (7 papers)
  2. John Mellor-Crummey (10 papers)
  3. Xiaozhu Meng (7 papers)
  4. Mauricio Araya-Polo (26 papers)
  5. Jie Meng (95 papers)
Citations (15)

Summary

We haven't generated a summary for this paper yet.