Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC (2401.02680v1)

Published 5 Jan 2024 in cs.DC and cs.PF

Abstract: Until recently, AMD platforms did not support offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work showed that StdPar can achieve good performance on NVIDIA and Intel GPU platforms. In that work, we acknowledged AMD's past efforts such as HCC, which is now deprecated and does not support newer hardware platforms. Recent developments by AMD, Codeplay, and AdaptiveCpp (previously known as hipSYCL or OpenSYCL) have enabled multiple paths for StdPar programs to run on AMD GPUs. This informal report discusses our experiences with, and evaluation of, the StdPar implementations currently available for AMD GPUs. We conduct benchmarks using our suite of HPC mini-apps, which have ports in many heterogeneous programming models, including StdPar. We then compare the performance of StdPar, using every available StdPar compiler, against contemporary heterogeneous programming models supported on AMD GPUs: HIP, OpenCL, Thrust, Kokkos, OpenMP, and SYCL. Where appropriate, we discuss issues encountered and workarounds applied during our evaluation. Finally, the StdPar model discussed in this report depends largely on Unified Shared Memory (USM) performance, yet very few AMD GPUs properly support this feature. This report therefore demonstrates a proof-of-concept host-side userspace page-fault solution for models that use the HIP API, and we discuss the performance improvements it achieves on the same set of benchmarks.
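For readers unfamiliar with StdPar, the following is a minimal sketch (not taken from the report) of what a StdPar kernel looks like: a STREAM-style triad and a dot product written with the standard C++17 parallel algorithms and the `std::execution::par_unseq` policy. The function names `triad` and `dot` are illustrative only. Built with an ordinary C++17 toolchain this code runs on the host; offloading it to an AMD GPU requires one of the StdPar implementations evaluated in the report, whose compiler flags are not shown here. Note that the data lives in ordinary `std::vector` heap allocations with no explicit device transfers, which is why GPU offload of such code typically relies on unified shared memory.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <numeric>
#include <vector>

// Illustrative STREAM-style "triad": a[i] = b[i] + scalar * c[i].
// Written with C++17 parallel algorithms (StdPar). A StdPar-capable compiler
// may offload this call to a GPU; an ordinary C++17 toolchain runs it on the host.
void triad(std::vector<double>& a, const std::vector<double>& b,
           const std::vector<double>& c, double scalar) {
  assert(a.size() == b.size() && b.size() == c.size());
  // par_unseq permits parallel and vectorized execution. The lambda captures
  // `scalar` by value, so it does not reference host stack memory.
  std::transform(std::execution::par_unseq, b.begin(), b.end(), c.begin(),
                 a.begin(),
                 [scalar](double bi, double ci) { return bi + scalar * ci; });
}

// Illustrative dot product expressed as a parallel reduction:
// sums a[i] * b[i] starting from 0.0.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  return std::transform_reduce(std::execution::par_unseq, a.begin(), a.end(),
                               b.begin(), 0.0);
}
```

For example, with `b = {1, 2, 3}`, `c = {4, 5, 6}` and `scalar = 2.0`, `triad` fills `a` with `{9, 12, 15}`, and `dot(b, c)` returns 32. Some standard libraries (for instance libstdc++) may need an additional parallel back end such as TBB before these policies actually run in parallel on the host.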

