Distributed, combined CPU and GPU profiling within HPX using APEX (2210.06437v1)

Published 21 Sep 2022 in cs.DC

Abstract: Benchmarking and comparing performance of a scientific simulation across hardware platforms is a complex task. When the simulation in question is constructed with an asynchronous, many-task (AMT) runtime offloading work to GPUs, the task becomes even more complex. In this paper, we discuss the use of a uniquely suited performance measurement library, APEX, to capture the performance behavior of a simulation built on HPX, a highly scalable, distributed AMT runtime. We examine the performance of the astrophysics simulation carried out by Octo-Tiger on two different supercomputing architectures. We analyze the results of scaling and measurement overheads. In addition, we look in depth at two similarly configured executions on the two systems to study how architectural differences affect performance and identify opportunities for optimization. As one such opportunity, we optimize the communication for the hydro solver and investigate its performance impact.

Authors (9)
  1. Patrick Diehl (41 papers)
  2. Kevin Huck (13 papers)
  3. Dominic Marcello (10 papers)
  4. Sagiv Shiber (15 papers)
  5. Hartmut Kaiser (44 papers)
  6. Juhan Frank (19 papers)
  7. Geoffrey C. Clayton (56 papers)
  8. Dirk Pflueger (1 paper)
  9. Gregor Daiss (3 papers)
Citations (1)
