A Nested Partitioning Scheme for Parallel Heterogeneous Clusters (1307.4731v1)
Abstract: Modern supercomputers increasingly rely on accelerators and co-processors. However, it has not been easy to achieve good performance on such heterogeneous clusters. The key challenge has been to ensure good load balance, so that neither the CPU nor the accelerator is left idle. Traditional approaches have either offloaded entire computations to the accelerator, resulting in an idle CPU, or opted for task-level parallelism, which requires large data transfers between the CPU and the accelerator. True work-parallelism has been hard to achieve because accelerators cannot communicate directly with other accelerators or with CPUs other than their host. In this work, we present a new nested partition scheme to overcome this problem. By partitioning the work assigned to a given node asymmetrically into boundary and interior work, and assigning the interior to the accelerator, we achieve excellent efficiency while ensuring proper utilization of both CPU and accelerator resources. The problem used for evaluating the new partition is an $hp$ discontinuous Galerkin spectral element method for a coupled elastic--acoustic wave propagation problem.
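The boundary/interior split described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a structured 2D grid of elements, a hypothetical `owner` table giving the node that owns each element, and a four-neighbor (face) adjacency rule. An element is classed as boundary work if any face neighbor belongs to another node, and as interior work otherwise. The function name `nested_partition` is likewise an assumption made for illustration.

```python
from typing import List, Set, Tuple

Element = Tuple[int, int]  # (row, col) index of an element in the grid


def nested_partition(owner: List[List[int]], node: int) -> Tuple[Set[Element], Set[Element]]:
    """Split the elements owned by `node` into (boundary, interior) sets.

    Assumption: `owner[i][j]` is the id of the node that owns element (i, j),
    and elements interact only through their four face neighbors.
    """
    rows, cols = len(owner), len(owner[0])
    boundary: Set[Element] = set()
    interior: Set[Element] = set()
    for i in range(rows):
        for j in range(cols):
            if owner[i][j] != node:
                continue  # element belongs to another node
            neighbors = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
            # An element needs off-node data if any in-grid neighbor
            # is owned by a different node.
            touches_other_node = any(
                0 <= a < rows and 0 <= b < cols and owner[a][b] != node
                for a, b in neighbors
            )
            if touches_other_node:
                boundary.add((i, j))  # stays on the CPU, which handles communication
            else:
                interior.add((i, j))  # candidate work for the accelerator
    return boundary, interior


# Example: a 4 x 8 grid split between two nodes down the middle.
owner = [[0 if j < 4 else 1 for j in range(8)] for _ in range(4)]
boundary, interior = nested_partition(owner, node=0)
print(len(boundary), len(interior))
```

In this toy layout only the column of elements adjacent to the other node is boundary work, so most of the node's elements can be handed to the accelerator while the CPU handles the elements that need inter-node communication. How the split is balanced against the relative speeds of the CPU and the accelerator is not specified in the abstract and is not modeled here.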