Cause of idle time between tasks in the GPU-accelerated SWIFT solver

Determine the precise cause of the significant idle time observed between tasks in the GPU-accelerated implementation of the SWIFT smoothed particle hydrodynamics solver. Specifically, identify how the QuickSched task scheduler, the task bundling parameters (Sp and Sb), dependency unlocking, and pack/unpack operations contribute to host threads being unable to retrieve tasks from their queues during a time step, and characterize the conditions under which this idle time arises.

Background

The paper introduces a GPU-accelerated approach for SWIFT, a task-parallel smoothed particle hydrodynamics solver, where compute-intensive density, gradient, and force tasks are offloaded to the GPU while task management and memory-bound tasks remain on the CPU. To maximize GPU efficiency, the authors bundle tasks and use asynchronous instruction streams to overlap CPU-GPU data transfers with computations.

In a full simulation on the Nvidia Grace-Hopper superchip, the authors observe substantial idle time between tasks in the GPU-accelerated code compared with the original CPU-only version. They attribute some of this behavior to dependent tasks remaining locked until GPU results are transferred back, which can leave task queues empty, but they explicitly note that the exact cause of the idle periods is unclear. Identifying it would help improve end-to-end performance and realize larger speedups.
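The locking behavior described above can be illustrated with a toy model. The following Python sketch is not taken from SWIFT or QuickSched and is not a diagnosis of the observed idle time; it only shows one way in which delaying the unlocking of dependents until a bundle's results return could leave a host thread with nothing to do. The function name `idle_time_per_thread` and all timing parameters (`t_pack`, `t_launch`, `t_task`, `t_unpack`) are hypothetical, and the model assumes a single host thread whose only remaining work is to unpack returned bundles.

```python
def idle_time_per_thread(n_tasks, bundle_size,
                         t_pack=1.0, t_launch=1.0, t_task=0.5, t_unpack=1.0):
    """Toy model of one host thread offloading bundled tasks (all values hypothetical).

    The thread packs tasks one at a time. A bundle is sent to the GPU once
    `bundle_size` tasks are packed (or no tasks remain). Dependent work is
    assumed to unlock only when a bundle's results are back, so after packing
    everything the thread can only wait for bundles and unpack them.
    Returns the total time the thread spends waiting.
    """
    clock = 0.0      # current time on the host thread
    gpu_free = 0.0   # time at which the GPU finishes its current bundle
    finish = []      # completion time of each offloaded bundle
    packed = 0       # tasks packed into the bundle being assembled

    for i in range(n_tasks):
        clock += t_pack
        packed += 1
        if packed == bundle_size or i == n_tasks - 1:
            start = max(clock, gpu_free)               # GPU runs one bundle at a time
            gpu_free = start + t_launch + t_task * packed
            finish.append(gpu_free)
            packed = 0

    idle = 0.0
    for done in finish:
        if done > clock:                               # nothing to do until results return
            idle += done - clock
            clock = done
        clock += t_unpack                              # unpack, which unlocks dependents
    return idle


if __name__ == "__main__":
    for size in (2, 4, 8):
        print(size, idle_time_per_thread(n_tasks=8, bundle_size=size))
```

With these made-up timings, a bundle of eight tasks leaves the thread waiting after it has packed everything, whereas bundles of two return early enough that the thread always has something to unpack. Real behavior depends on the scheduler, the number of threads, and the actual costs, none of which this sketch captures.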

References

It is currently unclear exactly what is causing this idle time in-between tasks but what is clear is that some threads are unable to retrieve tasks from their queues.

— Task-parallelism in SWIFT for heterogeneous compute architectures (2505.14538 - Nasar et al., 20 May 2025) in Section 4.2
