Mitigating inefficient task mappings with an Adaptive Resource-Moldable Scheduler (ARMS) (2112.09509v1)

Published 17 Dec 2021 in cs.DC and cs.PF

Abstract: Efficient runtime task scheduling on complex memory hierarchies becomes increasingly important as modern and future High-Performance Computing (HPC) systems are progressively composed of multi-socket and multi-chiplet nodes with non-uniform memory access latencies. Existing locality-aware scheduling schemes either require control of the data placement policy for memory-bound tasks or maximize locality for all classes of computations, resulting in a loss of potential performance. While such approaches are viable, an adaptive scheduling strategy is preferred to enhance locality and resource sharing efficiency using a portable programming scheme. In this paper, we propose the Adaptive Resource-Moldable Scheduler (ARMS), which dynamically maps a task at runtime to a partition spanning one or more threads, based on the task and DAG requirements. The scheduler builds an online, platform-independent model of the local and non-local scheduling costs for each tuple consisting of task type (function) and task topology (task location within the DAG). We evaluate ARMS using task-parallel versions of SparseLU, 2D Stencil, FMM, and MatMul as examples. ARMS achieves a performance gain of up to 3.5x over state-of-the-art locality-aware scheduling schemes.
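
The abstract describes ARMS as maintaining an online cost model keyed by task type and task location within the DAG, and then choosing, per task, a partition width and a local or non-local placement based on the observed costs. The sketch below illustrates that idea in Python. It is a minimal, hypothetical reconstruction for intuition only, not the paper's implementation: the class names, the candidate partition widths, the placement labels, and the synthetic workload are all assumptions.

```python
# Illustrative sketch of ARMS-style adaptive scheduling (not the authors' code).
# Each task is keyed by (task_type, dag_location); the scheduler keeps an online
# estimate of execution cost for every candidate partition width and for
# local vs. non-local placement, then picks the cheapest option at runtime.

import random
from collections import defaultdict

PARTITION_WIDTHS = [1, 2, 4]          # assumed thread-partition sizes
PLACEMENTS = ["local", "remote"]      # local = same socket/chiplet as the producer


class ArmsCostModel:
    """Online cost table indexed by (task_type, dag_location, width, placement)."""

    def __init__(self):
        self.cost = defaultdict(lambda: None)   # running-average execution time

    def choose(self, task_type, dag_location):
        """Return the (width, placement) with the lowest estimated cost.

        Configurations that have never been measured are tried first,
        so every option gets sampled at least once.
        """
        best, best_cost = None, float("inf")
        for width in PARTITION_WIDTHS:
            for placement in PLACEMENTS:
                c = self.cost[(task_type, dag_location, width, placement)]
                if c is None:                    # never measured: explore it
                    return width, placement
                if c < best_cost:
                    best, best_cost = (width, placement), c
        return best

    def update(self, task_type, dag_location, width, placement, elapsed):
        """Fold a new timing measurement into the running average."""
        key = (task_type, dag_location, width, placement)
        old = self.cost[key]
        self.cost[key] = elapsed if old is None else 0.5 * (old + elapsed)


if __name__ == "__main__":
    model = ArmsCostModel()

    # Synthetic workload: the "stencil" task is memory-bound and prefers
    # locality, while "matmul" benefits from wider partitions. The numbers
    # are invented purely to exercise the cost model.
    def run(task_type, width, placement):
        base = 10.0 if task_type == "stencil" else 40.0 / width
        penalty = 5.0 if (task_type == "stencil" and placement == "remote") else 0.0
        return base + penalty + random.uniform(0, 1)

    for step in range(50):
        for task_type in ("stencil", "matmul"):
            width, placement = model.choose(task_type, dag_location=0)
            elapsed = run(task_type, width, placement)
            model.update(task_type, 0, width, placement, elapsed)

    for task_type in ("stencil", "matmul"):
        print(task_type, "->", model.choose(task_type, dag_location=0))
```

Under these assumptions the model converges to narrow, local partitions for the memory-bound task and wider partitions for the compute-bound one, which mirrors the adaptive behaviour the abstract claims; the real scheduler operates inside a task-parallel runtime rather than on a synthetic loop like this.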
