ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs (2404.16812v1)

Published 25 Apr 2024 in cs.DC

Abstract: Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle the schedule space dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have dodged the issue by neglecting some important factors, leaving some large performance potential locked. This paper presents ESG, a new scheduling algorithm that directly addresses the difficulties. ESG treats sharable GPU as a first-order factor in scheduling. It employs an optimality-guided adaptive method by combining A*-search and a novel dual-blade pruning to dramatically prune the scheduling space without compromising the quality. It further introduces a novel method, dominator-based SLO distribution, to ensure the scalability of the scheduler. The results show that ESG can significantly improve the SLO hit rates 61%-80% while saving 47%-187% costs over prior work.

Citations (1)

View on Semantic Scholar

Summary

The paper presents ESG, a pipeline-conscious scheduling algorithm that enhances SLO hit rates by 61–80% while reducing costs by up to 187% through efficient GPU sharing.
It employs a dual-bladed A*-search strategy and Application-Function-Wise job queues to optimize configuration spaces and minimize cross-function communication overhead.
ESG_Dispatch dynamically assigns jobs based on data locality and resource availability, resulting in lower latency and improved resource utilization in heterogeneous environments.

ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

Introduction

The paper introduces a novel scheduling algorithm, ESG, designed to improve the efficiency and cost-effectiveness of DNN workflows on serverless platforms that utilize shareable GPUs. This work addresses the significant challenges associated with scheduling in heterogeneous computing environments by treating shareable GPUs as primary scheduling factors. ESG employs advanced search techniques to optimize the resource allocation and SLO (Service Level Objective) fulfillment, significantly outperforming existing methods.

System Architecture and Design

ESG is implemented within the serverless framework of OpenWhisk, which serves as the underlying architecture for function execution. OpenWhisk is a distributed platform that automates resource scaling and task scheduling. The Controller component, which handles task scheduling and resource allocation, is central to ESG's functionality.

Figure 1: OpenWhisk architecture. Controller is where scheduling happens.

To address the inherent complexity of serverless environments augmented by GPU sharing, ESG introduces Application-Function-Wise (AFW) job queues. This mechanism allows for precise resource allocation by grouping jobs that share the same function and application context. ESG's scheduling strategy is driven by workflow optimizations that minimize cross-function communication overhead and adaptively configure resource allocations.

ESG Algorithm

ESG consists of two core algorithms: ESG_1Q and ESG_Dispatch.

ESG_1Q Algorithm

ESG_1Q constructs scheduling configurations using an optimality-guided approach inspired by A*-search, enhanced with a dual-bladed pruning strategy. This enhancement expedites the exploration of potential configurations by eliminating suboptimal paths early in the search process.

Figure 2: The app-func-wise (AFW) job queues of two example ML-based applications, and the ESG algorithm workflow in handling one job queue.

By leveraging dual-blade pruning, ESG_1Q effectively manages the vast configuration spaces introduced by the combination of batch sizes, vCPU, and vGPU assignments, thereby aligning execution timings closely with SLO requirements and minimizing resource utilization.

Figure 3: (a) Top: Example of the configuration space of a three-function application and two configuration paths in the space.

ESG_Dispatch Algorithm

ESG_Dispatch manages the dynamic assignment of jobs to worker nodes, prioritizing data locality and resource availability to minimize latency. It selectively attempts to execute functions on machines already containing prerequisite data to reduce data transfer times. This intelligent dispatch approach enhances performance by leveraging local data cache coherency.

Performance Evaluation

Extensive evaluations across multiple workloads demonstrate ESG's capability to sustain high SLO hit rates while using fewer resources than state-of-the-art methods. ESG consistently reported SLO hit improvements ranging from 61% to 80% with cost savings between 47% to 187% compared to baseline techniques.

Figure 4: The average SLO hit rate and the cost (normalized to ESG cost) under different SLO and workload settings. The left y-axis is for the SLO hit rate and the higher is better. The right y-axis is for the cost and the lower is better.

Implementation Considerations

When deploying ESG, the architecture should ensure that computing nodes are equipped with hardware capable of supporting GPU partitioning techniques (e.g., NVIDIA MIG) to fully exploit GPU sharing capabilities. The algorithm's scalability is maintained through a dominator-based method that handles function grouping within workflow DAGs. This requires careful configuration to balance function stage sizes against real-time execution constraints.

Conclusion

ESG advances serverless scheduling practices by integrating GPU sharing directly into the optimization process, balancing efficiency and cost without compromising the quality of service. The algorithm's dynamic adaptation to resource status and workload variations makes it particularly suitable for large-scale ML inference tasks in heterogeneous cloud environments. As serverless computing continues to mature, ESG's innovations will form a foundation for subsequent developments in adaptive, resource-aware scheduling solutions.