- The paper introduces Singularity, a scheduler that preemptively and elastically manages deep learning tasks to maximize global AI accelerator utilization.
- It employs transparent checkpointing, migration, and replica splicing to resize jobs dynamically as resource availability changes, with time-slicing typically imposing less than 3% overhead.
- The evaluation shows robust performance across multiple ML frameworks while preserving workload isolation and compliance with job-level SLAs.
"Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads"
Introduction
The paper "Singularity: Planet-Scale, Preemptive and Elastic Scheduling of AI Workloads" presents Microsoft's globally distributed scheduling system, Singularity, designed to efficiently execute deep learning workloads across a vast network of AI accelerators. Unlike traditional schedulers, Singularity is built to preemptively manage resource allocation across deep learning tasks, ensuring high utilization without compromising workload performance or correctness.
The primary objective of Singularity is to maximize the utilization of AI accelerators by dynamically adjusting resource allocation based on workload demands. It introduces mechanisms that make deep learning tasks preemptible, migratable, and dynamically resizable, allowing Singularity to allocate resources flexibly and efficiently at planetary scale.
Key Mechanisms
Singularity hinges on two pivotal mechanisms: preemption with migration, and elastic resizing:
- Preemption and Migration:
- Singularity can checkpoint, preempt, and migrate deep learning jobs across nodes, clusters, or regions without user intervention. It obtains a consistent cut of the distributed application state by bringing all workers to a synchronization barrier at the same point in execution, so the scheduler can pause and relocate a job without corrupting its execution state (see the first sketch after this list).
- Elasticity (Resizing):
- The service allows jobs to scale up or down dynamically based on available resources. 'Replica splicing' lets multiple workers be time-sliced onto a single accelerator while their device memory is swapped in and out on demand, sharply reducing the memory overhead that dynamic scaling would otherwise incur (a second sketch below illustrates the idea).
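To make the barrier idea concrete, here is a minimal, hypothetical sketch in Python that uses threads as stand-ins for distributed workers. The names (`train_one_step`, `save_state`) and the thread-based setup are illustrative assumptions; Singularity itself takes the consistent cut transparently beneath the framework, not in user code.

```python
import threading
import time

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
preempt_requested = threading.Event()
decision = [False]  # preemption decision, shared once per step

def train_one_step(state: dict, step: int) -> None:
    time.sleep(0.01)        # stand-in for real computation
    state["step"] = step

def save_state(rank: int, state: dict) -> None:
    print(f"worker {rank} checkpointed at step {state['step']}")

def worker(rank: int) -> None:
    state = {}
    for step in range(1000):
        train_one_step(state, step)
        barrier.wait()                 # end-of-step boundary for all workers
        if rank == 0:                  # one worker samples the flag...
            decision[0] = preempt_requested.is_set()
        barrier.wait()                 # ...then everyone acts on the same value
        if decision[0]:
            # Every worker checkpoints at the same step boundary, so the
            # per-worker checkpoints together form a consistent cut.
            save_state(rank, state)
            return

threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
time.sleep(0.05)                       # let a few steps run, then preempt
preempt_requested.set()
for t in threads:
    t.join()
```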
Both mechanisms are transparent and work-conserving, requiring no changes to user code. This transparency lets Singularity handle jobs across diverse parallelism strategies and DNN architectures without additional user configuration.
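Continuing the illustration, replica splicing can be pictured as several replicas time-sharing one accelerator, with only the running replica's working set resident in device memory. Everything here (`Replica`, `swap_in`, the single `DEVICE_SLOT`) is a toy stand-in; the real mechanism splices actual device memory transparently below the framework.

```python
class Replica:
    """A data-parallel worker whose state is parked in host memory."""
    def __init__(self, rid: int):
        self.rid = rid
        self.host_state = {"params": [0.0] * 4, "step": 0}

# The single accelerator's memory: only one replica is resident at a time.
DEVICE_SLOT = {"resident": None, "state": None}

def swap_out() -> None:
    prev = DEVICE_SLOT["resident"]
    if prev is not None:
        prev.host_state = dict(DEVICE_SLOT["state"])  # device -> host copy
        DEVICE_SLOT["resident"] = None

def swap_in(replica: Replica) -> None:
    swap_out()                                        # evict current resident
    DEVICE_SLOT["resident"] = replica
    DEVICE_SLOT["state"] = dict(replica.host_state)   # host -> device copy

def run_step() -> None:
    state = DEVICE_SLOT["state"]
    state["params"] = [p + 0.1 for p in state["params"]]  # stand-in compute
    state["step"] += 1

# Four workers spliced onto one device: each gets a time slice per round,
# but only the running worker's working set occupies device memory.
replicas = [Replica(r) for r in range(4)]
for _ in range(3):
    for rep in replicas:
        swap_in(rep)
        run_step()
swap_out()
print([rep.host_state["step"] for rep in replicas])  # -> [3, 3, 3, 3]
```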
Implementation Considerations
Singularity's design sets it apart from typical region-bound resource managers by treating all AI accelerators worldwide as a single logical cluster. The system drives high utilization through:
- Resource Sharing: By opportunistically leveraging idle resources globally, Singularity avoids static resource reservations and reduces fragmentation.
- Job-Level SLAs: Singularity respects job-level Service Level Agreements (SLAs) by dynamically adjusting resources as required, maintaining workload isolation while keeping allocation efficient (a hypothetical rebalancing sketch follows this list).
- Failure Resilience: By allowing tasks to resume from preemption points, rather than restarting from scratch, Singularity minimizes work loss due to hardware failures.
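As a rough illustration of how an elastic scheduler might reconcile opportunistic sharing with job-level SLAs, the sketch below guarantees each job its minimum allocation and hands out idle capacity round-robin. The `Job` fields and the `rebalance` policy are assumptions made for illustration, not Singularity's actual policy or API.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    min_gpus: int      # SLA: never run below this
    max_gpus: int      # largest useful world size
    current: int = 0

def rebalance(jobs: list[Job], total_gpus: int) -> None:
    # First satisfy every job's SLA minimum...
    for job in jobs:
        job.current = job.min_gpus
    free = total_gpus - sum(j.current for j in jobs)
    # ...then hand out idle capacity round-robin up to each job's max,
    # so sharing is work-conserving and no GPU sits idle needlessly.
    while free > 0:
        grew = False
        for job in jobs:
            if free > 0 and job.current < job.max_gpus:
                job.current += 1
                free -= 1
                grew = True
        if not grew:
            break

jobs = [Job("a", min_gpus=2, max_gpus=8), Job("b", min_gpus=4, max_gpus=4)]
rebalance(jobs, total_gpus=10)
print({j.name: j.current for j in jobs})  # -> {'a': 6, 'b': 4}
```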
Singularity's architecture interposes a device-proxy between each job and its accelerators, decoupling job execution from the physical hardware; this indirection enables flexible resource mapping and keeps jobs robust to software updates and library changes (sketched below).
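A minimal sketch of the indirection the device-proxy provides, with invented names (`PhysicalDevice`, `DeviceProxy.remap`): the job holds a stable logical handle, and only the proxy's mapping changes on migration. The real proxy intercepts device APIs such as CUDA calls rather than exposing a handle like this.

```python
class PhysicalDevice:
    def __init__(self, ident: str):
        self.ident = ident
    def launch(self, kernel: str) -> str:
        return f"{kernel} ran on {self.ident}"

class DeviceProxy:
    """Indirection layer between the job and the hardware it runs on."""
    def __init__(self, backing: PhysicalDevice):
        self._backing = backing
    def launch(self, kernel: str) -> str:
        # The job calls launch() as if talking to the device directly;
        # the proxy forwards to the current physical mapping.
        return self._backing.launch(kernel)
    def remap(self, new_backing: PhysicalDevice) -> None:
        # After a migration, only the proxy's mapping changes; the job's
        # handle (and hence its code) is untouched.
        self._backing = new_backing

dev = DeviceProxy(PhysicalDevice("gpu0@region-west"))
print(dev.launch("train_step"))
dev.remap(PhysicalDevice("gpu3@region-east"))   # migrated by the scheduler
print(dev.launch("train_step"))                 # same handle, new hardware
```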
Evaluation
The empirical evaluation of Singularity shows low overhead for preemption and elasticity, with time-slicing typically imposing less than 3% overhead. Deployment across a spectrum of models and multiple versions of common ML frameworks, including different configurations of TensorFlow and PyTorch, demonstrates its robustness and broad applicability.
Conclusion
Singularity advances AI workload scheduling by turning preemptibility and elasticity, previously niche capabilities, into default properties of every deep learning job. The system meets modern cloud-based deep learning demands by promoting flexible, efficient use of computational resources while abstracting complexity away from end users. This positions Singularity as a forward-thinking model for future distributed AI scheduling systems, laying the groundwork for higher throughput and lower operational costs at global scale.
Overall, Singularity signifies a shift towards universally elastic resource scheduling, facilitating both practical applications and future research into more adaptive AI system frameworks.