
Themis: Fair and Efficient GPU Cluster Scheduling (1907.01484v2)

Published 2 Jul 2019 in cs.DC

Abstract: Modern distributed ML training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads' unique attributes: ML jobs have long-running tasks that need to be gang-scheduled, and their performance is sensitive to tasks' relative placement. We propose Themis, a new scheduling framework for ML training workloads. Its GPU allocation policy enforces that ML workloads complete in a finish-time fair manner, a new notion we introduce. To capture placement sensitivity and ensure efficiency, Themis uses a two-level scheduling architecture where ML workloads bid on available resources that are offered in an auction run by a central arbiter. Our auction design allocates GPUs to winning bids by trading off efficiency for fairness in the short term but ensuring finish-time fairness in the long term. Our evaluation on a production trace shows that Themis can improve fairness by more than 2.25X and is ~5% to 250% more cluster efficient in comparison to state-of-the-art schedulers.

Authors (7)
  1. Kshiteej Mahajan (4 papers)
  2. Arjun Balasubramanian (4 papers)
  3. Arjun Singhvi (6 papers)
  4. Shivaram Venkataraman (48 papers)
  5. Aditya Akella (44 papers)
  6. Amar Phanishayee (23 papers)
  7. Shuchi Chawla (50 papers)
Citations (167)

Summary

Themis: Fair and Efficient GPU Cluster Scheduling

The paper introduces Themis, a GPU cluster scheduling framework geared toward optimizing fairness and efficiency for ML workloads. In contemporary ML environments, GPUs are critical resources because they accelerate the dense computation required to train complex models. However, contention arises when multiple ML workloads share a GPU cluster. Themis was developed to address this contention, with a focus on ensuring fairness in GPU allocation, a central concern for both users and operators.

Themis redefines fairness through the novel concept of finish-time fairness: each workload should finish no later in the shared cluster than it would on a dedicated 1/N share of the resources, where N is the number of active workloads. A distinct characteristic of Themis is its two-level scheduling architecture, in which per-application agents bid on GPUs offered in auctions run by a central arbiter. This structure contrasts with conventional approaches such as DRF and Tiresias by acknowledging the interplay between resource allocation and placement, thereby enabling better scheduling for long-running, gang-scheduled, placement-sensitive workloads.
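Concretely, the metric behind this notion is the ratio rho = T_shared / T_independent, where T_independent is the app's finish time on an exclusive 1/N share; rho > 1 means the app is worse off than under fair sharing. The following minimal Python sketch illustrates the idea, assuming hypothetical job records that carry a precomputed rho field (Themis's agents estimate it from measured throughput), and a simplified version of the fairness filter the arbiter applies before running an auction:

```python
# Sketch of finish-time fairness and the arbiter's fairness filter.
# Hypothetical/simplified: the job dicts and the `fraction` knob below.
# Themis exposes a similar fairness knob (called f) that controls how
# many of the worst-off apps are allowed to bid in each auction.

def finish_time_fairness(t_shared: float, t_independent: float) -> float:
    """rho > 1: the app finishes later than on its fair 1/N share."""
    return t_shared / t_independent

def apps_to_auction(apps: list[dict], fraction: float = 0.8) -> list[dict]:
    """Offer GPUs to the given fraction of apps with the worst rho;
    a smaller fraction favors fairness, a larger one efficiency."""
    ranked = sorted(apps, key=lambda a: a["rho"], reverse=True)
    return ranked[: max(1, int(fraction * len(ranked)))]

apps = [
    {"name": "job-a", "rho": 1.8},   # far behind its fair finish time
    {"name": "job-b", "rho": 1.1},
    {"name": "job-c", "rho": 0.7},   # ahead of its fair finish time
]
print([a["name"] for a in apps_to_auction(apps, fraction=0.7)])
# -> ['job-a', 'job-b']
```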

Evaluated on a production trace, Themis shows compelling improvements in both fairness and efficiency: it improves fairness by more than 2.25x over existing state-of-the-art schedulers while also being roughly 5% to 250% more cluster efficient. These results highlight Themis's ability to allocate resources more fairly without compromising cluster throughput or job completion time.

Implications and Future Directions

Practical Implications: Themis's introduction of finish-time fairness and its auction-based resource allocation have profound implications for ML scheduling in shared environments. By offering equitable resource distribution, Themis provides a concrete solution to the prevalent problem of resource hoarding, thus minimizing user frustration and potentially reducing the need for dedicated hardware setups. Moreover, its approach to bidding allows for more dynamic and fine-tuned resource allocation, accommodating variations in workload requirements over time.
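As an illustration of that bidding step, here is a hedged Python sketch. The placement model (estimate_rho) and the machine/slot naming are hypothetical inventions for this example; what it reflects from the paper is that each app's agent values candidate GPU allocations by the finish-time fairness it expects under them, with consolidated placements typically valued more highly:

```python
from itertools import combinations

def make_bid(app, offered_gpus, estimate_rho):
    """Map each candidate GPU subset to the app's estimated new rho.

    `estimate_rho` is a hypothetical helper standing in for the app's
    placement-sensitivity model. Enumerating every subset is exponential;
    a real agent would bid on a curated set of allocations.
    """
    return {
        subset: estimate_rho(app, subset)
        for k in range(1, len(offered_gpus) + 1)
        for subset in combinations(offered_gpus, k)
    }

# Toy placement model: spanning machines slows training; more GPUs speed it up.
def estimate_rho(app, subset):
    machines = {machine for machine, _slot in subset}
    slowdown = 1.0 + 0.25 * (len(machines) - 1)   # cross-machine penalty
    return app["rho"] * slowdown / len(subset)    # lower rho is better

offer = [("m0", 0), ("m0", 1), ("m1", 0)]
bid = make_bid({"rho": 1.8}, offer, estimate_rho)
best = min(bid, key=bid.get)
print(best, round(bid[best], 2))
# -> (('m0', 0), ('m0', 1), ('m1', 0)) 0.75
```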

Theoretical Implications: Themis challenges the dominance of rigid scheduling disciplines, presenting a flexible method that accounts for the nuanced demands of ML workflows. Its model anticipates scenarios where ML workloads exhibit diverse placement preferences, motivating deeper investigation into how internal job characteristics influence allocation outcomes within shared GPU clusters.

Speculations on AI Development: As AI technologies evolve, Themis’s bidding mechanism can be adapted to more complex environments where resource management extends beyond GPU clusters to hybrid cloud setups. The integration of advanced data analytics could further refine bidding strategies, allowing Themis to make allocation decisions based on predictive modeling of workload characteristics.

In conclusion, Themis represents a significant step forward in GPU cluster scheduling for ML workloads, offering a robust framework that balances efficiency with equitable resource distribution—a necessity as pressure on computational resources escalates amidst the rise of sophisticated AI models. Further research could focus on extending Themis’s scheduling algorithms to handle integrated data flow across heterogeneous computational resources, fortifying its place in next-generation AI infrastructure management.