Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows (2402.17652v2)

Published 27 Feb 2024 in cs.DC and cs.AI

Abstract: We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries: a computing style often seen in applications that interact with users in support of image processing and natural language processing. In such systems, coscheduling of GPU memory management and task placement represents a promising opportunity. We propose Compass, a novel framework that unifies these functions to reduce job latency while using resources efficiently, placing tasks where data dependencies will be satisfied, collocating tasks from the same job (when this will not overload the host or its GPU), and efficiently managing GPU memory. Comparison with other state-of-the-art schedulers shows a significant reduction in completion times while requiring the same or even fewer resources. In one case, just half the servers were needed to process the same workload.
