A Predictive Autoscaler for Elastic Batch Jobs

Published 10 Oct 2020 in cs.LG | (2010.05049v1)

Abstract: Large batch jobs such as Deep Learning, HPC, and Spark workloads require far more computational resources, at higher cost, than conventional online services. Like other time series data, their resource demand exhibits characteristics such as trend, burst, and seasonality. Cloud providers offer short-term instances to achieve scalability, stability, and cost-efficiency. Because new instances take time to join the cluster and initialize, crowded workloads may lead to violations in the scheduling system. Assuming that infinite resources and ideal placements are available for users to request in the cloud environment, we propose a predictive autoscaler that provides an elastic interface for customers and overprovisions instances based on a trained regression model. We contribute a method to embed heterogeneous resource requirements in continuous space into discrete resource buckets, and an autoscaler that makes predictive expansion plans on the time series of resource bucket counts. Our experimental evaluation on production resource usage data validates the solution, and the results show that the predictive autoscaler relieves the burden of making scaling plans, avoids long launch times at lower cost, and outperforms other prediction methods with fine-tuned settings.
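
The two ideas named in the abstract, mapping continuous resource requests onto discrete buckets and planning capacity from the time series of bucket counts, can be illustrated with a minimal sketch. This is not the paper's implementation: the bucket grid, the function names (to_bucket, bucket_counts, plan_next_window), and the 20% margin are all hypothetical, and a plain moving average stands in for the trained regression model the abstract mentions.

```python
import math
from collections import Counter

# Hypothetical bucket grid of (cpu_cores, memory_gib) shapes, sorted small to large.
# The paper's actual bucket definitions are not given in the abstract.
BUCKETS = [(1, 2), (2, 4), (4, 8), (8, 16), (16, 32)]


def to_bucket(cpu, mem_gib):
    """Return the smallest bucket that covers the request, or None if none fits."""
    for b_cpu, b_mem in BUCKETS:
        if cpu <= b_cpu and mem_gib <= b_mem:
            return (b_cpu, b_mem)
    return None


def bucket_counts(requests):
    """Count the requests that fall into each bucket for one time window."""
    buckets = (to_bucket(cpu, mem) for cpu, mem in requests)
    return Counter(b for b in buckets if b is not None)


def plan_next_window(history, window=3, margin=0.2):
    """Plan instance counts per bucket for the next time window.

    Assumption: a moving average over the last `window` counts stands in for
    a trained regression model; `margin` is the overprovisioning fraction.
    """
    recent = history[-window:]
    if not recent:
        return {bucket: 0 for bucket in BUCKETS}
    plan = {}
    for bucket in BUCKETS:
        mean = sum(counts[bucket] for counts in recent) / len(recent)
        plan[bucket] = math.ceil(mean * (1 + margin))
    return plan


if __name__ == "__main__":
    # Three consecutive windows of (cpu, memory) requests, invented for illustration.
    windows = [
        [(0.5, 1), (1.5, 3)],
        [(1.5, 3), (2, 4), (3, 8)],
        [(2, 4), (2, 4), (6, 12)],
    ]
    history = [bucket_counts(w) for w in windows]
    print(plan_next_window(history))
```

Rounding each request up to the smallest covering bucket keeps every job schedulable on the planned instances, at the price of some wasted capacity inside each bucket. A finer grid reduces that waste but spreads the counts over more, sparser time series.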
