Analytically-Driven Resource Management for Cloud-Native Microservices (2401.02920v1)
Abstract: Resource management for cloud-native microservices has attracted considerable recent attention. Previous work has shown that ML-driven approaches outperform traditional techniques, such as autoscaling, in both SLA maintenance and resource efficiency. However, ML-driven approaches also face challenges, including lengthy data collection processes and limited scalability. We present Ursa, a lightweight resource management system for cloud-native microservices that addresses these challenges. Ursa uses an analytical model that decomposes the end-to-end SLA into per-service SLAs, and maps each per-service SLA to a resource allocation for the corresponding microservice tier. To speed up the exploration process and avoid prolonged SLA violations, Ursa explores each microservice individually, and stops exploration as soon as latency exceeds its SLA. We evaluate Ursa on a set of representative end-to-end microservice topologies, including a social network, a media service, and a video processing pipeline, each consisting of multiple classes and priorities of requests with different SLAs, and compare it against two representative ML-driven systems, Sinan and Firm. Compared to these ML-driven approaches, Ursa provides significant advantages: it shortens the data collection process by more than 128x, and its control plane is 43x faster. At the same time, Ursa does not sacrifice resource efficiency or SLAs. During online deployment, Ursa reduces the SLA violation rate by 9.0% to 49.9%, and reduces CPU allocation by up to 86.2% compared to ML-driven approaches.
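The two ideas summarized in the abstract (splitting an end-to-end SLA into per-service SLAs, and exploring one service at a time with early stopping) can be illustrated with a minimal Python sketch. This is not Ursa's actual model. It assumes a sequential chain of services whose latencies simply add, uses a brute-force search, and relies on hypothetical service names, latency profiles, and helper functions (`decompose_sla`, `explore_service`) introduced only for illustration.

```python
from itertools import product

# Hypothetical profiles: latency in ms observed at each CPU allocation (cores).
PROFILES = {
    "frontend": {1: 40.0, 2: 22.0, 4: 15.0},
    "compose": {1: 90.0, 2: 50.0, 4: 30.0},
    "storage": {1: 60.0, 2: 35.0, 4: 25.0},
}


def decompose_sla(profiles, end_to_end_sla_ms):
    """Choose one allocation per service that minimizes total cores while the
    summed latency stays within the end-to-end SLA (assumed additive chain).

    Returns (allocation, per_service_sla), or None if no choice is feasible.
    """
    services = list(profiles)
    best = None  # (total_cores, choice)
    for choice in product(*(sorted(profiles[s]) for s in services)):
        latency = sum(profiles[s][c] for s, c in zip(services, choice))
        if latency > end_to_end_sla_ms:
            continue  # violates the end-to-end SLA
        cores = sum(choice)
        if best is None or cores < best[0]:
            best = (cores, choice)
    if best is None:
        return None
    allocation = dict(zip(services, best[1]))
    # The latency budget of each service is its latency at the chosen allocation.
    per_service_sla = {s: profiles[s][allocation[s]] for s in services}
    return allocation, per_service_sla


def explore_service(measure_latency, allocations, sla_ms):
    """Profile a single service from the largest allocation downward and stop
    as soon as the measured latency exceeds its SLA (assumed search order)."""
    samples = {}
    for cores in sorted(allocations, reverse=True):
        latency = measure_latency(cores)
        samples[cores] = latency
        if latency > sla_ms:
            break  # early stop: smaller allocations are not tried
    return samples


if __name__ == "__main__":
    print(decompose_sla(PROFILES, end_to_end_sla_ms=120.0))
    # Hypothetical latency curve standing in for a real measurement.
    print(explore_service(lambda c: 100.0 / c, [1, 2, 4, 8], sla_ms=30.0))
```

In this toy setup the early stop means the 1-core configuration is never measured once the 2-core run exceeds the budget, which is the mechanism the abstract credits for avoiding prolonged SLA violations during exploration.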