Towards Fair and Firm Real-Time Scheduling in DNN Multi-Tenant Multi-Accelerator Systems via Reinforcement Learning (2403.00766v1)

Published 9 Feb 2024 in cs.AR, cs.DC, and cs.LG

Abstract: This paper addresses the critical challenge of managing Quality of Service (QoS) in cloud services, focusing on the nuances of individual tenant expectations and varying Service Level Indicators (SLIs). It introduces a novel approach utilizing Deep Reinforcement Learning for tenant-specific QoS management in multi-tenant, multi-accelerator cloud environments. The chosen SLI, deadline hit rate, allows clients to tailor QoS for each service request. A novel online scheduling algorithm for Deep Neural Networks in multi-accelerator systems is proposed, with a focus on guaranteeing tenant-wise, model-specific QoS levels while considering real-time constraints.
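The central SLI in the abstract is the deadline hit rate, which each tenant can set per model or service request, echoing the (m, k)-firm deadline model cited in the references. The following is a minimal, illustrative Python sketch of tracking such a per-tenant, per-model hit rate over a sliding window; it is not the paper's scheduler, and the class, method, and parameter names are hypothetical.

```python
from collections import defaultdict, deque

class DeadlineHitRateTracker:
    """Tracks the fraction of requests finishing before their deadline,
    over a sliding window of the last k requests (an (m, k)-firm view).
    Hypothetical helper for illustration only."""

    def __init__(self, window_k: int = 20):
        self.window_k = window_k
        # Maps (tenant, model) -> deque of booleans (True = deadline hit).
        self.history = defaultdict(lambda: deque(maxlen=window_k))

    def record(self, tenant: str, model: str, finish_time: float, deadline: float) -> None:
        # Store whether this request met its deadline.
        self.history[(tenant, model)].append(finish_time <= deadline)

    def hit_rate(self, tenant: str, model: str) -> float:
        # Deadline hit rate over the current window (1.0 if no history yet).
        window = self.history[(tenant, model)]
        return sum(window) / len(window) if window else 1.0

    def meets_qos(self, tenant: str, model: str, m_required: int) -> bool:
        # (m, k)-firm style check: at least m of the last k requests hit their deadline.
        return sum(self.history[(tenant, model)]) >= m_required

if __name__ == "__main__":
    tracker = DeadlineHitRateTracker(window_k=5)
    # Example: tenant "A" running "resnet50" with a 10 ms deadline per request.
    for finish in (8.0, 9.5, 12.0, 7.5, 11.0):
        tracker.record("A", "resnet50", finish_time=finish, deadline=10.0)
    print(tracker.hit_rate("A", "resnet50"))      # 0.6 -> 3 of 5 requests hit
    print(tracker.meets_qos("A", "resnet50", 3))  # True under a (3, 5)-firm requirement
```

Such a per-tenant signal could feed the reward of a reinforcement-learning scheduler, but how the paper constructs its state, action, and reward is not detailed in this abstract.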
