MultiTASC: A Multi-Tenancy-Aware Scheduler for Cascaded DNN Inference at the Consumer Edge (2306.12830v1)

Published 22 Jun 2023 in cs.LG and cs.DC

Abstract: Cascade systems comprise a two-model sequence, with a lightweight model processing all samples and a heavier, higher-accuracy model conditionally refining harder samples to improve accuracy. By placing the light model on the device side and the heavy model on a server, model cascades constitute a widely used distributed inference approach. With the rapid expansion of intelligent indoor environments, such as smart homes, a new setting of Multi-Device Cascade is emerging, in which multiple, diverse devices simultaneously use a shared heavy model on the same server, typically located within or close to the consumer environment. This work presents MultiTASC, a multi-tenancy-aware scheduler that adaptively controls the forwarding decision functions of the devices in order to maximize the system throughput, while sustaining high accuracy and low latency. By explicitly considering device heterogeneity, our scheduler improves the latency service-level objective (SLO) satisfaction rate by 20-25 percentage points (pp) over state-of-the-art cascade methods in highly heterogeneous setups, while serving over 40 devices, showcasing its scalability.
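The core mechanism described above, a device-side light model whose outputs are conditionally escalated to a server-side heavy model via a forwarding decision function, can be sketched as a confidence-threshold cascade. This is a minimal illustrative sketch, not the paper's implementation: the stand-in "models" and all names below are assumptions, and MultiTASC's contribution is adapting such a threshold per device at run time rather than fixing it.

```python
def light_model(sample):
    """Stand-in for the on-device lightweight DNN: returns
    (predicted_label, confidence). Purely illustrative."""
    return sample["light_pred"], sample["light_conf"]

def heavy_model(sample):
    """Stand-in for the server-side high-accuracy DNN."""
    return sample["heavy_pred"]

def cascade_infer(sample, threshold):
    """Forwarding decision function: keep the light model's output
    when its confidence clears the threshold; otherwise forward the
    sample to the shared server-side heavy model."""
    label, conf = light_model(sample)
    if conf >= threshold:
        return label, "device"
    return heavy_model(sample), "server"

samples = [
    {"light_pred": "cat", "light_conf": 0.95, "heavy_pred": "cat"},
    {"light_pred": "dog", "light_conf": 0.40, "heavy_pred": "fox"},
]
for s in samples:
    print(cascade_infer(s, threshold=0.7))
```

Raising the threshold forwards more samples to the server (higher accuracy, more server load); lowering it keeps more samples on-device. A multi-tenancy-aware scheduler tunes this knob per device so the shared server sustains its latency SLO as the number of devices grows.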
