Variant Parallelism: Lightweight Deep Convolutional Models for Distributed Inference on IoT Devices (2210.08376v2)

Published 15 Oct 2022 in cs.DC and cs.CV

Abstract: Two major techniques are commonly used to meet real-time inference limits when distributing models across resource-constrained IoT devices: (1) model parallelism (MP) and (2) class parallelism (CP). In MP, transmitting bulky intermediate data (orders of magnitude larger than the input) between devices imposes a huge communication overhead. Although CP avoids this problem, it limits the number of possible sub-models. In addition, both solutions are fault intolerant, which is a problem when they are deployed on edge devices. We propose variant parallelism (VP), an ensemble-based deep learning distribution method in which different variants of a main model are generated and can be deployed on separate machines. We design a family of lighter models around the original model and train them simultaneously to improve accuracy over single models. Our experimental results on six common mid-sized object recognition datasets demonstrate that our models can have 5.8-7.1x fewer parameters, 4.3-31x fewer multiply-accumulate operations (MACs), and 2.5-13.2x lower response time on atomic inputs compared to MobileNetV2, while achieving comparable or higher accuracy. Our technique easily generates several variants of the base architecture. Each variant returns only 2k outputs, where 1 <= k <= #classes/2, representing its Top-k classes, instead of the large volume of floating-point values required in MP. Since each variant provides a full-class prediction, our approach maintains higher availability than MP and CP in the presence of failures.
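
The Top-k output scheme and the ensemble step described above can be illustrated with a short sketch. The NumPy snippet below is a minimal reading of the idea, not the paper's released code; every function and variable name in it is an assumption made for illustration. Each variant transmits only its k best class indices plus the k matching scores (2k values per input), and a coordinator sums per-class evidence from whichever variants actually responded, so a usable full-class prediction survives the failure of individual devices.

import numpy as np

def variant_topk(logits, k):
    """Reduce one variant's full class-score vector to its Top-k.
    Only 2k values (k indices + k scores) leave the device, instead
    of the bulky intermediate tensors exchanged under MP."""
    idx = np.argsort(logits)[-k:]          # indices of the k largest scores
    return idx, logits[idx]

def aggregate(variant_outputs, num_classes):
    """Ensemble the Top-k outputs of whichever variants responded.
    Because every variant scores the full class set, the ensemble
    still returns a prediction when some variants fail, which is
    the availability advantage VP claims over MP and CP."""
    votes = np.zeros(num_classes)
    for idx, scores in variant_outputs:    # failed variants are simply absent
        votes[idx] += scores               # accumulate per-class evidence
    return int(np.argmax(votes))

# Hypothetical usage: three variants over 10 classes with k = 3. In VP the
# variants would be distinct lighter networks (e.g., scaled-down versions of
# the base architecture) running on separate IoT devices; random vectors
# stand in for their class scores here.
rng = np.random.default_rng(0)
outputs = [variant_topk(rng.random(10), k=3) for _ in range(3)]
print(aggregate(outputs, num_classes=10))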
