Online Algorithms for Hierarchical Inference in Deep Learning applications at the Edge (2304.00891v2)

Published 3 Apr 2023 in cs.LG and cs.CV

Abstract: We consider a resource-constrained Edge Device (ED), such as an IoT sensor or a microcontroller unit, embedded with a small-size ML model (S-ML) for a generic classification application, and an Edge Server (ES) that hosts a large-size ML model (L-ML). Since the inference accuracy of S-ML is lower than that of L-ML, offloading all the data samples to the ES results in high inference accuracy, but it defeats the purpose of embedding S-ML on the ED and forgoes the reduced latency, bandwidth savings, and energy efficiency of doing local inference. To get the best of both worlds, i.e., the benefits of doing inference on the ED and the benefits of doing inference on the ES, we explore the idea of Hierarchical Inference (HI), wherein the S-ML inference is accepted only when it is correct; otherwise, the data sample is offloaded for L-ML inference. However, the ideal implementation of HI is infeasible because the ED does not know whether the S-ML inference is correct. We propose an online meta-learning framework that the ED can use to predict the correctness of the S-ML inference. In particular, we propose to use the maximum softmax value output by S-ML for a data sample to decide whether to offload it or not. The resulting online learning problem turns out to be a Prediction with Expert Advice (PEA) problem with a continuous expert space. We propose two different algorithms and prove sublinear regret bounds for them without any assumption on the smoothness of the loss function. We evaluate and benchmark the performance of the proposed algorithms for an image classification application using four datasets, namely Imagenette, Imagewoof, MNIST, and CIFAR-10.
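
To make the offloading rule concrete: the ED computes the S-ML softmax for each sample, accepts the local prediction if the maximum softmax value clears a threshold, and offloads otherwise; the online learning problem is then to tune that threshold. Below is a minimal sketch of this loop as an exponentially weighted forecaster over a discretized grid of threshold experts. It is an illustration, not the paper's algorithms: the fixed offloading cost `beta`, the learning rate, the full-information feedback, and the synthetic stream (where confidence tracks accuracy by construction) are all assumptions.

```python
import numpy as np

# Minimal sketch: exponentially weighted forecaster over a discretized
# grid of softmax-threshold "experts". This is NOT the paper's exact
# algorithm (which handles a continuous expert space); the names, toy
# data model, and fixed offloading cost are illustrative assumptions.

rng = np.random.default_rng(0)

K = 100                             # number of discretized threshold experts
thetas = np.linspace(0.0, 1.0, K)   # expert k offloads when p_t < thetas[k]
weights = np.full(K, 1.0 / K)       # uniform initial distribution over experts
beta = 0.3                          # assumed fixed offloading cost in (0, 1)
T = 10_000                          # horizon
eta = np.sqrt(8 * np.log(K) / T)    # standard exponential-weights learning rate

for t in range(T):
    # Toy stand-ins: in practice p_t is the maximum softmax value of the
    # embedded S-ML, and `correct` indicates whether its top-1 prediction
    # matches the ground truth (here confidence tracks accuracy by fiat).
    p_t = rng.uniform()
    correct = rng.uniform() < p_t

    # Sample a threshold from the current distribution and act on it:
    # accept the local S-ML inference if p_t clears the threshold,
    # otherwise offload the sample for L-ML inference on the edge server.
    theta = rng.choice(thetas, p=weights)
    offload = p_t < theta

    # Per-expert losses: offloading always costs beta; accepting a wrong
    # local inference costs 1; accepting a correct one costs 0.
    offload_mask = p_t < thetas
    losses = np.where(offload_mask, beta, 1.0 - float(correct))

    # Exponential-weights update (full-information feedback assumed, i.e.
    # correctness is revealed for every sample, so every expert's loss is
    # computable; the paper's setting is harder than this).
    weights *= np.exp(-eta * losses)
    weights /= weights.sum()        # renormalize for numerical stability
```

Note that the paper's setting is strictly harder than this sketch: the expert space is continuous rather than a grid, and the regret bounds are proved without any smoothness assumption on the loss.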
