A Converting Autoencoder Toward Low-latency and Energy-efficient DNN Inference at the Edge (2403.07036v1)

Published 11 Mar 2024 in cs.LG, cs.CV, and cs.DC

Abstract: Reducing inference time and energy usage while maintaining prediction accuracy has become a significant concern for deep neural network (DNN) inference on resource-constrained edge devices. To address this problem, we propose a novel approach based on a "converting" autoencoder and lightweight DNNs. This improves upon recent work such as early-exiting frameworks and DNN partitioning. Early-exiting frameworks spend different amounts of computation on different inputs depending on their complexity. However, they can be inefficient in real-world scenarios that deal with many hard image samples. On the other hand, DNN partitioning algorithms that utilize the computation power of both the cloud and edge devices can be affected by network delays and intermittent connections between the cloud and the edge. We present CBNet, a low-latency and energy-efficient DNN inference framework tailored for edge devices. It utilizes a "converting" autoencoder to efficiently transform hard images into easy ones, which are subsequently processed by a lightweight DNN for inference. To the best of our knowledge, such an autoencoder has not been proposed before. Our experimental results using three popular image-classification datasets on a Raspberry Pi 4, a Google Cloud instance, and an instance with an Nvidia Tesla K80 GPU show that CBNet achieves up to 4.8x speedup in inference latency and 79% reduction in energy usage compared to competing techniques while maintaining similar or higher accuracy.
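
The abstract describes the CBNet pipeline only at a high level: a "converting" autoencoder reconstructs a hard input as an easier image of the same size, and a lightweight DNN then classifies the converted image on the edge device. As a rough sketch of such a pipeline (not the authors' implementation), the PyTorch code below uses assumed module names (`ConvertingAutoencoder`, `LightweightCNN`), assumed layer sizes, and 28x28 grayscale inputs sized for MNIST-like datasets; all of these specifics are illustrative assumptions.

```python
# Illustrative sketch only: module names, layer sizes, and the inference
# wrapper below are assumptions, not the paper's actual CBNet code.
import torch
import torch.nn as nn


class ConvertingAutoencoder(nn.Module):
    """Maps a 'hard' 28x28 grayscale image to an 'easy' image of the same size."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # 7x7 -> 14x14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # 14x14 -> 28x28
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


class LightweightCNN(nn.Module):
    """Small classifier intended to run entirely on the edge device."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28x28 -> 14x14
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))


@torch.no_grad()
def cbnet_infer(images, converter, classifier):
    """End-to-end inference: convert hard images, then classify the easy ones."""
    easy = converter(images)
    logits = classifier(easy)
    return logits.argmax(dim=1)


if __name__ == "__main__":
    # Untrained modules and random tensors merely demonstrate the data flow;
    # a real deployment would load trained weights for both modules.
    converter, classifier = ConvertingAutoencoder().eval(), LightweightCNN().eval()
    batch = torch.rand(4, 1, 28, 28)  # placeholder for MNIST-like inputs
    print(cbnet_infer(batch, converter, classifier))
```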

Authors (5)
  1. Hasanul Mahmud (2 papers)
  2. Peng Kang (14 papers)
  3. Kevin Desai (9 papers)
  4. Palden Lama (2 papers)
  5. Sushil Prasad (6 papers)
Citations (2)