Progressive Neural Compression for Adaptive Image Offloading under Timing Constraints (2310.05306v1)

Published 8 Oct 2023 in cs.DC, cs.CV, and cs.LG

Abstract: IoT devices are increasingly the source of data for ML applications running on edge servers. Data transmissions from devices to servers are often over local wireless networks whose bandwidth is not just limited but, more importantly, variable. Furthermore, in cyber-physical systems interacting with the physical environment, image offloading is also commonly subject to timing constraints. It is, therefore, important to develop an adaptive approach that maximizes the inference performance of ML applications under timing constraints and the resource constraints of IoT devices. In this paper, we use image classification as our target application and propose progressive neural compression (PNC) as an efficient solution to this problem. Although neural compression has been used to compress images for different ML applications, existing solutions often produce fixed-size outputs that are unsuitable for timing-constrained offloading over variable bandwidth. To address this limitation, we train a multi-objective rateless autoencoder that optimizes for multiple compression rates via stochastic taildrop to create a compression solution that produces features ordered according to their importance to inference performance. Features are then transmitted in that order based on available bandwidth, with classification ultimately performed using the (sub)set of features received by the deadline. We demonstrate the benefits of PNC over state-of-the-art neural compression approaches and traditional compression methods on a testbed comprising an IoT device and an edge server connected over a wireless network with varying bandwidth.
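The two mechanisms the abstract describes — stochastic taildrop during training, and importance-ordered transmission truncated at the deadline — can be illustrated with a minimal sketch. This is not the paper's implementation: the drop distribution, the per-feature byte cost, and all function names here are assumptions for illustration only.

```python
import random

random.seed(0)

def stochastic_taildrop(latent):
    # Training-time taildrop: zero out a random-length tail of the latent
    # vector so that earlier features learn to carry the information most
    # important to inference. (The paper's exact drop distribution is an
    # assumption here; a uniform cut point is used for simplicity.)
    k = random.randint(1, len(latent))
    return latent[:k] + [0.0] * (len(latent) - k)

def offload(latent, bandwidth_bps, deadline_s, bytes_per_feature=1):
    # Deployment-time offloading: features are sent in importance order,
    # so only the prefix that fits within the deadline at the current
    # bandwidth reaches the edge server.
    budget = int(bandwidth_bps * deadline_s / bytes_per_feature)
    return latent[:min(budget, len(latent))]

# A 64-dimensional encoder output, ordered by importance to inference.
latent = [random.gauss(0, 1) for _ in range(64)]
received = offload(latent, bandwidth_bps=40, deadline_s=0.5)
# Only the first 20 features arrive by the deadline; classification is
# then performed on this received prefix.
```

Because any prefix of the latent is a valid (lower-fidelity) representation, the sender never needs to know the bandwidth in advance: it simply streams features in order until the deadline expires.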

