FrankenSplit: Efficient Neural Feature Compression with Shallow Variational Bottleneck Injection for Mobile Edge Computing (2302.10681v4)
Abstract: The rise of mobile AI accelerators allows latency-sensitive applications to execute lightweight Deep Neural Networks (DNNs) on the client side. However, critical applications require powerful models that edge devices cannot host, so they must offload inference requests, and the high-dimensional input data then competes for limited bandwidth. This work proposes shifting away from executing the shallow layers of partitioned DNNs on the client. Instead, it advocates concentrating the local resources on variational compression optimized for machine interpretability. We introduce a novel framework for resource-conscious compression models and extensively evaluate our method in an environment reflecting the asymmetric resource distribution between edge devices and servers. Our method achieves 60% lower bitrate than a state-of-the-art split computing (SC) method without decreasing accuracy and is up to 16x faster than offloading with existing codec standards.
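To make the idea concrete, below is a minimal PyTorch sketch of what "shallow variational bottleneck injection" can look like: a lightweight client-side analysis transform replaces a backbone's shallow layers, a learned entropy bottleneck (here a CompressAI-style `EntropyBottleneck`) compresses the latent, and a server-side synthesis transform restores the feature map the frozen backbone tail expects. The layer sizes, module names (`BottleneckedStem`, `rd_loss`), and the `beta` trade-off weight are illustrative assumptions, not the paper's exact architecture or loss.

```python
# Minimal sketch of shallow variational bottleneck injection for split
# computing. Assumes a CompressAI-style EntropyBottleneck as the learned
# prior; all layer sizes and hyperparameters below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F
from compressai.entropy_models import EntropyBottleneck


class BottleneckedStem(nn.Module):
    """Replaces a backbone's shallow layers with an encoder/decoder pair
    wrapped around a learned entropy bottleneck."""

    def __init__(self, latent_channels: int = 48, feature_channels: int = 256):
        super().__init__()
        # Client-side analysis transform: cheap enough for a mobile device.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5, stride=2, padding=2),
            nn.GELU(),
            nn.Conv2d(64, latent_channels, kernel_size=5, stride=2, padding=2),
        )
        self.entropy_bottleneck = EntropyBottleneck(latent_channels)
        # Server-side synthesis transform: restores the feature map that the
        # (frozen) backbone tail expects as input.
        self.decoder = nn.Sequential(
            nn.Conv2d(latent_channels, 128, kernel_size=3, padding=1),
            nn.GELU(),
            nn.Conv2d(128, feature_channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        y = self.encoder(x)
        # Training: additive-uniform-noise relaxation of quantization;
        # at inference the bottleneck quantizes and entropy-codes y instead.
        y_hat, y_likelihoods = self.entropy_bottleneck(y)
        return self.decoder(y_hat), y_likelihoods


def rd_loss(likelihoods, student_feat, teacher_feat, num_pixels, beta=1.0):
    """Rate-distortion objective where distortion is measured against the
    teacher backbone's features (distillation), not reconstructed pixels,
    i.e., compression optimized for machine interpretability."""
    rate_bpp = -likelihoods.log2().sum() / num_pixels    # estimated bits per pixel
    distortion = F.mse_loss(student_feat, teacher_feat)  # machine-oriented distortion
    return rate_bpp + beta * distortion
```

At deployment time, a CompressAI-style bottleneck would produce the actual bitstream via its `compress`/`decompress` methods on the client and server, respectively; the forward pass above only estimates the rate for training. The key design point is that distortion is defined in the teacher's feature space, so bits are spent only on information the downstream model needs.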