Task-Oriented Communication for Edge Video Analytics (2211.14049v3)
Abstract: With the development of AI techniques and the increasing popularity of camera-equipped devices, many edge video analytics applications are emerging, calling for the deployment of computation-intensive AI models at the network edge. Edge inference is a promising solution that moves computation-intensive workloads from low-end devices to a powerful edge server for video analytics, but device-server communication remains a bottleneck due to limited bandwidth. This paper proposes a task-oriented communication framework for edge video analytics, where multiple devices collect visual sensory data and transmit informative features to an edge server for processing. To enable low-latency inference, the framework removes video redundancy in the spatial and temporal domains and transmits only the minimal information essential for the downstream task, rather than reconstructing the videos at the edge server. Specifically, it extracts compact task-relevant features based on the deterministic information bottleneck (IB) principle, which characterizes a tradeoff between the informativeness of the features and the communication cost. As the features of consecutive frames are temporally correlated, we propose a temporal entropy model (TEM) that reduces the bitrate by taking the previous features as side information in feature encoding. To further improve inference performance, we build a spatial-temporal fusion module at the server that integrates features of the current and previous frames for joint inference. Extensive experiments on video analytics tasks show that the proposed framework effectively encodes task-relevant information of video data and achieves a better rate-performance tradeoff than existing methods.
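The abstract describes two key ideas: a deterministic-IB-style objective that trades task performance against communication cost, and a temporal entropy model that conditions the rate estimate on the previous frame's features. A minimal NumPy sketch of both ideas is shown below; the Gaussian bit estimate, the identity temporal predictor, and the `beta` weight are illustrative assumptions, not the paper's actual learned models.

```python
import numpy as np

def gaussian_bits(z, mu, sigma):
    # Bit estimate of feature z under a Gaussian N(mu, sigma^2) prior,
    # a stand-in for a learned entropy model's rate term (assumption).
    nll_nats = 0.5 * np.log(2 * np.pi * sigma**2) + (z - mu) ** 2 / (2 * sigma**2)
    return float(np.sum(nll_nats) / np.log(2))  # convert nats to bits

def temporal_rate(features, sigma=1.0):
    # Temporal-entropy-model sketch: predict the current feature from the
    # previous one (identity prediction here), so only the residual costs
    # bits. Correlated consecutive frames then compress much better.
    total, prev = 0.0, np.zeros_like(features[0])
    for z in features:
        total += gaussian_bits(z, mu=prev, sigma=sigma)
        prev = z
    return total

def unconditional_rate(features, sigma=1.0):
    # Baseline without side information: every frame coded from scratch.
    return sum(gaussian_bits(z, mu=np.zeros_like(z), sigma=sigma)
               for z in features)

def ib_objective(task_loss, rate_bits, beta=0.01):
    # Deterministic-IB-style tradeoff: task loss plus beta times the
    # communication cost, with beta controlling the rate-performance balance.
    return task_loss + beta * rate_bits

# Three highly correlated "feature" frames: temporal conditioning wins.
frames = [np.ones(4), np.ones(4), np.ones(4)]
print(temporal_rate(frames) < unconditional_rate(frames))  # True
```

The comparison at the end illustrates why side information helps: once the previous frame is available as a predictor, temporally redundant features incur almost no residual rate, which is the intuition behind the TEM described in the abstract.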
Authors: Jiawei Shao, Xinjie Zhang, Jun Zhang