Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation
Abstract: Offloading computation to edge servers is a promising way to support the growing number of video understanding applications on resource-constrained IoT devices. Recent efforts have improved the scalability of such systems by reducing inference costs on edge servers. However, existing approaches are not directly applicable to pixel-level vision tasks such as video semantic segmentation (VSS), partly because dynamic video content causes both VSS accuracy and segment bitrate to fluctuate. In response, we present Penance, a new framework for reducing edge inference cost. By exploiting the softmax outputs of VSS models and the prediction mechanism of H.264/AVC codecs, Penance optimizes model selection and compression settings to minimize inference cost while meeting the required accuracy within the available bandwidth. We implement Penance on a commercial CPU-only IoT device. Experimental results show that Penance consumes only 6.8% more computation resources than the optimal strategy while satisfying the accuracy and bandwidth constraints with a low failure rate.