NU-Class Net: A Novel Approach for Video Quality Enhancement (2401.01163v3)
Abstract: Video content has surged in popularity and now dominates internet traffic and Internet of Things (IoT) networks. Video compression has long been the primary means of managing the substantial multimedia traffic generated by video-capturing devices. However, video compression algorithms demand significant computation to achieve high compression ratios, which makes efficient video coding standards difficult to implement on resource-constrained embedded systems such as IoT edge-node cameras. To address this challenge, this paper introduces NU-Class Net, a deep-learning model designed to mitigate the compression artifacts introduced by lossy codecs, substantially improving the perceived quality of low-bit-rate videos. With NU-Class Net, the video encoder in the capturing node can lower its output quality, producing low-bit-rate video and reducing both computation and bandwidth requirements at the edge. On the decoder side, which is typically less resource-constrained, NU-Class Net is applied after the video decoder to compensate for these artifacts and approximate the quality of the original video. Experimental results confirm the efficacy of the proposed model in enhancing the perceived quality of videos, particularly those streamed at low bit rates.
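The abstract gives no architectural details, so the sketch below is only a minimal illustration of the decode-then-enhance pipeline it describes, written in PyTorch. The `ArtifactReducer` class, its layer widths, and the residual-CNN design are hypothetical stand-ins for NU-Class Net, not the paper's actual network; the low-bit-rate input stream itself would come from a standard encoder run at a reduced target bitrate on the edge node.

```python
# Minimal sketch (not the paper's architecture): a residual CNN, standing in
# for NU-Class Net, applied to each decoded frame on the receiver side to
# suppress compression artifacts.
import torch
import torch.nn as nn


class ArtifactReducer(nn.Module):
    """Hypothetical stand-in for NU-Class Net: predicts a per-frame correction."""

    def __init__(self, channels: int = 64, num_blocks: int = 4):
        super().__init__()
        self.head = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.body = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            )
            for _ in range(num_blocks)
        ])
        self.tail = nn.Conv2d(channels, 3, kernel_size=3, padding=1)

    def forward(self, decoded_frame: torch.Tensor) -> torch.Tensor:
        # Learn only the artifact residual and add it back to the decoded
        # frame, so the network models the codec's error, not the full image.
        features = self.head(decoded_frame)
        features = features + self.body(features)
        return decoded_frame + self.tail(features)


if __name__ == "__main__":
    model = ArtifactReducer()
    frame = torch.rand(1, 3, 240, 320)  # one decoded RGB frame, values in [0, 1]
    enhanced = model(frame)
    print(enhanced.shape)  # torch.Size([1, 3, 240, 320])
```

Training such a model would pair decoded low-bit-rate frames with the corresponding uncompressed originals and minimize a reconstruction loss (e.g., L1) between the enhanced output and each original frame; at inference time it runs purely on the decoder side, leaving the lightweight edge encoder untouched.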
- Parham Zilouchian Moghaddam
- Mehdi Modarressi
- Mohammad Amin Sadeghi