Knowledge Guided Semi-Supervised Learning for Quality Assessment of User Generated Videos (2312.15425v1)
Abstract: Perceptual quality assessment of user-generated content (UGC) videos is challenging because training requires large-scale human-annotated video datasets. In this work, we address this challenge by first designing a self-supervised Spatio-Temporal Visual Quality Representation Learning (ST-VQRL) framework that generates robust quality-aware features for videos. We then propose a dual-model Semi-Supervised Learning (SSL) method designed specifically for the Video Quality Assessment task (SSL-VQA), built on a novel transfer of knowledge, in the form of quality predictions, between the two models. With the ST-VQRL backbone, SSL-VQA performs robustly across various VQA datasets, including cross-database settings, despite being trained on only a limited number of human-annotated videos. Our model improves on state-of-the-art performance by around 10% when trained only with limited labelled data, and by around 15% when unlabelled data is also used in SSL. Source code and checkpoints are available at https://github.com/Shankhanil006/SSL-VQA.
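The abstract does not specify how quality predictions are transferred between the two models. As a rough illustration of the general idea of dual-model semi-supervised regression, the sketch below trains two linear quality regressors on precomputed video features. Each model fits the labelled quality scores and is also pulled toward the other model's current predictions on unlabelled videos, which are treated as fixed pseudo-targets. Everything here is an assumption for illustration: the function `train_dual_ssl`, the linear models, the mean-squared-error losses, and the weight `lam` are hypothetical and are not the authors' SSL-VQA method or its ST-VQRL features.

```python
import numpy as np


def train_dual_ssl(x_lab, y_lab, x_unlab, lam=0.5, lr=0.1, steps=500, seed=0):
    """Hypothetical dual-model SSL sketch (not the SSL-VQA method).

    x_lab:   (n_l, d) features of labelled videos (stand-in for backbone features)
    y_lab:   (n_l,) human quality scores for the labelled videos
    x_unlab: (n_u, d) features of unlabelled videos
    Returns the weight vectors of the two linear regressors.
    """
    rng = np.random.default_rng(seed)
    d = x_lab.shape[1]
    # Different random initialisations so the two models start out disagreeing.
    w = [rng.normal(scale=0.5, size=d), rng.normal(scale=0.5, size=d)]
    n_l, n_u = len(x_lab), len(x_unlab)
    for _ in range(steps):
        # Each model's predictions on unlabelled videos act as constant
        # pseudo-targets for the *other* model (no gradient flows through them).
        pseudo = [x_unlab @ w[0], x_unlab @ w[1]]
        new_w = []
        for k in (0, 1):
            # Gradient of the supervised MSE on labelled videos.
            grad_sup = 2.0 / n_l * x_lab.T @ (x_lab @ w[k] - y_lab)
            # Gradient of the MSE toward the other model's pseudo-targets.
            grad_cons = 2.0 / n_u * x_unlab.T @ (x_unlab @ w[k] - pseudo[1 - k])
            new_w.append(w[k] - lr * (grad_sup + lam * grad_cons))
        w = new_w
    return w
```

For example, with synthetic features and scores:

```python
rng = np.random.default_rng(1)
true_w = np.array([1.0, -0.5, 0.3, 0.8])
x_lab = rng.normal(size=(20, 4))
y_lab = x_lab @ true_w + 0.1 * rng.normal(size=20)
x_unlab = rng.normal(size=(200, 4))
w0, w1 = train_dual_ssl(x_lab, y_lab, x_unlab)
```

With linear models, the cross-model term mainly drives the two regressors to agree on the unlabelled videos. The sketch therefore shows the structure of the loss, not the accuracy gains reported above, which depend on deep, diverse models and learned quality-aware features.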