Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy (2407.20766v1)
Abstract: Deep Video Quality Assessment (VQA) methods have shown impressive performance. No-reference (NR) VQA methods in particular play a vital role where reference videos are restricted or unavailable. However, as more streaming video is produced in ultra-high definition (e.g., 4K) to enrich the viewing experience, current deep VQA methods incur unacceptable computational costs. Furthermore, the resizing, cropping, and local sampling these methods rely on can discard details and content of the original 4K video, which degrades the quality assessment. In this paper, we propose a novel and highly efficient NR 4K VQA method. First, a novel data sampling and training strategy is proposed to tackle the problem of excessive resolution. This strategy allows the Swin Transformer-based VQA model to train and run inference effectively on the full data of 4K videos using standard consumer-grade GPUs, without compromising content or details. Second, a weighting and scoring scheme is developed to mimic human subjective perception by accounting for the distinct impact of each sub-region within a 4K frame on the overall perception. Third, we incorporate frequency-domain information from the video frames to better capture the details that affect video quality, which further improves the model's generalizability. To our knowledge, this is the first method for the NR 4K VQA task. Thorough empirical studies demonstrate that it not only significantly outperforms existing methods on a specialized 4K VQA dataset but also achieves state-of-the-art performance across multiple open-source NR video quality datasets.
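To make the three ideas in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no patch size, feature definition, or weighting rule, so the 240-pixel tile, the log-magnitude FFT feature, and the function names `tile_frame`, `log_spectrum`, and `weighted_score` are all assumptions chosen for illustration. It only shows that a 4K frame can be split into sub-regions that together cover every pixel, that each sub-region can be given a frequency-domain view, and that per-region scores can be combined with weights.

```python
import numpy as np

def tile_frame(frame, patch):
    """Split an (H, W) frame into non-overlapping patch x patch tiles.

    Assumes H and W are multiples of `patch`, so every pixel lands in
    exactly one tile (no resizing or cropping).
    """
    h, w = frame.shape
    if h % patch or w % patch:
        raise ValueError("frame size must be a multiple of the patch size")
    rows, cols = h // patch, w // patch
    # (rows, patch, cols, patch) -> (rows, cols, patch, patch) -> (N, patch, patch)
    tiles = frame.reshape(rows, patch, cols, patch).swapaxes(1, 2)
    return tiles.reshape(rows * cols, patch, patch)

def log_spectrum(tiles):
    """Hypothetical frequency-domain feature: log magnitude of each tile's 2-D FFT."""
    return np.log1p(np.abs(np.fft.fft2(tiles, axes=(-2, -1))))

def weighted_score(scores, weights):
    """Combine per-region scores using weights normalized to sum to 1."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.dot(scores, weights / weights.sum()))

# A synthetic 4K luma frame (2160 x 3840); 240 divides both dimensions.
rng = np.random.default_rng(0)
frame = rng.random((2160, 3840), dtype=np.float32)

tiles = tile_frame(frame, 240)    # 9 x 16 = 144 tiles
spectra = log_spectrum(tiles)     # one spectrum per tile

# Placeholder per-tile scores and uniform weights; a real model would
# predict the scores and learn or derive the weights.
region_scores = tiles.mean(axis=(1, 2))
overall = weighted_score(region_scores, np.ones(len(tiles)))
```

With uniform weights the aggregate reduces to a plain average; non-uniform weights would let some sub-regions count more than others, which is the general behaviour the abstract describes.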