NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results
Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), in which the submitted solutions were evaluated on KVQ, a dataset collected from the popular short-form video platform Kuaishou/Kwai. The KVQ database is divided into three parts: 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose of the challenge is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants, and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performance for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.