Light-VQA+: A Video Quality Assessment Model for Exposure Correction with Vision-Language Guidance (2405.03333v2)
Abstract: Recently, User-Generated Content (UGC) videos have gained popularity in our daily lives. However, UGC videos often suffer from poor exposure due to the limitations of photographic equipment and techniques. Therefore, Video Exposure Correction (VEC) algorithms have been proposed, including Low-Light Video Enhancement (LLVE) and Over-Exposed Video Recovery (OEVR). Equally important to VEC is Video Quality Assessment (VQA). Unfortunately, almost all existing VQA models are general-purpose, measuring the quality of a video from a comprehensive perspective. To address this, Light-VQA, trained on the LLVE-QA dataset, was proposed for assessing LLVE. We extend Light-VQA by expanding the LLVE-QA dataset into the Video Exposure Correction Quality Assessment (VEC-QA) dataset, which adds over-exposed videos and their corresponding corrected versions. In addition, we propose Light-VQA+, a VQA model specialized in assessing VEC. Light-VQA+ differs from Light-VQA mainly in its use of the CLIP model and vision-language guidance during feature extraction, followed by a new module that draws on the Human Visual System (HVS) for more accurate assessment. Extensive experimental results show that our model achieves the best performance against current State-Of-The-Art (SOTA) VQA models on the VEC-QA dataset and other public datasets.