- The paper introduces a unified framework for video quality assessment by leveraging mixed datasets training and aligning dataset-specific perceptual scales.
- It employs a three-stage process—relative quality assessment, nonlinear mapping with a 4-parameter logistic function, and perceptual scale alignment—to mirror human visual perception.
- Experimental results on key datasets show significant improvements in SROCC and PLCC compared to state-of-the-art models, highlighting enhanced prediction accuracy.
Unified Quality Assessment of In-the-Wild Videos with Mixed Datasets Training: An Overview
This paper presents a novel approach to video quality assessment (VQA) in the wild: a unified framework that trains on mixed datasets to handle the diverse conditions under which real-world videos are captured. The authors address a long-standing gap in computer vision: how to evaluate video quality when reference videos are unavailable, distortion types are complex, and video content is highly diverse.
The proposed method is innovative in that it incorporates principles from human perception, specifically focusing on content dependency and the temporal-memory effects of the human visual system. This is crucial as videos captured in real-world (or 'in-the-wild') conditions often display a wide array of unpredictable distortions like motion blur, exposure issues, and noise.
Framework Overview
The authors introduce a unified VQA framework comprising three distinct stages: relative quality assessment, nonlinear mapping, and dataset-specific perceptual scale alignment. Together, these stages make the mixed datasets training strategy feasible, improving robustness across datasets with differing content and scoring scales.
- Relative Quality Assessment: This stage predicts the relative quality, focusing on ranking videos in terms of perceived quality. This is important as it aligns with how humans tend to compare visual quality.
- Nonlinear Mapping: Because viewers respond nonlinearly to changes in quality, this stage maps relative quality to perceptual quality through a monotonic 4-parameter logistic function, a common choice in perceptual evaluations (a minimal sketch follows this list).
- Dataset-Specific Perceptual Scale Alignment: Given that subjective quality scores are not uniform across datasets, this stage aligns the predicted perceptual quality with subjective scores specific to each dataset. This is a crucial step to ensure that the model's output is comparable across different datasets with diverse score ranges.
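Below is a minimal PyTorch sketch of the last two stages, assuming a learnable 4-parameter logistic curve and a per-dataset linear rescaling. The parameter names (`beta`, `scale`, `shift`) and the choice to expose them as learnable modules are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class FourParamLogistic(nn.Module):
    """Monotonic 4-parameter logistic mapping from relative to perceptual quality.

    Parameter names and initial values are illustrative; the paper's code may
    parameterize or constrain the curve differently.
    """
    def __init__(self):
        super().__init__()
        # upper asymptote, lower asymptote, midpoint, slope scale
        self.beta = nn.Parameter(torch.tensor([1.0, 0.0, 0.0, 0.5]))

    def forward(self, relative_quality: torch.Tensor) -> torch.Tensor:
        b1, b2, b3, b4 = self.beta
        # |b4| keeps the mapping monotonically increasing
        return b2 + (b1 - b2) / (1 + torch.exp(-(relative_quality - b3) / b4.abs()))

class DatasetScaleAlignment(nn.Module):
    """Linear rescaling of perceptual quality onto each dataset's subjective score range.

    One (scale, shift) pair per dataset; this per-dataset parameterization is an
    assumption made for illustration.
    """
    def __init__(self, num_datasets: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_datasets))
        self.shift = nn.Parameter(torch.zeros(num_datasets))

    def forward(self, perceptual_quality: torch.Tensor, dataset_idx: int) -> torch.Tensor:
        return self.scale[dataset_idx] * perceptual_quality + self.shift[dataset_idx]

# Toy usage: relative scores from a quality model, mapped then aligned to dataset 0
relative = torch.tensor([-0.8, 0.1, 1.3])
perceptual = FourParamLogistic()(relative)
aligned = DatasetScaleAlignment(num_datasets=4)(perceptual, dataset_idx=0)
```

Keeping the 4PL curve monotonic preserves the ranking learned in the first stage, while the dataset-specific linear terms only stretch and shift scores to match each dataset's subjective rating range.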
Experimental Results
The proposed model was evaluated on four prominent in-the-wild datasets: LIVE-VQC, LIVE-Qualcomm, KoNViD-1k, and CVD2014. The results show superior performance over existing state-of-the-art models, particularly in cross-dataset evaluation, where prior methods tend to lose robustness. Specifically, the model achieves marked gains in Spearman's rank-order correlation coefficient (SROCC) and Pearson's linear correlation coefficient (PLCC), indicating better prediction monotonicity and accuracy.
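For reference, SROCC and PLCC can be computed with SciPy as sketched below; the scores are made up for illustration and are not results from the paper. In VQA benchmarking, PLCC is typically reported after fitting a monotonic nonlinear function to the predictions, which this simple sketch omits.

```python
import numpy as np
from scipy import stats

def correlation_metrics(predicted, mos):
    """Compute SROCC (rank monotonicity) and PLCC (linear accuracy) against subjective MOS."""
    predicted, mos = np.asarray(predicted), np.asarray(mos)
    srocc = stats.spearmanr(predicted, mos).correlation
    plcc = stats.pearsonr(predicted, mos)[0]
    return srocc, plcc

# Toy example with invented predictions and mean opinion scores
srocc, plcc = correlation_metrics([62.1, 48.3, 75.0, 55.6], [60.0, 50.2, 80.1, 52.4])
print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}")
```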
Implications and Future Directions
Practically, this research lays the groundwork for more reliable quality assessment in applications such as video streaming, surveillance, and content creation, where capture conditions are uncontrolled. Theoretically, it shows how perceptual properties of human vision can be built more directly into learning-based quality models.
Looking forward, the research opens several avenues for development. Future work could explore integrating additional perceptual phenomena and enhancing model efficiency through lightweight network architectures. Moreover, there is potential for applying this unified framework to other domains in computer vision and beyond, where diverse datasets need a coherent evaluative approach.
The authors have provided a PyTorch implementation of their method for reproducible research, underscoring their commitment to advancing the field through open collaboration.