Objective No-Reference Video Quality Assessment for In-the-Wild Videos
The paper "Quality Assessment of In-the-Wild Videos" presents a sophisticated approach for evaluating the quality of videos captured in uncontrolled environments. The absence of reference videos and variations in shooting conditions, such as focus and exposure, render this task particularly challenging. Traditional video quality assessment (VQA) methods, primarily validated on synthetically-distorted video datasets, often fail when applied to "in-the-wild" contexts due to complex real-world distortions. This paper introduces a novel no-reference (NR) VQA method incorporating knowledge of the human visual system (HVS) into a deep neural network framework, leveraging content-dependency and temporal-memory effects to improve assessment accuracy.
Methodology
The authors propose a model that combines content-aware features extracted by deep convolutional neural networks (CNNs) with temporal modeling via Gated Recurrent Units (GRUs). Using a CNN pre-trained for image classification, the method extracts deep semantic features that are both perceptually relevant and sensitive to content variations. To capture the temporal dependencies inherent in video sequences, the network integrates these per-frame features with a GRU, a recurrent architecture well suited to modeling long-term dependencies. Finally, to account for the temporal-memory effects observed in human perception, in particular the temporal hysteresis effect (viewers penalize quality drops sharply but are slow to reward subsequent recoveries), the authors employ a novel differentiable temporal pooling model inspired by subjective quality judgment.
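To make the pipeline concrete, below is a minimal PyTorch sketch of the three stages as described: content-aware frame features from a classification-pretrained CNN, a GRU over frame-level features, and a differentiable hysteresis-style temporal pooling. The backbone choice (ResNet-50), the spatial mean-and-std feature pooling, and all hyperparameters (reduced dimension, hidden size, window length `tau`, blend weight `gamma`) are illustrative assumptions for this sketch, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class ContentAwareFeatures(nn.Module):
    """Per-frame deep features from a CNN pre-trained for image
    classification (ResNet-50 here; an illustrative choice)."""

    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Keep everything up to (but excluding) the global pool / classifier.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        for p in self.features.parameters():
            p.requires_grad = False  # the feature extractor stays frozen

    def forward(self, frames):                # frames: (T, 3, H, W)
        fmap = self.features(frames)          # (T, 2048, h, w)
        mean = F.adaptive_avg_pool2d(fmap, 1).flatten(1)  # spatial mean
        std = fmap.flatten(2).std(dim=2)                  # spatial std
        return torch.cat([mean, std], dim=1)              # (T, 4096)


def hysteresis_pool(q, tau=12, gamma=0.5):
    """Differentiable pooling in the spirit of temporal hysteresis:
    viewers remember poor quality in the recent past (min over a past
    window) and react cautiously to upcoming quality (softmin-weighted
    average over a future window). tau and gamma are illustrative."""
    T = q.shape[0]
    mem, cur = [], []
    for t in range(T):
        past = q[max(0, t - tau):t + 1]       # recent past incl. frame t
        mem.append(past.min())                # memory: worst recent quality
        fut = q[t:min(T, t + tau)]            # near future incl. frame t
        w = F.softmax(-fut, dim=0)            # softmin: low quality dominates
        cur.append((w * fut).sum())
    pooled = gamma * torch.stack(mem) + (1 - gamma) * torch.stack(cur)
    return pooled.mean()                      # video-level quality score


class VSFALike(nn.Module):
    """Feature reduction + GRU + hysteresis pooling (a sketch of the
    overall architecture; layer sizes are assumptions)."""

    def __init__(self, in_dim=4096, red_dim=128, hid_dim=32):
        super().__init__()
        self.reduce = nn.Linear(in_dim, red_dim)
        self.gru = nn.GRU(red_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, 1)

    def forward(self, feats):                 # feats: (T, in_dim)
        x = self.reduce(feats).unsqueeze(0)   # (1, T, red_dim)
        h, _ = self.gru(x)                    # (1, T, hid_dim)
        q = self.head(h).squeeze(-1).squeeze(0)  # (T,) frame-level quality
        return hysteresis_pool(q)


# Example usage on a dummy 16-frame clip (shapes only; illustrative):
frames = torch.randn(16, 3, 224, 224)
feats = ContentAwareFeatures()(frames)
score = VSFALike()(feats)
```

Training such a model would regress the pooled score against subjective mean opinion scores (MOS) with, for example, an L1 or L2 loss; freezing the backbone keeps the content-aware features stable while only the regression layers are learned.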
Experimental Results
The robustness of the proposed method is validated on three publicly available datasets of "in-the-wild" videos captured under diverse conditions: KoNViD-1k, CVD2014, and LIVE-Qualcomm. The proposed model consistently outperforms existing state-of-the-art NR VQA methods, with reported improvements of roughly 12% to 18% over the second-best method, V-BLIINDS, as measured by Spearman (SROCC), Kendall (KROCC), and Pearson (PLCC) correlations with subjective scores, along with lower root-mean-square error (RMSE). Such performance underscores the importance of content-aware features and temporal-memory modeling in NR VQA tasks.
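These evaluation criteria are standard in VQA: SROCC and KROCC measure how well the predicted ranking of videos matches the subjective ranking, while PLCC and RMSE measure linear agreement with mean opinion scores (lower RMSE is better). A minimal sketch of how they are typically computed follows; note that published results often apply a four-parameter logistic mapping to predictions before computing PLCC and RMSE, which is omitted here for brevity.

```python
import numpy as np
from scipy import stats


def vqa_metrics(pred, mos):
    """Standard VQA evaluation criteria: rank correlations (SROCC, KROCC)
    and linear agreement (PLCC, RMSE) between predicted scores and
    subjective mean opinion scores (MOS)."""
    pred, mos = np.asarray(pred, float), np.asarray(mos, float)
    srocc, _ = stats.spearmanr(pred, mos)
    krocc, _ = stats.kendalltau(pred, mos)
    plcc, _ = stats.pearsonr(pred, mos)
    rmse = np.sqrt(np.mean((pred - mos) ** 2))
    return {"SROCC": srocc, "KROCC": krocc, "PLCC": plcc, "RMSE": rmse}


# Example with dummy scores (illustrative numbers only):
print(vqa_metrics([3.1, 2.4, 4.0, 1.8], [3.0, 2.5, 4.2, 2.0]))
```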
Implications and Future Directions
The findings highlight the potential of integrating HVS-inspired principles into machine learning frameworks to enhance video quality assessment tools. In particular, the model's success suggests that content-awareness and temporal-memory effects are crucial factors when evaluating videos captured outside controlled environments. The approach could enable more effective video quality control mechanisms in real-world applications such as online video streaming and mobile video capture.
In future work, integrating spatio-temporal attention mechanisms might yield further gains by identifying the spatial regions and temporal segments of a video that most influence perceived quality. Incorporating low-level motion information could also address dynamic distortions more comprehensively. Finally, extending the methodology to multi-modal inputs, or transferring it to other quality assessment domains such as audio-visual content or augmented reality environments, presents interesting avenues for research.
Overall, the paper makes a substantial contribution to the field of video quality assessment by offering a robust, well-validated methodology that aligns closely with human perceptual judgment, thus setting a precedent for future advances in NR VQA technologies.