Insights from Temporal Pooling Methods in Blind Video Quality Assessment
The paper "A Comparative Evaluation of Temporal Pooling Methods for Blind Video Quality Assessment" examines various temporal pooling strategies within the framework of no-reference (NR) video quality assessment (VQA), a significant challenge given the diverse and complex distortions present in user-generated content (UGC). This paper is pivotal in optimizing and understanding the integration of temporal pooling algorithms into NR VQA models, which quantify the perceived quality of video content without the need for pristine references.
Key Findings
The paper evaluates several strategies for aggregating frame-level quality scores into a single video-level prediction, including the arithmetic, harmonic, and geometric means as well as more nuanced approaches such as VQPooling and hysteresis pooling. Notably, the proposed ensemble-based pooling approach (EPooling) combines the outputs of multiple pooling models into a single quality estimate, yielding more consistent performance across varied content types.
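To make the mean-based strategies concrete, the sketch below pools a sequence of frame-level quality scores with the arithmetic, harmonic, and geometric means; the function names and the assumption that scores are positive floats are illustrative choices, not part of the paper.

```python
import numpy as np

def arithmetic_mean_pool(frame_scores):
    """Plain average of per-frame quality scores."""
    return float(np.mean(frame_scores))

def harmonic_mean_pool(frame_scores, eps=1e-8):
    """Harmonic mean; low-quality frames pull the result down strongly."""
    scores = np.asarray(frame_scores, dtype=float)
    return float(len(scores) / np.sum(1.0 / (scores + eps)))

def geometric_mean_pool(frame_scores, eps=1e-8):
    """Geometric mean, computed in log space for numerical stability."""
    scores = np.asarray(frame_scores, dtype=float)
    return float(np.exp(np.mean(np.log(scores + eps))))
```

Mathematically, the harmonic and geometric means penalize low-scoring frames more heavily than the arithmetic mean, which is why they are often considered for content with transient quality drops.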
Numerical Results: The paper evaluates these pooling techniques on the KoNViD-1k and LIVE-VQC datasets, using the Spearman rank-order correlation coefficient (SRCC) and the Pearson linear correlation coefficient (PLCC) as performance metrics. While simpler methods such as the arithmetic mean already provide a solid baseline on KoNViD-1k, methods that emphasize low-quality segments, such as hysteresis pooling, yield notable gains (up to 0.03 in SRCC and PLCC) on LIVE-VQC, which contains more dynamic content. EPooling, the ensemble approach, remains consistently competitive across both datasets.
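The evaluation protocol can be summarized in a short sketch: pooled video-level predictions are correlated with subjective mean opinion scores (MOS). This is a simplified illustration; in the VQA literature, PLCC is usually computed after a nonlinear logistic mapping, which is omitted here for brevity.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

def evaluate_pooling(pooled_scores, mos):
    """Correlate video-level predictions with subjective MOS labels."""
    pooled = np.asarray(pooled_scores, dtype=float)
    mos = np.asarray(mos, dtype=float)
    srcc, _ = spearmanr(pooled, mos)  # rank correlation (monotonicity)
    plcc, _ = pearsonr(pooled, mos)   # linear correlation (accuracy)
    return srcc, plcc
```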
Theoretical and Practical Implications
From a theoretical standpoint, the paper highlights the importance of modeling temporal dynamics within video sequences, since perceived quality is disproportionately influenced by low-quality frames or segments. The results also reinforce that context and content variability (e.g., the amount of motion or temporal change) dictate the optimal choice of temporal pooling method, strengthening the case for adaptive, content-aware VQA strategies.
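One simple way to express this low-quality bias in code, in the spirit of percentile-style pooling rather than as a method taken from the paper, is to average only the worst-scoring fraction of frames; the 20% cutoff below is an arbitrary assumption.

```python
import numpy as np

def worst_percentile_pool(frame_scores, p=20):
    """Average only the lowest p% of frame scores, reflecting the
    observation that the poorest segments dominate perceived quality."""
    scores = np.sort(np.asarray(frame_scores, dtype=float))
    k = max(1, int(round(len(scores) * p / 100)))
    return float(np.mean(scores[:k]))
```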
Practically, the findings offer guidance to streaming services and social media platforms on optimizing their video processing pipelines. For instance, selecting or designing temporal pooling strategies based on content characteristics can improve user experience by predicting perceived video quality more accurately and by informing real-time content adaptation decisions, as sketched below.
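A hypothetical sketch of such content-aware selection follows; the motion_level feature, the threshold, and the fallback logic are illustrative assumptions and are not prescribed by the paper.

```python
import numpy as np

def adaptive_pool(frame_scores, motion_level, motion_threshold=0.5):
    """Hypothetical content-aware pooling: use a simple mean for static
    content, but emphasize the worst frames when motion (and therefore
    temporal quality variation) is high."""
    scores = np.asarray(frame_scores, dtype=float)
    if motion_level < motion_threshold:
        return float(np.mean(scores))
    worst = np.sort(scores)[: max(1, len(scores) // 5)]  # lowest 20% of frames
    return float(np.mean(worst))
```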
Future Directions
The results open avenues for further investigation into generalized pooling strategies that are not only adaptive to content characteristics but also scale across different domains of video content, including live streams and long-format videos. In addition, combining advanced deep learning techniques with traditional pooling approaches may improve robustness and prediction accuracy in real-world applications.
This paper sets the stage for continued exploration into adaptive pooling mechanisms, especially as video content grows in diversity and complexity. As NR VQA models evolve, the insights drawn from temporal pooling will remain integral in fine-tuning these models to meet industry and consumer demands for high-quality video experiences.