Learned Scanpaths Aid Blind Panoramic Video Quality Assessment (2404.00252v2)
Abstract: Panoramic videos have the advantage of providing an immersive and interactive viewing experience. Nevertheless, their spherical nature gives rise to various and uncertain user viewing behaviors, which poses significant challenges for panoramic video quality assessment (PVQA). In this work, we propose an end-to-end optimized, blind PVQA method with explicit modeling of user viewing patterns through visual scanpaths. Our method consists of two modules: a scanpath generator and a quality assessor. The scanpath generator is initially trained to predict future scanpaths by minimizing their expected code length and then jointly optimized with the quality assessor for quality prediction. Our blind PVQA method enables direct quality assessment of panoramic images by treating them as videos composed of identical frames. Experiments on three public panoramic image and video quality datasets, encompassing both synthetic and authentic distortions, validate the superiority of our blind PVQA model over existing methods.
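Two ideas in the abstract can be illustrated with a minimal sketch: treating a panoramic image as a video of identical frames, and scoring a predicted scanpath by its expected code length. The sketch rests on assumptions that do not come from the paper: frames are nested lists of grayscale pixels, viewpoints are quantized into a small number of discrete bins, and the names `image_as_video` and `expected_code_length_bits` are hypothetical. The paper's own generator and probability model may differ.

```python
import math
from typing import List, Sequence

# Hypothetical frame type: a grayscale equirectangular image as rows of pixels.
Frame = List[List[float]]


def image_as_video(image: Frame, num_frames: int) -> List[Frame]:
    """Treat a panoramic image as a video composed of identical frames."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    # Copy every row so the frames do not share mutable state.
    return [[row[:] for row in image] for _ in range(num_frames)]


def expected_code_length_bits(
    probs: Sequence[Sequence[float]], observed: Sequence[int]
) -> float:
    """Average code length, in bits per step, of an observed scanpath.

    Assumption (illustrative only): viewpoints are quantized into bins,
    probs[t][k] is the predicted probability of bin k at step t, and
    observed[t] is the bin actually visited. The code length of one step
    is -log2 of the probability assigned to the observed bin.
    """
    if len(probs) != len(observed) or not observed:
        raise ValueError("probs and observed must be non-empty and equally long")
    total = sum(-math.log2(probs[t][k]) for t, k in enumerate(observed))
    return total / len(observed)


if __name__ == "__main__":
    image = [[0.1, 0.2], [0.3, 0.4]]
    video = image_as_video(image, num_frames=3)
    print(len(video), video[0] == video[2])

    # A predictor that assigns higher probability to the visited bins
    # yields a shorter code, which is the quantity training would minimize.
    print(expected_code_length_bits([[0.5, 0.5], [0.25, 0.75]], [0, 0]))
```

Under these assumptions, a video-based quality model could be applied to still images without a separate image pipeline, since the repeated frames only remove temporal variation from the input.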