Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models (2307.13981v2)

Published 26 Jul 2023 in cs.CV, cs.MM, and eess.IV

Abstract: Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving end-users' viewing experience in real-world video-enabled media applications. As an experimental field, progress in BVQA has been measured primarily on a few human-rated VQA datasets, so a better understanding of these datasets is crucial for properly evaluating the current state of BVQA. Toward this goal, we conduct a first-of-its-kind computational analysis of VQA datasets by designing minimalistic BVQA models. By minimalistic, we mean that our family of BVQA models builds only on basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, each with the simplest possible instantiation. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy-dataset problem to varying degrees, and some even admit blind image quality assessment (BIQA) solutions. We further support these claims by contrasting the generalizability of our models across these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA and shed light on good practices for constructing next-generation VQA datasets and models.
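
The four-block decomposition above is concrete enough to sketch in code. The following PyTorch sketch is illustrative only: the ResNet-50 backbone, the eight-frame sparse sampling with 224x224 resizing (standing in for "aggressive spatiotemporal downsampling"), the GRU as the optional temporal analyzer, the single linear layer as the quality regressor, and the temporal average pooling are all assumptions made for illustration, not the paper's exact configuration.

```python
# A minimal sketch of the four-block BVQA family described in the abstract:
# preprocessor -> spatial analyzer -> (optional temporal analyzer) -> regressor.
# All specific choices below (ResNet-50, 8 frames, 224x224, GRU, linear head)
# are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn
import torchvision.models as models


class MinimalisticBVQA(nn.Module):
    def __init__(self, num_frames: int = 8, use_temporal: bool = False):
        super().__init__()
        self.num_frames = num_frames
        # Spatial quality analyzer: an ImageNet-pretrained ResNet-50 backbone
        # (pretraining is an assumption), with its classifier head removed.
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.spatial = nn.Sequential(*list(backbone.children())[:-1])
        feat_dim = 2048
        # Optional temporal quality analyzer: here, a single GRU over frame features.
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True) if use_temporal else None
        # Quality regressor: the simplest instantiation, one linear layer.
        self.regressor = nn.Linear(feat_dim, 1)

    def preprocess(self, video: torch.Tensor) -> torch.Tensor:
        """Aggressive spatiotemporal downsampling: keep a few frames, shrink them."""
        b, t, c, h, w = video.shape
        idx = torch.linspace(0, t - 1, self.num_frames).long()  # sparse temporal sampling
        frames = video[:, idx].flatten(0, 1)  # fold time into the batch dimension
        return nn.functional.interpolate(
            frames, size=(224, 224), mode="bilinear", align_corners=False
        )

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        b = video.shape[0]
        frames = self.preprocess(video)             # (b * num_frames, c, 224, 224)
        feats = self.spatial(frames).flatten(1)     # (b * num_frames, 2048)
        feats = feats.view(b, self.num_frames, -1)  # (b, num_frames, 2048)
        if self.temporal is not None:
            feats, _ = self.temporal(feats)         # frame features in temporal context
        scores = self.regressor(feats).squeeze(-1)  # per-frame quality scores
        return scores.mean(dim=1)                   # temporal average pooling


if __name__ == "__main__":
    model = MinimalisticBVQA(use_temporal=True).eval()
    video = torch.rand(2, 60, 3, 540, 960)  # two synthetic 60-frame clips
    with torch.no_grad():
        print(model(video))  # two scalar quality predictions
```

In a typical setup, such a model would be fit to mean opinion scores (e.g., with an L1 loss) and evaluated by SRCC/PLCC against human ratings; those training details are likewise assumptions rather than the paper's protocol.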
