
FloLPIPS: A Bespoke Video Quality Metric for Frame Interpolation (2207.08119v2)

Published 17 Jul 2022 in eess.IV and cs.CV

Abstract: Video frame interpolation (VFI) is a useful tool for many video processing applications. Recently, it has also been applied in the video compression domain to enhance both conventional video codecs and learning-based compression architectures. While the development of enhanced frame interpolation algorithms has received increased attention in recent years, the perceptual quality assessment of interpolated content remains an open field of research. In this paper, we present FloLPIPS, a bespoke full-reference video quality metric for VFI that builds on the popular perceptual image quality metric LPIPS, which captures perceptual degradation in an extracted image feature space. To enhance the performance of LPIPS for evaluating interpolated content, we redesigned its spatial feature aggregation step, using temporal distortion (measured by comparing optical flows) to weight the feature difference maps. Evaluated on the BVI-VFI database, which contains 180 test sequences with various frame interpolation artefacts, FloLPIPS shows superior (statistically significant) correlation with subjective ground truth compared with 12 popular quality assessors. To facilitate further research in VFI quality assessment, our code is publicly available at https://danier97.github.io/FloLPIPS.
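The weighting scheme described in the abstract lends itself to a short sketch. The following Python is a minimal, illustrative approximation of the idea, not the authors' released implementation (see the project page above): it assumes the off-the-shelf lpips package and torchvision's pretrained RAFT model standing in as the flow estimator, it weights the final spatial LPIPS map rather than the per-layer feature difference maps, and the flow-error-to-weight mapping (a uniform floor alpha plus normalisation) is likewise an assumption.

import torch
import lpips  # pip install lpips
from torchvision.models.optical_flow import raft_small, Raft_Small_Weights

# Per-pixel LPIPS distance map (spatial=True skips the global average).
lpips_map = lpips.LPIPS(net='alex', spatial=True).eval()
# Pretrained flow estimator; RAFT is a stand-in choice for this sketch.
flow_net = raft_small(weights=Raft_Small_Weights.DEFAULT).eval()

def flolpips_frame(ref_prev, ref_cur, dis_prev, dis_cur, alpha=0.05):
    """Score one interpolated frame against its reference.

    All inputs are (1, 3, H, W) float tensors scaled to [-1, 1],
    with H and W divisible by 8 (a RAFT requirement).
    """
    with torch.no_grad():
        # Temporal distortion: discrepancy between the motion fields of
        # the reference and the distorted sequence over the same interval.
        flow_ref = flow_net(ref_prev, ref_cur)[-1]   # (1, 2, H, W)
        flow_dis = flow_net(dis_prev, dis_cur)[-1]
        flow_err = torch.linalg.norm(flow_ref - flow_dis, dim=1, keepdim=True)

        # Turn the flow error into spatial weights that sum to one; the
        # uniform floor `alpha` is an assumption of this sketch so that
        # motion-free regions still contribute to the score.
        w = alpha + flow_err
        w = w / w.sum(dim=(2, 3), keepdim=True)

        # Aggregate the LPIPS map with flow-based weights instead of the
        # plain spatial average used by vanilla LPIPS.
        d = lpips_map(ref_cur, dis_cur)              # (1, 1, H, W)
        return (w * d).sum(dim=(2, 3)).mean().item()

Averaging a score like this over all interpolated frames of a sequence gives a sequence-level quality estimate; regions where the distorted video's motion deviates from the reference motion are emphasised, which is the intuition behind targeting frame-interpolation artefacts.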

