ODVista: An Omnidirectional Video Dataset for Super-Resolution and Quality Enhancement Tasks (2403.00604v2)

Published 1 Mar 2024 in eess.IV

Abstract: Omnidirectional or 360-degree video is being increasingly deployed, largely due to the latest advancements in immersive virtual reality (VR) and extended reality (XR) technology. However, streaming these videos is constrained by bandwidth and latency, particularly under mobility conditions such as with unmanned aerial vehicles (UAVs). Adaptive resolution and compression aim to preserve quality while maintaining low latency under these constraints, yet downscaling and encoding can still degrade quality and introduce artifacts. Machine learning (ML)-based super-resolution (SR) and quality enhancement techniques offer a promising solution by enhancing detail recovery and reducing compression artifacts. However, current publicly available 360-degree video SR datasets lack compression artifacts, which limits research in this field. To bridge this gap, this paper introduces the omnidirectional video streaming dataset (ODVista), which comprises 200 high-resolution, high-quality videos downscaled and encoded at four bitrate ranges using the high-efficiency video coding (HEVC)/H.265 standard. Evaluations show that the dataset not only features a wide variety of scenes but also spans different levels of content complexity, which is crucial for robust solutions that perform well in real-world scenarios and generalize across diverse visual environments. Additionally, we evaluate the performance, considering both quality enhancement and runtime, of two handcrafted and two ML-based SR models on the validation and testing sets of ODVista.
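The degradation step described in the abstract (downscale, then encode with HEVC at several bitrates) can be illustrated with a minimal sketch. It only builds FFmpeg command lines and does not run them. The helper name `build_encode_command`, the bitrate values, the scale factor, the Lanczos scaler, and the `libx265` encoder are all assumptions chosen for illustration; the abstract does not state the exact settings used for ODVista.

```python
from typing import List

# Hypothetical bitrate ladder in kbit/s. The abstract mentions four bitrate
# ranges but does not list their values, so these numbers are placeholders.
EXAMPLE_BITRATES_KBPS = [500, 1000, 2000, 4000]


def build_encode_command(src: str, dst: str, scale_factor: int,
                         bitrate_kbps: int) -> List[str]:
    """Build an FFmpeg argument list that downscales `src` by `scale_factor`
    and encodes the result as HEVC/H.265 at a target bitrate.

    Assumptions (not taken from the abstract): Lanczos resampling and the
    software encoder libx265.
    """
    if scale_factor < 1:
        raise ValueError("scale_factor must be >= 1")
    if bitrate_kbps <= 0:
        raise ValueError("bitrate_kbps must be positive")

    # Divide width and height by the factor, rounding down to even values,
    # since 4:2:0 encoders generally require even frame dimensions.
    scale = (
        f"scale=trunc(iw/{scale_factor}/2)*2:"
        f"trunc(ih/{scale_factor}/2)*2:flags=lanczos"
    )
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-vf", scale,
        "-c:v", "libx265",
        "-b:v", f"{bitrate_kbps}k",
        "-an",  # drop audio; only the video stream matters here
        dst,
    ]


def build_ladder(src: str, scale_factor: int = 2) -> List[List[str]]:
    """Return one command per entry in the placeholder bitrate ladder."""
    return [
        build_encode_command(src, f"out_x{scale_factor}_{kbps}k.mp4",
                             scale_factor, kbps)
        for kbps in EXAMPLE_BITRATES_KBPS
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is built with libx265; passing the arguments as a list avoids shell quoting issues with the filter expression.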

