
Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting (2303.08331v2)

Published 15 Mar 2023 in cs.CV, cs.LG, cs.NE, and eess.IV

Abstract: As deep convolutional neural networks (DNNs) are widely used across computer vision, leveraging the overfitting ability of DNNs to upscale video resolution has become a new trend in modern video delivery systems. By dividing a video into chunks and overfitting a super-resolution model to each chunk, the server encodes the video before transmitting it to clients, achieving better video quality and transmission efficiency. However, a large number of chunks is required to ensure good overfitting quality, which substantially increases storage and consumes more bandwidth for data transmission. Conversely, reducing the number of chunks through training optimization techniques usually requires higher model capacity, which significantly slows down execution. To reconcile these trade-offs, we propose a novel method for high-quality and efficient video resolution upscaling that leverages spatial-temporal information to accurately divide a video into chunks, keeping both the number of chunks and the model size to a minimum. Additionally, we extend our method to a single overfitting model through a data-aware joint training technique, which further reduces the storage requirement with a negligible quality drop. We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality. Compared with the state of the art on live video resolution upscaling tasks, our method achieves a streaming speed of 28 fps with 41.6 dB PSNR, which is 14× faster and 2.29 dB better. Code is available at https://github.com/coulsonlee/STDO-CVPR2023.git
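
The abstract describes dividing a video into chunks using spatial-temporal information only at a high level. Below is a minimal sketch of one way such a split could work, under explicit assumptions: each frame is cut into fixed-size patches, a patch's difficulty is the PSNR of some generic baseline upscaler's output against the high-resolution ground truth, and chunks are formed by sorting all patches from all frames by that score rather than by time. The function names (`psnr`, `patch_pixels`, `chunk_by_difficulty`) are hypothetical and are not taken from the STDO repository. Training one super-resolution model per chunk and the data-aware joint training step are not shown.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]                # grayscale frame as rows of pixel values
PatchInfo = Tuple[int, int, int, float]  # (frame index, row, col, PSNR in dB)


def psnr(ref: List[float], test: List[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical patches: no reconstruction error
    return 10.0 * math.log10(max_val ** 2 / mse)


def patch_pixels(frame: Frame, row: int, col: int, size: int) -> List[float]:
    """Flatten the size x size patch whose top-left corner is (row, col)."""
    return [frame[r][c] for r in range(row, row + size) for c in range(col, col + size)]


def chunk_by_difficulty(hr_frames: List[Frame], baseline_frames: List[Frame],
                        size: int, num_chunks: int) -> List[List[PatchInfo]]:
    """Group patches from all frames into chunks by how hard they are to upscale.

    ASSUMPTION (not confirmed by the abstract): difficulty is the PSNR of a
    baseline upscaler's output (baseline_frames, already at high resolution)
    against the ground truth (hr_frames); low PSNR means a hard patch.
    Patches are sorted by PSNR and cut into num_chunks near-equal groups, so a
    chunk can mix patches from different frames instead of consecutive frames.
    """
    scored: List[PatchInfo] = []
    for t, (hr, base) in enumerate(zip(hr_frames, baseline_frames)):
        height, width = len(hr), len(hr[0])
        for r in range(0, height - size + 1, size):     # non-overlapping tiles;
            for c in range(0, width - size + 1, size):  # ragged edges are skipped
                score = psnr(patch_pixels(hr, r, c, size), patch_pixels(base, r, c, size))
                scored.append((t, r, c, score))
    scored.sort(key=lambda p: p[3])  # hardest (lowest PSNR) first; sort is stable
    # Cut the sorted list into num_chunks contiguous slices of near-equal length.
    bounds = [round(i * len(scored) / num_chunks) for i in range(num_chunks + 1)]
    return [scored[bounds[i]:bounds[i + 1]] for i in range(num_chunks)]


if __name__ == "__main__":
    hr = [[100.0] * 4 for _ in range(4)]  # ground truth reused for two 4x4 frames
    base0 = [row[:] for row in hr]
    base1 = [row[:] for row in hr]
    for r in range(2):
        for c in range(2):
            base0[r][c] += 10          # frame 0: error of 10 in patch (0, 0)
            base1[r + 2][c + 2] += 20  # frame 1: error of 20 in patch (2, 2)
            base1[r][c + 2] += 5       # frame 1: error of 5 in patch (0, 2)
    chunks = chunk_by_difficulty([hr, hr], [base0, base1], size=2, num_chunks=2)
    for i, chunk in enumerate(chunks):
        print(i, [(t, r, c) for t, r, c, _ in chunk])
    # 0 [(1, 2, 2), (0, 0, 0), (1, 0, 2), (0, 0, 2)]
    # 1 [(0, 2, 0), (0, 2, 2), (1, 0, 0), (1, 2, 0)]
```

In this toy run the three patches with errors land in the first chunk regardless of which frame they came from, which illustrates why a content-based split can need fewer chunks than a purely temporal one. A real pipeline would score patches of actual video frames and train a super-resolution model per chunk.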
