Space-Time Video Super-resolution with Neural Operator (2404.06036v1)

Published 9 Apr 2024 in cs.CV

Abstract: This paper addresses the task of space-time video super-resolution (ST-VSR). Existing methods generally suffer from inaccurate motion estimation and motion compensation (MEMC) under large motions. Inspired by recent progress in physics-informed neural networks, we model the challenges of MEMC in ST-VSR as a mapping between two continuous function spaces. Specifically, our approach transforms independent low-resolution representations in the coarse-grained continuous function space into refined representations with enriched spatiotemporal details in the fine-grained continuous function space. To achieve efficient and accurate MEMC, we design a Galerkin-type attention function to perform frame alignment and temporal interpolation. Due to the linear complexity of the Galerkin-type attention mechanism, our model avoids patch partitioning and offers global receptive fields, enabling precise estimation of large motions. The experimental results show that the proposed method surpasses state-of-the-art techniques in both fixed-size and continuous space-time video super-resolution tasks.
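The linear-complexity claim rests on Galerkin-type attention (Cao, NeurIPS 2021), which the paper adopts: keys and values are layer-normalized in place of a softmax, and the key-value product is formed first, so the cost scales linearly with the number of tokens rather than quadratically. Below is a minimal single-head PyTorch sketch of that mechanism under those assumptions; the class and variable names are illustrative and this is not the authors' released implementation.

# Minimal sketch of softmax-free, Galerkin-type attention (after Cao,
# NeurIPS 2021). Illustrative only; not the paper's released code.
import torch
import torch.nn as nn

class GalerkinAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # Galerkin-type: layer-normalize keys and values instead of
        # applying a softmax over attention scores.
        self.norm_k = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, dim), where n is the number of spatial tokens.
        n = x.shape[1]
        q = self.to_q(x)
        k = self.norm_k(self.to_k(x))
        v = self.norm_v(self.to_v(x))
        # Form the (dim x dim) key-value product first, so the cost is
        # O(n * dim^2) instead of the O(n^2 * dim) of standard attention.
        context = k.transpose(-2, -1) @ v   # (batch, dim, dim)
        return (q @ context) / n            # (batch, n, dim)

For example, a 64x64 feature map flattened to 4,096 tokens can be passed as a (2, 4096, 64) tensor to GalerkinAttention(dim=64), returning a tensor of the same shape in which every output token attends to all spatial positions. This is the global receptive field, obtained without patch partitioning, that the abstract refers to.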

