Space-Time Video Super-resolution with Neural Operator (2404.06036v1)
Abstract: This paper addresses the task of space-time video super-resolution (ST-VSR). Existing methods generally suffer from inaccurate motion estimation and motion compensation (MEMC) when handling large motions. Inspired by recent progress in physics-informed neural networks, we model the challenges of MEMC in ST-VSR as a mapping between two continuous function spaces. Specifically, our approach transforms independent low-resolution representations in a coarse-grained continuous function space into refined representations with enriched spatiotemporal details in a fine-grained continuous function space. To achieve efficient and accurate MEMC, we design a Galerkin-type attention mechanism to perform frame alignment and temporal interpolation. Thanks to its linear complexity, our model avoids patch partitioning and offers a global receptive field, enabling precise estimation of large motions. Experimental results show that the proposed method surpasses state-of-the-art techniques in both fixed-size and continuous space-time video super-resolution tasks.
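To make the complexity claim concrete, below is a minimal, single-head PyTorch sketch of Galerkin-type attention in the style of Cao (NeurIPS 2021), which the abstract builds on: keys and values are layer-normalized in place of a softmax, and the small d×d matrix KᵀV is formed before multiplying by the queries, so cost scales as O(n·d²) rather than O(n²·d) in the token count n. The module name, single-head layout, and dimensions here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GalerkinAttention(nn.Module):
    """Softmax-free, linear-complexity attention (Galerkin-type, after Cao 2021).

    A sketch, not the paper's exact module: keys/values are layer-normalized
    instead of applying softmax, and (K^T V) is computed first, so the cost is
    O(n * d^2) in sequence length n rather than O(n^2 * d).
    """

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.norm_k = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, dim); n can be all H*W tokens of a frame, since
        # linear complexity makes patch partitioning unnecessary.
        q = self.to_q(x)
        k = self.norm_k(self.to_k(x))
        v = self.norm_v(self.to_v(x))
        n = x.shape[1]
        # Form the (dim, dim) context matrix first, then project queries.
        context = k.transpose(-2, -1) @ v / n
        return q @ context


if __name__ == "__main__":
    attn = GalerkinAttention(dim=64)
    tokens = torch.randn(2, 32 * 32, 64)  # a 32x32 feature map, flattened
    out = attn(tokens)
    print(out.shape)  # torch.Size([2, 1024, 64])
```

Because the cost grows linearly with the number of tokens, an entire frame can attend to another frame at once; this is what allows the model to drop patch partitioning and keep a global receptive field when estimating large motions.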