Dynamic Frame Interpolation in Wavelet Domain (2309.03508v2)
Abstract: Video frame interpolation is an important low-level vision task that increases frame rate for a more fluent visual experience. Existing methods have achieved great success by employing advanced motion models and synthesis networks. However, the spatial redundancy in synthesizing the target frame has not been fully explored, which can result in a large amount of inefficient computation. Moreover, the appropriate degree of computation compression in frame interpolation depends strongly on both texture distribution and scene motion, which demands understanding the spatial-temporal information of each input frame pair in order to select a better compression degree. In this work, we propose a novel two-stage frame interpolation framework, termed WaveletVFI, to address the above problems. It first estimates intermediate optical flow with a lightweight motion perception network; a wavelet synthesis network then uses flow-aligned context features to predict multi-scale wavelet coefficients with sparse convolution for efficient target frame reconstruction, where the sparse valid masks that control computation at each scale are determined by a crucial threshold ratio. Instead of setting a fixed value as in previous methods, we find that embedding a classifier in the motion perception network to learn a dynamic threshold for each sample achieves more computation reduction with almost no loss of accuracy. On common high-resolution and animation frame interpolation benchmarks, the proposed WaveletVFI reduces computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods. Code is available at https://github.com/ltkong218/WaveletVFI.
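To make the threshold-ratio idea concrete, below is a minimal NumPy sketch (not the paper's actual network) of a one-level Haar decomposition and a ratio-based sparse valid mask: detail coefficients whose magnitude falls below a fraction of the maximum magnitude are marked invalid and can be skipped, which is the kind of sparsity WaveletVFI exploits with sparse convolution. The helper names `haar_dwt2` and `sparse_mask` are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D orthonormal Haar transform of an array with even sides.

    Returns (LL, LH, HL, HH): a low-pass approximation plus horizontal,
    vertical and diagonal detail coefficients, each half the input size.
    """
    tl, tr = x[0::2, 0::2], x[0::2, 1::2]  # top-left / top-right of each 2x2 block
    bl, br = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left / bottom-right
    ll = (tl + tr + bl + br) / 2           # average -> coarse structure
    lh = (tl - tr + bl - br) / 2           # horizontal differences -> vertical edges
    hl = (tl + tr - bl - br) / 2           # vertical differences -> horizontal edges
    hh = (tl - tr - bl + br) / 2           # diagonal detail
    return ll, lh, hl, hh

def sparse_mask(coeffs, ratio):
    """Valid mask: compute only where |coefficient| > ratio * max |coefficient|.

    Smooth regions produce near-zero detail coefficients, so most positions
    can be skipped; a larger ratio yields a sparser (cheaper) mask.
    """
    mag = np.abs(coeffs)
    return mag > ratio * mag.max()

# Example: a vertical step edge leaves detail energy only at the edge.
img = np.zeros((8, 8))
img[:, 3:] = 1.0
ll, lh, hl, hh = haar_dwt2(img)
mask = sparse_mask(lh, ratio=0.5)
print(mask.mean())  # -> 0.25: only the block column containing the edge is valid
```

In the paper this ratio is not fixed but predicted per sample by a classifier in the motion perception network, trading accuracy against computation dynamically.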
- H. Jiang, D. Sun, V. Jampani, M.-H. Yang, E. Learned-Miller, and J. Kautz, “Super SloMo: High quality estimation of multiple intermediate frames for video interpolation,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
- C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High-quality video view interpolation using a layered representation,” in ACM SIGGRAPH 2004 Papers, 2004.
- L. Siyao, S. Zhao, W. Yu, W. Sun, D. Metaxas, C. C. Loy, and Z. Liu, “Deep animation video interpolation in the wild,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
- Y. Dar and A. M. Bruckstein, “Motion-compensated coding and frame rate up-conversion: Models and analysis,” IEEE Transactions on Image Processing, 2015.
- S. Dikbas and Y. Altunbasak, “Novel true-motion estimation algorithm and its application to motion-compensated temporal frame interpolation,” IEEE Transactions on Image Processing, 2013.
- S. Niklaus and F. Liu, “Context-aware synthesis for video frame interpolation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- W. Bao, W.-S. Lai, C. Ma, X. Zhang, Z. Gao, and M.-H. Yang, “Depth-aware video frame interpolation,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- S. Niklaus and F. Liu, “Softmax splatting for video frame interpolation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- X. Xu, L. Siyao, W. Sun, Q. Yin, and M.-H. Yang, “Quadratic video interpolation,” in Advances in Neural Information Processing Systems, 2019.
- Y. Zhang, C. Wang, and D. Tao, “Video frame interpolation without temporal priors,” in Advances in Neural Information Processing Systems, 2020.
- Z. Chi, R. Mohammadi Nasiri, Z. Liu, J. Lu, J. Tang, and K. N. Plataniotis, “All at once: Temporally adaptive multi-frame interpolation with advanced motion modeling,” in Computer Vision – ECCV 2020, 2020.
- H. Sim, J. Oh, and M. Kim, “XVFI: Extreme video frame interpolation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
- T. Xue, B. Chen, J. Wu, D. Wei, and W. T. Freeman, “Video enhancement with task-oriented flow,” International Journal of Computer Vision (IJCV), 2019.
- H. Zhang, Y. Zhao, and R. Wang, “A flexible recurrent residual pyramid network for video frame interpolation,” in Computer Vision – ECCV 2020, 2020.
- Z. Huang, T. Zhang, W. Heng, B. Shi, and S. Zhou, “Real-time intermediate flow estimation for video frame interpolation,” in Proceedings of the European Conference on Computer Vision (ECCV), 2022.
- J. Park, K. Ko, C. Lee, and C.-S. Kim, “BMBC: Bilateral motion estimation with bilateral cost volume for video interpolation,” in European Conference on Computer Vision, 2020.
- J. Park, C. Lee, and C.-S. Kim, “Asymmetric bilateral motion estimation for video frame interpolation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
- M. W. Marcellin, M. A. Lepley, A. Bilgin, T. J. Flohr, T. T. Chinen, and J. H. Kasner, “An overview of quantization in JPEG 2000,” Signal Processing: Image Communication, 2002.
- J. Choi and B. Han, “Task-aware quantization network for JPEG image compression,” in Computer Vision – ECCV 2020, 2020.
- A. Przelaskowski, “Statistical modeling and threshold selection of wavelet coefficients in lossy image coder,” in 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings, 2000.
- S. Chang, B. Yu, and M. Vetterli, “Adaptive wavelet thresholding for image denoising and compression,” IEEE Transactions on Image Processing, 2000.
- C. Montgomery, “Xiph.org Video Test Media (derf’s collection), the Xiph Open Source Community,” Online, https://media.xiph.org/video/derf, 1994.
- S. Niklaus, L. Mai, and F. Liu, “Video frame interpolation via adaptive convolution,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- S. Niklaus, L. Mai, and F. Liu, “Video frame interpolation via adaptive separable convolution,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
- H. Lee, T. Kim, T.-y. Chung, D. Pak, Y. Ban, and S. Lee, “AdaCoF: Adaptive collaboration of flows for video frame interpolation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- X. Cheng and Z. Chen, “Video frame interpolation via deformable separable convolution,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
- S. Gui, C. Wang, Q. Chen, and D. Tao, “FeatureFlow: Robust video interpolation via structure-to-texture generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- M. Choi, H. Kim, B. Han, N. Xu, and K. M. Lee, “Channel attention is all you need for video frame interpolation,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
- L. Kong, B. Jiang, D. Luo, W. Chu, X. Huang, Y. Tai, C. Wang, and J. Yang, “IFRNet: Intermediate feature refine network for efficient frame interpolation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- J. Liu, L. Kong, and J. Yang, “ATCA: An arc trajectory based model with curvature attention for video frame interpolation,” in 2022 IEEE International Conference on Image Processing (ICIP), 2022.
- L. Kong, J. Liu, and J. Yang, “Progressive motion context refine network for efficient video frame interpolation,” IEEE Signal Processing Letters, 2022.
- T. Ding, L. Liang, Z. Zhu, and I. Zharkov, “CDFI: Compression-driven network design for frame interpolation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
- M. Choi, S. Lee, H. Kim, and K. M. Lee, “Motion-aware dynamic architecture for efficient frame interpolation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
- M. Unser and T. Blu, “Mathematical properties of the JPEG2000 wavelet filters,” IEEE Transactions on Image Processing, 2003.
- D. Donoho, “De-noising by soft-thresholding,” IEEE Transactions on Information Theory, 1995.
- L. Liu, J. Liu, S. Yuan, G. Slabaugh, A. Leonardis, W. Zhou, and Q. Tian, “Wavelet-based dual-branch network for image demoiréing,” in Computer Vision – ECCV 2020, 2020.
- H. Huang, R. He, Z. Sun, and T. Tan, “Wavelet-SRNet: A wavelet-based CNN for multi-scale face super resolution,” in 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
- H. Zhang, Z. Jin, X. Tan, and X. Li, “Towards lighter and faster: Learning wavelets progressively for image super-resolution,” in Proceedings of the 28th ACM International Conference on Multimedia, 2020.
- X. Deng, R. Yang, M. Xu, and P. L. Dragotti, “Wavelet domain style transfer for an effective perception-distortion tradeoff in single image super-resolution,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
- M. Yang, F. Wu, and W. Li, “WaveletStereo: Learning wavelet coefficients of disparity map in stereo matching,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- M. Ramamonjisoa, M. Firman, J. Watson, V. Lepetit, and D. Turmukhambetov, “Single image depth prediction with wavelet decomposition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
- C. J. Maddison, D. Tarlow, and T. Minka, “A* sampling,” in Advances in Neural Information Processing Systems, 2014.
- E. Jang, S. S. Gu, and B. Poole, “Categorical reparameterization with Gumbel-Softmax,” ArXiv, 2017.
- T. Bolukbasi, J. Wang, O. Dekel, and V. Saligrama, “Adaptive neural networks for efficient inference,” in Proceedings of the 34th International Conference on Machine Learning, 2017.
- G. Huang, D. Chen, T. Li, F. Wu, L. van der Maaten, and K. Q. Weinberger, “Multi-scale dense networks for resource efficient image classification,” in ICLR, 2018.
- X. Wang, F. Yu, Z.-Y. Dou, T. Darrell, and J. E. Gonzalez, “SkipNet: Learning dynamic routing in convolutional networks,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018.
- A. Veit and S. Belongie, “Convolutional networks with adaptive inference graphs,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018.
- Y. Bengio, N. Léonard, and A. C. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” ArXiv, vol. abs/1308.3432, 2013.
- E. Bengio, P.-L. Bacon, J. Pineau, and D. Precup, “Conditional computation in neural networks for faster models,” ArXiv, vol. abs/1511.06297, 2015.
- J. Lin, Y. Rao, J. Lu, and J. Zhou, “Runtime neural pruning,” in Advances in Neural Information Processing Systems, 2017.
- C. Li, G. Wang, B. Wang, X. Liang, Z. Li, and X. Chang, “Dynamic slimmable network,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
- S. Cao, L. Ma, W. Xiao, C. Zhang, Y. Liu, L. Zhang, L. Nie, and Z. Yang, “SeerNet: Predicting convolutional neural network feature-map sparsity through low-bit quantization,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- S. Kong and C. Fowlkes, “Pixel-wise attentional gating for scene parsing,” in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019.
- T. Verelst and T. Tuytelaars, “Dynamic convolutions: Exploiting spatial sparsity for faster inference,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- M. Ren, A. Pokrovsky, B. Yang, and R. Urtasun, “SBNet: Sparse blocks network for fast inference,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
- D. Sun, X. Yang, M.-Y. Liu, and J. Kautz, “PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
- L. Kong and J. Yang, “FDFlowNet: Fast optical flow estimation using a deep lightweight network,” in 2020 IEEE International Conference on Image Processing (ICIP), 2020.
- L. Kong, X. Yang, and J. Yang, “OAS-Net: Occlusion aware sampling network for accurate optical flow,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021.
- M. Kivanc Mihcak, I. Kozintsev, K. Ramchandran, and P. Moulin, “Low-complexity image denoising based on statistical modeling of wavelet coefficients,” IEEE Signal Processing Letters, 1999.
- A. Lewis and G. Knowles, “Image compression using the 2-d wavelet transform,” IEEE Transactions on Image Processing, 1992.
- X. Cheng and Z. Chen, “Multiple video frame interpolation via enhanced deformable separable convolution,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
- P. Charbonnier, L. Blanc-Feraud, G. Aubert, and M. Barlaud, “Two deterministic half-quadratic regularization algorithms for computed imaging,” in Proceedings of 1st International Conference on Image Processing, 1994.
- S. Meister, J. Hur, and S. Roth, “UnFlow: Unsupervised learning of optical flow with a bidirectional census loss,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
- L. Kong and J. Yang, “MDFlow: Unsupervised optical flow learning by reliable mutual knowledge distillation,” IEEE Transactions on Circuits and Systems for Video Technology, 2022.
- I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in 7th International Conference on Learning Representations, 2019.
- Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, 2004.
- L. Kong, C. Shen, and J. Yang, “FastFlowNet: A lightweight network for fast optical flow estimation,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021.