Video Dynamics Prior: An Internal Learning Approach for Robust Video Enhancements (2312.07835v1)
Abstract: In this paper, we present a novel, robust framework for low-level vision tasks, including denoising, object removal, frame interpolation, and super-resolution, that requires no external training corpus. Our approach learns the weights of its neural modules directly by optimizing over the corrupted test sequence itself, exploiting the spatio-temporal coherence and internal statistics of the video. We further introduce a spatial pyramid loss that exploits the recurrence of spatio-temporal patches across different scales of a video. This loss improves robustness to unstructured noise in both the spatial and temporal domains. As a result, the framework is highly robust to degradations in the input frames and yields state-of-the-art results on downstream tasks such as denoising, object removal, and frame interpolation. To validate the effectiveness of our approach, we conduct qualitative and quantitative evaluations on standard video datasets such as DAVIS, UCF-101, and VIMEO90K-T.
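The abstract does not give the exact form of the spatial pyramid loss. As a rough illustration of the general idea (comparing, for example, a network's reconstructed frames against the observed frames at several spatial scales), here is a minimal NumPy sketch under our own assumptions: frames are stored as a (T, H, W) or (T, H, W, C) array, each pyramid level is obtained by 2x2 average pooling, each level contributes an L1 penalty, and all levels are weighted equally by default. The names `downsample2x` and `spatial_pyramid_loss` are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def downsample2x(video: np.ndarray) -> np.ndarray:
    """Halve the spatial resolution of a (T, H, W) or (T, H, W, C) array
    by 2x2 average pooling. Odd trailing rows/columns are cropped."""
    t, h, w = video.shape[:3]
    h2, w2 = h // 2, w // 2
    v = video[:, : 2 * h2, : 2 * w2]
    # Split each spatial axis into (blocks, 2) and average over each 2x2 block.
    v = v.reshape(t, h2, 2, w2, 2, *video.shape[3:])
    return v.mean(axis=(2, 4))


def spatial_pyramid_loss(pred: np.ndarray, target: np.ndarray,
                         num_levels: int = 3, weights=None) -> float:
    """Hypothetical multi-scale L1 loss: compare pred and target at full
    resolution and at (num_levels - 1) successively halved scales."""
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    if weights is None:
        weights = [1.0] * num_levels  # assumption: equal weight per scale
    if len(weights) != num_levels:
        raise ValueError("need one weight per pyramid level")
    p = pred.astype(np.float64)
    q = target.astype(np.float64)
    total = 0.0
    for level in range(num_levels):
        # L1 discrepancy at the current scale, scaled by its weight.
        total += weights[level] * np.mean(np.abs(p - q))
        if level + 1 < num_levels:
            if min(p.shape[1], p.shape[2]) < 2:
                break  # frames too small to downsample further
            p, q = downsample2x(p), downsample2x(q)
    return float(total)


# Usage: a noisy copy of a random 4-frame, 32x32 clip.
rng = np.random.default_rng(0)
clean = rng.random((4, 32, 32))
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
print(spatial_pyramid_loss(noisy, clean))
```

One plausible intuition for a multi-scale design: for i.i.d. zero-mean noise, each 2x2 average reduces the noise standard deviation by a factor of 2, so the coarser levels weigh scene structure more heavily than pixel-level noise.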