Deep Learning-based Image and Video Inpainting: A Survey (2401.03395v1)
Abstract: Image and video inpainting is a classic problem in computer vision and computer graphics that aims to fill in plausible and realistic content in the missing regions of images and videos. With the advance of deep learning, significant progress has been made on this problem in recent years. The goal of this paper is to comprehensively review deep learning-based methods for image and video inpainting. Specifically, we sort existing methods into different categories from the perspective of their high-level inpainting pipeline, present different deep learning architectures, including CNNs, VAEs, GANs, diffusion models, etc., and summarize techniques for module design. We review the training objectives and the common benchmark datasets. We present evaluation metrics for low-level pixel similarity and high-level perceptual similarity, conduct a performance evaluation, and discuss the strengths and weaknesses of representative inpainting methods. We also discuss related real-world applications. Finally, we outline open challenges and suggest potential future research directions.
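As a minimal illustration of the low-level pixel metrics mentioned above, the sketch below computes PSNR and a hole-restricted L1 error for an inpainted result against its ground truth. The function names, the toy data, and the square-hole mask are illustrative assumptions rather than code from the paper; learned perceptual metrics such as LPIPS or FID require pretrained networks and are omitted here.

```python
import numpy as np

def psnr(reference: np.ndarray, result: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a ground-truth image and an inpainted result."""
    mse = np.mean((reference.astype(np.float64) - result.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_val ** 2) / mse)

def masked_l1(reference: np.ndarray, result: np.ndarray, mask: np.ndarray) -> float:
    """Mean absolute error restricted to the hole region (mask == 1)."""
    hole = mask.astype(bool)
    return float(np.abs(reference[hole] - result[hole]).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.random((256, 256, 3))                                # stand-in ground truth in [0, 1]
    mask = np.zeros((256, 256, 3))
    mask[64:192, 64:192] = 1.0                                    # square hole to be filled
    output = gt * (1 - mask) + rng.random((256, 256, 3)) * mask   # toy "inpainted" result
    print(f"PSNR: {psnr(gt, output):.2f} dB, masked L1: {masked_l1(gt, output, mask):.4f}")
```

In practice, such pixel-level scores are reported alongside perceptual measures, since a result can score well on PSNR while still looking implausible in the filled region.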