Sketch-guided Image Inpainting with Partial Discrete Diffusion Process (2404.11949v1)
Abstract: In this work, we study the task of sketch-guided image inpainting. Unlike the well-explored natural language-guided image inpainting, which excels at capturing semantic details, the less-studied sketch-guided inpainting offers greater user control over the shape and pose of the object to be inpainted. As one of the early solutions to this task, we introduce a novel partial discrete diffusion process (PDDP). The forward pass of the PDDP corrupts the masked regions of the image, and the backward pass reconstructs these masked regions conditioned on hand-drawn sketches using our proposed sketch-guided bi-directional transformer. The transformer takes two inputs, the image containing the masked region to be inpainted and the query sketch, and models the reverse diffusion process. This strategy effectively addresses the domain gap between sketches and natural images, thereby enhancing the quality of inpainting results. In the absence of a large-scale dataset specific to this task, we synthesize a dataset from MS-COCO to train our proposed framework and extensively evaluate it against various competitive approaches in the literature. The qualitative and quantitative results and user studies establish that the proposed method inpaints realistic objects that fit the context and match the visual appearance of the provided sketch. To aid further research, we have made our code publicly available at https://github.com/vl2g/Sketch-Inpainting.
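The forward pass described in the abstract, which corrupts only the masked region, can be illustrated with a minimal sketch. It assumes the image is already represented as a grid of discrete tokens, that corruption replaces a token with an absorbing `[MASK]` id, and that the masking probability grows linearly with the step. None of these details are stated in the abstract, and `MASK_ID`, `partial_forward_corrupt`, and its parameters are hypothetical names rather than the authors' code.

```python
import random

MASK_ID = -1  # hypothetical id for an absorbing [MASK] token (assumption)


def partial_forward_corrupt(tokens, region, t, num_steps, rng=None):
    """Corrupt only the tokens inside `region`; leave all others untouched.

    tokens:    2D list of ints (discrete image tokens, e.g. codebook indices).
    region:    2D list of bools, True where the image is to be inpainted.
    t:         current diffusion step, with 0 <= t <= num_steps.
    num_steps: total number of diffusion steps.
    rng:       optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()
    p_mask = t / num_steps  # assumed linear schedule, not taken from the paper
    corrupted = []
    for tok_row, reg_row in zip(tokens, region):
        new_row = []
        for tok, inside in zip(tok_row, reg_row):
            # Tokens outside the region are never corrupted ("partial").
            if inside and rng.random() < p_mask:
                new_row.append(MASK_ID)
            else:
                new_row.append(tok)
        corrupted.append(new_row)
    return corrupted


if __name__ == "__main__":
    tokens = [[11, 12, 13], [14, 15, 16]]
    region = [[False, True, True], [False, False, True]]
    for step in (0, 2, 4):
        print(step, partial_forward_corrupt(tokens, region, step, 4, random.Random(0)))
```

At `t = 0` the grid is returned unchanged, and at `t = num_steps` every token inside the region is masked while the context outside it stays intact. A reverse pass would then predict the masked tokens from the uncorrupted context and the sketch, which is the role the abstract assigns to the sketch-guided bi-directional transformer.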
Authors: Nakul Sharma, Aditay Tripathi, Anirban Chakraborty, Anand Mishra