Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting (2403.18186v2)

Published 27 Mar 2024 in cs.CV

Abstract: We present a method for large-mask pluralistic image inpainting based on the generative framework of discrete latent codes. Our method learns latent priors, discretized as tokens, by only performing computations at the visible locations of the image. This is realized by a restrictive partial encoder that predicts the token label for each visible block, a bidirectional transformer that infers the missing labels by only looking at these tokens, and a dedicated synthesis network that couples the tokens with the partial image priors to generate coherent and pluralistic complete images even under extreme mask settings. Experiments on public benchmarks validate our design choices, as the proposed method outperforms strong baselines in both visual quality and diversity metrics.
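
To make the idea of "only performing computations at the visible locations" concrete, the following is a minimal NumPy sketch of the bookkeeping step that would precede token inference. It is not the paper's implementation: the rule that a block counts as visible only when every pixel in it is visible, the placeholder value MASK_ID, and the function names visible_blocks and init_token_grid are all assumptions made for illustration.

```python
import numpy as np

MASK_ID = -1  # hypothetical placeholder label for blocks whose token must be inferred


def visible_blocks(pixel_mask: np.ndarray, block: int) -> np.ndarray:
    """Return a (H/block, W/block) boolean grid of visible blocks.

    pixel_mask is an (H, W) boolean array, True where the pixel is visible.
    Assumption: a block is treated as visible only if all its pixels are visible.
    """
    h, w = pixel_mask.shape
    if h % block or w % block:
        raise ValueError("image size must be divisible by the block size")
    # Split each axis into (block index, offset inside block).
    grid = pixel_mask.reshape(h // block, block, w // block, block)
    return grid.all(axis=(1, 3))


def init_token_grid(labels: np.ndarray, visible: np.ndarray) -> np.ndarray:
    """Keep encoder-predicted labels at visible blocks, MASK_ID elsewhere."""
    return np.where(visible, labels, MASK_ID)


if __name__ == "__main__":
    mask = np.ones((4, 4), dtype=bool)
    mask[0, 1] = False                   # one hidden pixel in the top-left block
    vis = visible_blocks(mask, block=2)
    labels = np.array([[5, 6], [7, 8]])  # stand-in for partial-encoder predictions
    print(init_token_grid(labels, vis))
```

In this sketch, the entries left as MASK_ID are the positions a bidirectional transformer would fill in, and sampling different labels for them is what would yield multiple plausible completions.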
