Semantic Image Synthesis with Unconditional Generator (2402.14395v1)

Published 22 Feb 2024 in cs.CV

Abstract: Semantic image synthesis (SIS) aims to generate realistic images that match given semantic masks. Although recent methods achieve high-quality results and precise spatial control, they require massive semantic segmentation datasets to train the models. Instead, we propose to employ a pre-trained unconditional generator and rearrange its feature maps according to proxy masks. The proxy masks are prepared from the feature maps of random samples in the generator by simple clustering. The feature rearranger learns to rearrange original feature maps to match the shape of the proxy masks, which come either from the original sample itself or from random samples. We then introduce a semantic mapper that produces the proxy masks from various input conditions, including semantic masks. Our method is versatile across applications such as free-form spatial editing of real images, sketch-to-photo, and even scribble-to-photo. Experiments validate the advantages of our method on a range of datasets: human faces, animal faces, and buildings.
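
The proxy-mask step described in the abstract can be illustrated with a minimal sketch: per-pixel feature vectors from an intermediate layer of a pre-trained unconditional generator are clustered, and the cluster assignments serve as a coarse, label-free mask. The layer choice, the cluster count, and the random array standing in for real generator features below are illustrative assumptions; the paper's actual feature rearranger and semantic mapper are not reproduced here.

    # Sketch of deriving a proxy mask by clustering per-pixel generator features.
    # Assumes a (C, H, W) feature map taken from an intermediate layer of a
    # pre-trained unconditional generator (e.g. a StyleGAN-style network).
    import numpy as np
    from sklearn.cluster import KMeans

    def proxy_mask_from_features(feature_map: np.ndarray, n_clusters: int = 8) -> np.ndarray:
        """Cluster a (C, H, W) feature map into an (H, W) proxy mask of cluster ids."""
        c, h, w = feature_map.shape
        pixels = feature_map.reshape(c, h * w).T          # (H*W, C) per-pixel feature vectors
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(pixels)
        return labels.reshape(h, w)                       # each pixel tagged with a cluster id

    # Hypothetical usage: a random array stands in for real generator features.
    features = np.random.randn(512, 64, 64).astype(np.float32)
    mask = proxy_mask_from_features(features, n_clusters=8)
    print(mask.shape, int(mask.max()) + 1)                # (64, 64) mask with up to 8 regions

In the paper's pipeline, such masks (from the sample itself or from other random samples) are what the feature rearranger is trained to match, removing the need for a labeled segmentation dataset.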

Authors (5)
  1. Hyunin Cho (1 paper)
  2. Sooyeon Go (2 papers)
  3. Kyungmook Choi (2 papers)
  4. Youngjung Uh (32 papers)
  5. JungWoo Chae (3 papers)
Citations (3)
