IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis (2403.13378v2)

Published 20 Mar 2024 in cs.CV

Abstract: Semantic image synthesis aims to generate high-quality images from semantic conditions, i.e., segmentation masks and style reference images. Existing methods widely adopt generative adversarial networks (GANs), which take all conditional inputs and directly synthesize an image in a single forward step. In this paper, semantic image synthesis is instead treated as an image denoising task and is handled by a novel image-to-image diffusion model (IIDM). Specifically, the style reference is first contaminated with random noise and then progressively denoised by IIDM under the guidance of the segmentation mask. Moreover, three techniques (refinement, color transfer, and model ensembling) are proposed to further improve generation quality. All three are plug-in modules applied at inference time and require no additional training. Extensive experiments show that our IIDM outperforms existing state-of-the-art methods by clear margins, and further analysis is provided through detailed demonstrations. We have implemented IIDM in the Jittor framework; code is available at https://github.com/ader47/jittor-jieke-semantic_images_synthesis.
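
The core idea (noise the style reference, then denoise it step by step under mask guidance) can be sketched with a standard DDPM forward process and ancestral sampler. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the linear beta schedule, the starting step, and the hypothetical `denoiser(x_t, t, mask)` interface are all placeholders, and "model ensembling" is assumed here to mean averaging the noise predictions of several denoisers at each step.

```python
import numpy as np


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM-style noise schedule (an assumed choice, not IIDM's reported one)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # alpha_bar_t = product of alphas up to step t
    return betas, alphas, alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward process q(x_t | x_0): contaminate a clean image with Gaussian noise."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def image_to_image_sample(style_ref, mask, denoisers, start_step, schedule, rng):
    """Noise the style reference up to `start_step`, then run DDPM ancestral
    sampling back to step 0. Each denoiser(x_t, t, mask) is a hypothetical
    mask-conditioned noise predictor; several are averaged as a simple ensemble."""
    betas, alphas, alpha_bars = schedule
    x = add_noise(style_ref, start_step, alpha_bars, rng)
    for t in range(start_step, -1, -1):
        # Ensemble (assumed form): average the noise predictions of all denoisers.
        eps = np.mean([d(x, t, mask) for d in denoisers], axis=0)
        # DDPM posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the final step
    return x


# Usage with toy stand-ins (random arrays instead of real images and masks).
rng = np.random.default_rng(0)
schedule = make_schedule()
style_ref = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # stand-in style image in [-1, 1]
mask = np.zeros((8, 8), dtype=np.int64)               # stand-in segmentation mask
target = np.zeros_like(style_ref)                     # image the toy oracle should produce


def oracle_denoiser(x_t, t, mask):
    # Toy predictor that "knows" `target` and ignores the mask; a real model would not.
    ab = schedule[2][t]
    return (x_t - np.sqrt(ab) * target) / np.sqrt(1.0 - ab)


out = image_to_image_sample(style_ref, mask, [oracle_denoiser, oracle_denoiser],
                            start_step=500, schedule=schedule, rng=rng)
```

With the toy oracle, the last sampling step returns exactly `target`, which is a quick sanity check that the update rule is wired correctly. In a real setting the denoisers would be trained networks conditioned on the segmentation mask, and `start_step` would control how much of the style reference survives the noising.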

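The abstract does not say how the color-transfer module works. One common approach, sketched here purely as an assumption, is global statistics matching in the spirit of Reinhard et al.'s "Color transfer between images": each channel of the generated image is shifted and scaled so its mean and standard deviation match those of the style reference. The function name `transfer_color` and the toy inputs are hypothetical.

```python
import numpy as np


def transfer_color(generated, reference, eps=1e-8):
    """Shift and scale each channel of `generated` so that its mean and standard
    deviation match those of `reference`. Both arrays have shape (H, W, C)."""
    gen = generated.astype(np.float64)
    ref = reference.astype(np.float64)
    g_mean, g_std = gen.mean(axis=(0, 1)), gen.std(axis=(0, 1))
    r_mean, r_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    # Normalize to zero mean / unit std per channel, then re-color with reference stats.
    # `eps` guards against division by zero for constant channels.
    return (gen - g_mean) / (g_std + eps) * r_std + r_mean


rng = np.random.default_rng(0)
generated = rng.uniform(0.0, 1.0, size=(16, 16, 3))  # stand-in generated image
reference = rng.normal(0.5, 0.2, size=(16, 16, 3))   # stand-in style reference
result = transfer_color(generated, reference)
```

Reinhard et al. apply this matching in a decorrelated color space (lαβ) rather than RGB; converting with something like scikit-image's `skimage.color.rgb2lab` before the call and `lab2rgb` afterwards would be closer to that method.
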
Authors (2)
  1. Feng Liu (1212 papers)
  2. Xiaobin Chang (14 papers)