Unlocking Pre-trained Image Backbones for Semantic Image Synthesis (2312.13314v2)

Published 20 Dec 2023 in cs.CV, cs.AI, and cs.LG

Abstract: Semantic image synthesis, i.e., generating images from user-provided semantic label maps, is an important conditional image generation task because it allows control over both the content and the spatial layout of generated images. Although diffusion models have pushed the state of the art in generative image modeling, the iterative nature of their inference process makes them computationally demanding. Other approaches such as GANs are more efficient, as they need only a single feed-forward pass for generation, but their image quality tends to suffer on large and diverse datasets. In this work, we propose a new class of GAN discriminators for semantic image synthesis that enables the generation of highly realistic images by exploiting feature backbone networks pre-trained for tasks such as image classification. We also introduce a new generator architecture that models context better and uses cross-attention to inject noise into latent variables, leading to more diverse generated images. Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes, surpassing recent diffusion models while requiring two orders of magnitude less compute for inference.
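
The abstract mentions using cross-attention to inject noise into latent variables. The following is a minimal sketch of that general idea in plain Python, not the DP-SIMS generator itself: latent vectors act as queries, sampled Gaussian noise tokens act as both keys and values, and the attended noise is added back to the latents as a residual. The function names (`cross_attention_inject`, `sample_noise_tokens`) are hypothetical, and the learned query/key/value projections that a real model would use are omitted as a simplifying assumption.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sample_noise_tokens(num_tokens, dim, seed):
    """Draw `num_tokens` standard Gaussian vectors of size `dim` (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]


def cross_attention_inject(latents, noise_tokens):
    """Add attended noise to each latent vector.

    Assumption: queries are the latents themselves and keys/values are the
    raw noise tokens; a trained model would apply learned projections first.
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in latents:
        # Attention weights of this latent over all noise tokens.
        weights = softmax([dot(q, k) * scale for k in noise_tokens])
        # Weighted sum of the noise tokens (the values).
        update = [
            sum(w * v[j] for w, v in zip(weights, noise_tokens))
            for j in range(dim)
        ]
        # Residual connection: latent plus attended noise.
        out.append([qi + ui for qi, ui in zip(q, update)])
    return out


latents = [[0.5, -1.0, 0.25, 2.0], [1.0, 0.0, -0.5, 0.3]]
sample_a = cross_attention_inject(latents, sample_noise_tokens(8, 4, seed=0))
sample_b = cross_attention_inject(latents, sample_noise_tokens(8, 4, seed=1))
```

With the latents held fixed, changing only the noise seed changes the output, which is the sense in which the noise tokens can drive diversity in this sketch.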

Authors (4)
  1. Tariq Berrada (3 papers)
  2. Jakob Verbeek (59 papers)
  3. Camille Couprie (24 papers)
  4. Karteek Alahari (48 papers)