TCIG: Two-Stage Controlled Image Generation with Quality Enhancement through Diffusion (2403.01212v1)
Abstract: In recent years, significant progress has been made in text-to-image generation models. However, these models still fall short of full controllability during the generation process: they often require task-specific training or restricted model families, and even then remain limited. To address these challenges, a two-stage method is proposed that combines controllability with high quality in image generation. The approach leverages the expertise of pre-trained models to achieve precise control over the generated images, while harnessing the power of diffusion models to achieve state-of-the-art quality. By separating controllability from quality, the method achieves outstanding results. It is compatible with both latent-space and image-space diffusion models, ensuring versatility and flexibility, and it consistently produces results comparable to current state-of-the-art methods. Overall, the proposed method represents a significant advance in text-to-image generation, improving controllability without compromising the quality of the generated images.
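The two-stage idea described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: `stage1_controlled_layout` stands in for a pre-trained controllable model that enforces a spatial mask, and `stage2_diffusion_refine` stands in for an SDEdit-style diffusion refinement that partially noises the coarse result and denoises it toward higher quality. All function names, the toy denoising update, and the `strength` parameter are assumptions for the sketch.

```python
import numpy as np

def stage1_controlled_layout(mask: np.ndarray, seed: int = 0) -> np.ndarray:
    """Stage 1 (hypothetical): a pre-trained, controllable model produces a
    coarse image that exactly respects the spatial mask (control signal)."""
    rng = np.random.default_rng(seed)
    coarse = rng.uniform(size=mask.shape)
    # Zero out everything outside the masked region -- precise control.
    return np.where(mask > 0, coarse, 0.0)

def stage2_diffusion_refine(coarse: np.ndarray,
                            strength: float = 0.6,
                            steps: int = 10) -> np.ndarray:
    """Stage 2 (hypothetical): SDEdit-style refinement -- partially noise the
    coarse image, then iteratively denoise while staying close to the guide."""
    rng = np.random.default_rng(1)
    x = (1.0 - strength) * coarse + strength * rng.normal(size=coarse.shape)
    for _ in range(steps):
        # Toy denoising step that pulls the sample back toward the coarse guide.
        x = x - 0.1 * (x - coarse)
    return x

# Usage: a square mask controls where content appears; diffusion refines it.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
coarse = stage1_controlled_layout(mask)
refined = stage2_diffusion_refine(coarse)
```

In a real pipeline, stage 2 would run a pretrained latent- or image-space diffusion model initialized from the stage-1 output, which is what lets the method inherit diffusion quality without retraining for controllability.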
- Salaheldin Mohamed