Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
156 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Patch-enhanced Mask Encoder Prompt Image Generation (2405.19085v1)

Published 29 May 2024 in cs.AI and cs.CV

Abstract: Artificial Intelligence Generated Content(AIGC), known for its superior visual results, represents a promising mitigation method for high-cost advertising applications. Numerous approaches have been developed to manipulate generated content under different conditions. However, a crucial limitation lies in the accurate description of products in advertising applications. Applying previous methods directly may lead to considerable distortion and deformation of advertised products, primarily due to oversimplified content control conditions. Hence, in this work, we propose a patch-enhanced mask encoder approach to ensure accurate product descriptions while preserving diverse backgrounds. Our approach consists of three components Patch Flexible Visibility, Mask Encoder Prompt Adapter and an image Foundation Model. Patch Flexible Visibility is used for generating a more reasonable background image. Mask Encoder Prompt Adapter enables region-controlled fusion. We also conduct an analysis of the structure and operational mechanisms of the Generation Module. Experimental results show our method can achieve the highest visual results and FID scores compared with other methods.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (44)
  1. Flexivit: One model for all patch sizes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14496–14506, 2023.
  2. Pixart: Fast training of diffusion transformer for photorealistic text-to-image synthesis. arXiv preprint arXiv:2310.00426, 2023.
  3. Patch n’pack: Navit, a vision transformer for any aspect ratio and resolution. Advances in Neural Information Processing Systems, 36, 2024.
  4. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021.
  5. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
  6. Caponimage: Context-driven dense-captioning on image. arXiv preprint arXiv:2204.12974, 2022.
  7. Model tells you what to discard: Adaptive kv cache compression for llms. arXiv preprint arXiv:2310.01801, 2023.
  8. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
  9. Cascaded diffusion models for high fidelity image generation. 2021.
  10. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
  11. Composer: Creative and controllable image synthesis with composable conditions. arXiv preprint arXiv:2302.09778, 2023.
  12. Layoutdm: Discrete diffusion model for controllable layout generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10167–10176, 2023.
  13. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023.
  14. Grammar variational autoencoder. In International Conference on Machine Learning, 2017.
  15. Pix2struct: Screenshot parsing as pretraining for visual language understanding. In International Conference on Machine Learning, pages 18893–18912. PMLR, 2023.
  16. Acn-data: Analysis and applications of an open ev charging dataset. In International Conference on Future Energy Systems, 2019.
  17. Image harmonization with diffusion model, 2023.
  18. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International conference on machine learning, pages 12888–12900. PMLR, 2022.
  19. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
  20. Genphys: From physical processes to generative models, 2023.
  21. Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models, 2023.
  22. Fit: Flexible vision transformer for diffusion model, 2024.
  23. Latent consistency models: Synthesizing high-resolution images with few-step inference, 2023.
  24. Sit: Exploring flow and diffusion-based generative models with scalable interpolant transformers. arXiv preprint arXiv:2401.08740, 2024.
  25. Learning to create better ads: Generation and ranking approaches for ad creative refinement. In Proceedings of the 29th ACM international conference on information & knowledge management, pages 2653–2660, 2020.
  26. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
  27. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023.
  28. Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952, 2023.
  29. Layoutllm-t2i: Eliciting layout guidance from llm for text-to-image generation. In Proceedings of the 31st ACM International Conference on Multimedia, pages 643–654, 2023.
  30. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 1(2):3, 2022.
  31. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
  32. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning, pages 2256–2265. PMLR, 2015.
  33. Denoising diffusion implicit models, 2022.
  34. Generative modeling by estimating gradients of the data distribution. 2019.
  35. Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems, 32, 2019.
  36. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
  37. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
  38. Enabling hyper-personalisation: Automated ad creative generation and ranking for fashion e-commerce. In Fashion Recommender Systems, pages 25–48. Springer, 2020.
  39. Pfgm++: Unlocking the potential of physics-inspired generative models, 2023.
  40. A new creative generation pipeline for click-through rate with stable diffusion model. In Companion Proceedings of the ACM on Web Conference 2024, pages 180–189, 2024.
  41. Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721, 2023.
  42. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.
  43. gddim: Generalized denoising diffusion implicit models, 2023.
  44. Uni-controlnet: All-in-one control to text-to-image diffusion models. Advances in Neural Information Processing Systems, 36, 2024.

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com

Tweets