Generate E-commerce Product Background by Integrating Category Commonality and Personalized Style (2312.13309v1)

Published 20 Dec 2023 in cs.CV and cs.AI

Abstract: State-of-the-art methods for e-commerce product background generation suffer from the inefficiency of designing product-wise prompts when scaling up production, as well as the ineffectiveness of describing fine-grained styles when customizing personalized backgrounds for specific brands. To address these obstacles, we integrate category commonality and personalized style into diffusion models. Concretely, we propose a Category-Wise Generator that enables large-scale background generation for the first time. Each category is assigned a unique identifier in the prompt, and a mask-guided cross-attention layer localizes the identifier's attention to the background so that it learns the category-wise style. Furthermore, for products with specific, fine-grained requirements in layout, elements, etc., a Personality-Wise Generator is devised to learn such personalized style directly from a reference image, which resolves textual ambiguities; it is trained in a self-supervised manner to use training data more efficiently. To advance research in this field, we construct BG60k, the first large-scale e-commerce product background generation dataset, which covers more than 60k product images from over 2k categories. Experiments demonstrate that our method can generate high-quality backgrounds for different categories and maintain the personalized background style of reference images. The link to BG60k and the code will be available soon.
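The idea of confining a category identifier's attention to the background can be illustrated with a minimal NumPy sketch. It assumes single-head cross-attention from N spatial latent positions to T prompt tokens, and a hard mask: for positions inside the product (foreground), the logit of the identifier token is set to negative infinity before the softmax, so the identifier can only influence background positions. The function name `mask_guided_cross_attention`, the hard-masking choice, and the toy shapes are illustrative assumptions, not the authors' implementation; the paper may use a soft attention loss or a different masking scheme.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf logits become exactly 0 probability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mask_guided_cross_attention(q, k, v, bg_mask, id_index):
    """Hypothetical sketch of mask-guided cross-attention.

    q:        (N, d) queries from N spatial latent positions
    k, v:     (T, d) keys/values from T prompt tokens (T >= 2)
    bg_mask:  (N,) bool, True where the position is background
    id_index: index of the (hypothetical) category identifier token
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)  # (N, T) scaled dot-product scores
    # Foreground (product) positions may not attend to the identifier token,
    # so the category style it carries is injected only into the background.
    logits[~bg_mask, id_index] = -np.inf
    attn = softmax(logits, axis=-1)
    return attn @ v, attn


# Toy usage: 6 latent positions, 4 prompt tokens, identifier at index 1.
rng = np.random.default_rng(0)
N, T, d = 6, 4, 8
q = rng.standard_normal((N, d))
k = rng.standard_normal((T, d))
v = rng.standard_normal((T, d))
bg = np.array([True, True, False, False, True, False])
out, attn = mask_guided_cross_attention(q, k, v, bg, id_index=1)
print(attn[~bg, 1])  # product positions give the identifier zero weight
```

Hard masking keeps the sketch simple and makes the constraint exact; a soft alternative would leave the logits untouched and add a training loss that penalizes identifier attention falling outside the background mask.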
