Generate E-commerce Product Background by Integrating Category Commonality and Personalized Style (2312.13309v1)
Abstract: The state-of-the-art methods for e-commerce product background generation suffer from the inefficiency of designing product-wise prompts when scaling up the production, as well as the ineffectiveness of describing fine-grained styles when customizing personalized backgrounds for some specific brands. To address these obstacles, we integrate the category commonality and personalized style into diffusion models. Concretely, we propose a Category-Wise Generator to enable large-scale background generation for the first time. A unique identifier in the prompt is assigned to each category, whose attention is located on the background by a mask-guided cross attention layer to learn the category-wise style. Furthermore, for products with specific and fine-grained requirements in layout, elements, etc, a Personality-Wise Generator is devised to learn such personalized style directly from a reference image to resolve textual ambiguities, and is trained in a self-supervised manner for more efficient training data usage. To advance research in this field, the first large-scale e-commerce product background generation dataset BG60k is constructed, which covers more than 60k product images from over 2k categories. Experiments demonstrate that our method could generate high-quality backgrounds for different categories, and maintain the personalized background style of reference images. The link to BG60k and codes will be available soon.
- Blended diffusion for text-driven editing of natural images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18208–18218, 2022.
- Blended latent diffusion. ACM Transactions on Graphics (TOG), 42(4):1–11, 2023.
- A large scale prediction engine for app install clicks and conversions. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 167–175, 2017.
- Efficient optimal selection for composited advertising creatives with tree structure. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3967–3975, 2021a.
- Automated creative optimization for e-commerce advertising. In Proceedings of the Web Conference 2021, pages 2304–2313, 2021b.
- Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017.
- Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
- Globally and locally consistent image completion. ACM Transactions on Graphics (TOG), 36(4):1–14, 2017.
- Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1125–1134, 2017.
- Staging e-commerce products for online advertising using retrieval assisted image generation. arXiv preprint arXiv:2307.15326, 2023.
- Dreamedit: Subject-driven image editing. arXiv preprint arXiv:2306.12624, 2023.
- Repaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11461–11471, 2022.
- Learning to create better ads: Generation and ranking approaches for ad creative refinement. In Proceedings of the 29th ACM international conference on information & knowledge management, pages 2653–2660, 2020.
- Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
- Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 1(2):3, 2022.
- High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
- U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
- Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22500–22510, 2023.
- Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in neural information processing systems, 35:25278–25294, 2022.
- Resolution-robust large mask inpainting with fourier convolutions. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 2149–2159, 2022a.
- Resolution-robust large mask inpainting with fourier convolutions. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 2149–2159, 2022b.
- Realfill: Reference-driven generation for authentic image completion. arXiv preprint arXiv:2309.16668, 2023.
- Imagen editor and editbench: Advancing and evaluating text-guided image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18359–18369, 2023.
- Towards personalized bundle creative generation with contrastive non-autoregressive decoding. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2634–2638, 2022.
- Reference-based painterly inpainting via diffusion: Crossing the wild reference domain gap. arXiv preprint arXiv:2307.10584, 2023.
- Paint by example: Exemplar-based image editing with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18381–18391, 2023.
- Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.
- Generative image inpainting with contextual attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5505–5514, 2018.
- Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.
- Understanding consumer journey using attention based recurrent neural networks. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 3102–3111, 2019.