
Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners (2403.05578v1)

Published 28 Feb 2024 in cs.HC, cs.AI, cs.CV, cs.IR, and cs.LG

Abstract: Text-to-image models such as Stable Diffusion have opened up many opportunities for generating art. Recent literature has surveyed the use of text-to-image models to enhance the work of creative artists. Many e-commerce platforms still generate banners manually, a process that is time-consuming and scales poorly. In this work, we demonstrate the use of text-to-image models for generating personalized web banners with dynamic content for online shoppers based on their interactions. The novelty of this approach lies in converting users' interaction data into meaningful prompts without human intervention. To this end, we use an LLM to systematically extract a tuple of attributes from item meta-information. The attributes are then passed to a text-to-image model via prompt engineering to generate images for the banner. Our results show that the proposed approach can create high-quality personalized banners for users.
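
The two-stage chain described in the abstract (an LLM extracts an attribute tuple, then a prompt is built for the text-to-image model) could look roughly like the sketch below. This is illustrative only: the attribute names (`product_type`, `color`, `style`, `setting`), the prompt wording, and all function names are assumptions, not taken from the paper, and the actual LLM and text-to-image calls are deliberately left out.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ItemAttributes:
    """Hypothetical attribute tuple; the paper does not list these exact fields."""
    product_type: str
    color: str
    style: str
    setting: str


def extraction_prompt(item_metadata: Mapping[str, str]) -> str:
    """Build the instruction text for an LLM (the LLM call itself is not shown)."""
    fields = "\n".join(f"{key}: {value}" for key, value in sorted(item_metadata.items()))
    return (
        "Extract product_type, color, style, and setting from the item below. "
        "Answer as one comma-separated line.\n" + fields
    )


def parse_attributes(llm_reply: str) -> ItemAttributes:
    """Parse an assumed comma-separated LLM reply into the attribute tuple."""
    parts = [part.strip() for part in llm_reply.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 attributes, got {len(parts)}")
    return ItemAttributes(*parts)


def banner_prompt(attrs: ItemAttributes) -> str:
    """Fill an assumed prompt template for a text-to-image model."""
    return (
        f"A {attrs.style} banner photo of a {attrs.color} {attrs.product_type}, "
        f"{attrs.setting}, high quality, no text"
    )


if __name__ == "__main__":
    # Stand-in for a real LLM reply to extraction_prompt(...).
    reply = "sofa, navy blue, mid-century modern, bright living room"
    print(banner_prompt(parse_attributes(reply)))
```

Keeping the extraction step and the prompt template separate means either side of the chain can be swapped (a different LLM, a different image model) without touching the other.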

Authors (7)
  1. Shanu Vashishtha
  2. Abhinav Prakash
  3. Lalitesh Morishetti
  4. Kaushiki Nag
  5. Yokila Arora
  6. Sushant Kumar
  7. Kannan Achan