A New Creative Generation Pipeline for Click-Through Rate with Stable Diffusion Model (2401.10934v1)

Published 17 Jan 2024 in cs.IR and cs.AI

Abstract: In online advertising, sellers often create multiple creatives to demonstrate their products comprehensively, making it essential to present the most appealing design to maximize the Click-Through Rate (CTR). However, sellers generally struggle to account for users' preferences in creative design, leading to lower aesthetic quality and smaller quantities compared with AI-based approaches. Traditional AI-based approaches face the same problem of ignoring user information, while also having only limited aesthetic knowledge from designers. In fact, by fusing user information, the generated creatives can be made more attractive, because different users may have different preferences. To optimize the results, the creatives generated by traditional methods are then ranked by a separate module, the creative ranking model, which predicts a CTR score for each creative given user features. However, these two stages are treated as distinct tasks and optimized separately. In this paper, we propose a new automated Creative Generation pipeline for Click-Through Rate (CG4CTR) that aims to improve CTR during the creative generation stage itself. Our contributions are fourfold: 1) The inpainting mode of Stable Diffusion is applied for the first time to the creative generation task in the online advertising scene, and a self-cyclic generation pipeline is proposed to ensure training convergence. 2) A prompt model is designed to generate individualized creatives for different user groups, further improving diversity and quality. 3) A reward model comprehensively considers the multimodal features of image and text to improve the creative ranking task, and is also critical to the self-cyclic pipeline. 4) Significant gains in both online and offline experiments verify the effectiveness of the proposed method.
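The self-cyclic loop sketched in the abstract can be illustrated schematically: a prompt model proposes user-group-specific prompts, an inpainting generator produces candidate creatives, and a reward model ranks them by predicted CTR, with the winner fed back to refine future prompts. The sketch below is a toy stand-in under our own assumptions, not the paper's implementation: `prompt_model`, `inpaint_generate`, and `reward_model` are hypothetical placeholders for the learned components.

```python
import random

def prompt_model(user_group, history):
    # Toy prompt model: reuse the prompt ordering learned for this
    # user group, falling back to a default candidate set.
    base = ["minimal background", "bright studio light", "outdoor scene"]
    return history.get(user_group, base)

def inpaint_generate(product, prompt, rng):
    # Stand-in for Stable Diffusion inpainting: returns a "creative"
    # record instead of actual pixels.
    return {"product": product, "prompt": prompt, "noise": rng.random()}

def reward_model(creative, user_group):
    # Toy multimodal reward: a deterministic pseudo-score from the
    # (prompt, user group) pair, in place of a learned CTR predictor.
    return (hash((creative["prompt"], user_group)) % 1000) / 1000.0

def self_cyclic_step(product, user_group, history, rng):
    # One cycle: generate candidates per prompt, rank by reward,
    # and reinforce the winning prompt for this user group.
    prompts = prompt_model(user_group, history)
    candidates = [inpaint_generate(product, p, rng) for p in prompts]
    best = max(candidates, key=lambda c: reward_model(c, user_group))
    history[user_group] = [best["prompt"]] + [
        p for p in prompts if p != best["prompt"]
    ]
    return best

rng = random.Random(0)
history = {}
winner = self_cyclic_step("sneaker", "group_a", history, rng)
print(winner["prompt"])
```

Repeating `self_cyclic_step` would mimic the closed loop the paper describes: the ranking signal from the reward model is what drives the generation side, rather than the two stages being optimized separately.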
