
AIGB: Generative Auto-bidding via Conditional Diffusion Modeling (2405.16141v4)

Published 25 May 2024 in cs.LG, cs.AI, and cs.CE

Abstract: Auto-bidding plays a crucial role in facilitating online advertising by automatically providing bids for advertisers. Reinforcement learning (RL) has gained popularity for auto-bidding. However, most current RL auto-bidding methods are modeled as a Markov Decision Process (MDP), which assumes Markovian state transitions. This assumption restricts performance in long-horizon scenarios and makes the model unstable in highly random online advertising environments. To tackle this issue, this paper introduces AI-Generated Bidding (AIGB), a novel paradigm for auto-bidding through generative modeling. Within this paradigm, we propose DiffBid, a conditional diffusion modeling approach for bid generation. DiffBid directly models the correlation between the return and the entire trajectory, effectively avoiding error propagation across time steps in long horizons. Additionally, DiffBid offers a versatile approach for generating trajectories that maximize given targets while adhering to specific constraints. Extensive experiments on a real-world dataset and an online A/B test on the Alibaba advertising platform demonstrate the effectiveness of DiffBid, which achieves a 2.81% increase in GMV and a 3.36% increase in ROI.
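
The abstract does not spell out the diffusion formulation, so the following is only a minimal sketch of what trajectory-level conditional diffusion can look like, assuming a standard DDPM-style forward process and classifier-free-guidance-style conditioning on the return. The function names, the linear noise schedule, and the toy budget trajectory are all hypothetical and are not taken from the paper. The point it illustrates is that noise is applied to the whole trajectory at once, rather than rolling a model forward one time step at a time.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_k) for an assumed linear noise schedule."""
    alpha_bars = []
    running = 1.0
    for k in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * k / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_trajectory(trajectory, alpha_bar, rng):
    """Noise every time step of a trajectory jointly:
    x_k = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    scale_signal = math.sqrt(alpha_bar)
    scale_noise = math.sqrt(1.0 - alpha_bar)
    noise = [rng.gauss(0.0, 1.0) for _ in trajectory]
    noisy = [scale_signal * x + scale_noise * e for x, e in zip(trajectory, noise)]
    return noisy, noise


def guided_noise(eps_uncond, eps_cond, weight):
    """Mix unconditional and return-conditioned noise predictions.
    weight = 0 ignores the condition; weight = 1 uses the conditional prediction."""
    return [u + weight * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical trajectory: fraction of budget remaining at each time step.
    budget_left = [1.0, 0.8, 0.55, 0.3, 0.1]
    alpha_bars = linear_alpha_bars(num_steps=100)
    noisy, eps = noise_trajectory(budget_left, alpha_bars[50], rng)
    print(noisy)
```

In a trained model, a denoising network would predict the noise from the noisy trajectory, the diffusion step, and the target return; `guided_noise` shows only how two such predictions could be combined at sampling time under the assumptions above.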
