HiBid: A Cross-Channel Constrained Bidding System with Budget Allocation by Hierarchical Offline Deep Reinforcement Learning (2312.17503v2)

Published 29 Dec 2023 in cs.LG and cs.GT

Abstract: Online display advertising platforms serve numerous advertisers by providing real-time bidding (RTB) at the scale of billions of ad requests every day. The bidding strategy handles ad requests across multiple channels to maximize the number of clicks under the set financial constraints, e.g., total budget and cost-per-click (CPC). Unlike existing works that mainly focus on single-channel bidding, we explicitly consider cross-channel constrained bidding with budget allocation. Specifically, we propose a hierarchical offline deep reinforcement learning (DRL) framework called ``HiBid'', consisting of a high-level planner equipped with an auxiliary loss for non-competitive budget allocation, and a data-augmentation-enhanced low-level executor for adaptive bidding in response to the allocated budgets. Additionally, a CPC-guided action selection mechanism is introduced to satisfy the cross-channel CPC constraint. Through extensive experiments on both large-scale log data and online A/B testing, we confirm that HiBid outperforms six baselines in terms of the number of clicks, CPC satisfaction ratio, and return-on-investment (ROI). We have also deployed HiBid on the Meituan advertising platform, where it already serves tens of thousands of advertisers every day.
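The CPC-guided action selection mechanism mentioned in the abstract can be pictured as a constrained greedy step: among candidate bid actions ranked by estimated value, pick the highest-value action whose projected CPC stays within the constraint. The sketch below is an illustration only, not the paper's implementation; the function name, the candidate tuple layout, and the linear cost/click projection are all assumptions.

```python
def select_action(candidates, spent, clicks, cpc_limit):
    """Pick a bid action under a CPC constraint (illustrative sketch).

    candidates: list of (q_value, exp_cost, exp_clicks) tuples, one per
        candidate bid action, with model-estimated value, expected spend,
        and expected clicks.
    spent, clicks: cumulative spend and clicks so far in the episode.
    cpc_limit: the advertiser's cost-per-click constraint.
    """
    # Consider actions from highest to lowest estimated value.
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    for q, cost, clk in ranked:
        # Project the running CPC if this action were taken.
        projected_cpc = (spent + cost) / max(clicks + clk, 1e-9)
        if projected_cpc <= cpc_limit:
            return (q, cost, clk)
    # No action satisfies the constraint: fall back to the cheapest one.
    return min(candidates, key=lambda c: c[1])
```

In words: the executor's value estimates drive the ranking, but the CPC constraint acts as a filter, so a high-value but expensive action is skipped when it would push the running CPC over the limit.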

Citations (1)