Ad-load Balancing via Off-policy Learning in a Content Marketplace (2309.11518v2)

Published 19 Sep 2023 in cs.IR and cs.LG

Abstract: Ad-load balancing is a critical challenge in online advertising systems, particularly in the context of social media platforms, where the goal is to maximize user engagement and revenue while maintaining a satisfactory user experience. This requires the optimization of conflicting objectives, such as user satisfaction and ads revenue. Traditional approaches to ad-load balancing rely on static allocation policies, which fail to adapt to changing user preferences and contextual factors. In this paper, we present an approach that leverages off-policy learning and evaluation from logged bandit feedback. We start by presenting a motivating analysis of the ad-load balancing problem, highlighting the conflicting objectives between user satisfaction and ads revenue. We emphasize the nuances that arise due to user heterogeneity and the dependence on the user's position within a session. Based on this analysis, we define the problem as determining the optimal ad-load for a particular feed fetch. To tackle this problem, we propose an off-policy learning framework that leverages unbiased estimators such as Inverse Propensity Scoring (IPS) and Doubly Robust (DR) to learn and estimate the policy values using offline collected stochastic data. We present insights from online A/B experiments deployed at scale across over 80 million users generating over 200 million sessions, where we find statistically significant improvements in both user satisfaction metrics and ads revenue for the platform.
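The abstract names Inverse Propensity Scoring (IPS) and Doubly Robust (DR) as the unbiased estimators used to evaluate policies from logged bandit feedback. As a rough illustration of what these estimators compute (not the paper's implementation; the array names and signatures here are illustrative assumptions), a minimal sketch:

```python
import numpy as np

def ips_estimate(rewards, target_probs, logging_probs):
    """Inverse Propensity Scoring: reweight each logged reward by the
    ratio of the target policy's action probability to the logging
    policy's, yielding an unbiased estimate of the target policy's value."""
    weights = target_probs / logging_probs
    return np.mean(weights * rewards)

def dr_estimate(rewards, target_probs, logging_probs,
                model_logged, model_target):
    """Doubly Robust: start from a direct reward-model estimate under the
    target policy, then add an IPS-weighted correction on the model's
    residuals for the logged actions. Unbiased if either the propensities
    or the reward model are correct."""
    weights = target_probs / logging_probs
    correction = weights * (rewards - model_logged)
    return np.mean(model_target + correction)

# Toy logged data: 4 interactions with binary rewards.
rewards = np.array([1.0, 0.0, 1.0, 1.0])
logging_probs = np.array([0.5, 0.5, 0.5, 0.5])
target_probs = np.array([0.5, 0.5, 0.5, 0.5])
print(ips_estimate(rewards, target_probs, logging_probs))
```

When the target and logging policies coincide, IPS reduces to the empirical mean reward; DR additionally leans on a reward model to cut variance, which matters at the scale the paper reports (80M+ users).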

