RLTP: Reinforcement Learning to Pace for Delayed Impression Modeling in Preloaded Ads (2302.02592v4)
Abstract: To increase brand awareness, many advertisers sign contracts with advertising platforms to purchase traffic and then deliver advertisements to target audiences. Over a whole delivery period, advertisers usually desire a certain impression count for their ads, and they also expect the delivery performance to be as good as possible (e.g., a high click-through rate). Advertising platforms employ pacing algorithms to satisfy these demands by adjusting the selection probabilities for traffic requests in real time. However, the delivery procedure is also affected by publishers' strategies, which advertising platforms cannot control. Preloading is a widely used strategy for many types of ads (e.g., video ads) to make sure that the response time for displaying an ad after a traffic request is acceptable, which results in a delayed impression phenomenon. Traditional pacing algorithms cannot handle preloading well because they rely on immediate feedback signals, and they may fail to guarantee advertisers' demands. In this paper, we focus on a new research problem, impression pacing for preloaded ads, and propose a Reinforcement Learning To Pace framework, RLTP. It learns a pacing agent that sequentially produces selection probabilities over the whole delivery period. To jointly optimize the two objectives of impression count and delivery performance, RLTP employs a tailored reward estimator to satisfy the guaranteed impression count, penalize over-delivery, and maximize the traffic value. Experiments on large-scale industrial datasets verify that RLTP outperforms baseline pacing algorithms by a large margin. We have deployed the RLTP framework online on our advertising platform, and results show that it achieves a significant uplift in core metrics, including delivery completion rate and click-through rate.
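The two ideas in the abstract, delayed impressions and a reward that balances impression count against delivery performance, can be illustrated with a small Python sketch. This is not the RLTP implementation: the fixed delay, the names `observed_impressions`, `DeliveryOutcome`, and `episode_reward`, the weights, and the functional form of the reward are all assumptions made only for illustration.

```python
from dataclasses import dataclass


def observed_impressions(selected_per_step, delay_steps):
    """Toy model of preloading (assumption: a fixed delay, and every selected
    request eventually becomes an impression). An ad selected at step t is
    only observed as an impression at step t + delay_steps."""
    horizon = len(selected_per_step)
    observed = [0] * horizon
    for t, n_selected in enumerate(selected_per_step):
        arrival = t + delay_steps
        if arrival < horizon:  # impressions landing after the period are not seen
            observed[arrival] += n_selected
    return observed


@dataclass
class DeliveryOutcome:
    impressions: int  # impressions observed by the end of the delivery period
    target: int       # impression count the advertiser asked for
    clicks: int       # clicks on those impressions (proxy for traffic value)


def episode_reward(outcome, over_delivery_weight=2.0, value_weight=1.0):
    """Hypothetical scalar reward with three terms: completion of the
    guaranteed count, a penalty for over-delivery, and traffic value (CTR).
    The weights are illustrative defaults, not values from the paper."""
    ratio = outcome.impressions / outcome.target
    completion = min(ratio, 1.0)          # saturates once the target is met
    over_delivery = max(ratio - 1.0, 0.0)  # only positive beyond the target
    ctr = outcome.clicks / outcome.impressions if outcome.impressions else 0.0
    return completion - over_delivery_weight * over_delivery + value_weight * ctr


if __name__ == "__main__":
    # With a delay of 2 steps, the first two steps show no feedback at all.
    print(observed_impressions([10, 20, 30, 40], delay_steps=2))
    for imps, clicks in [(800, 40), (1000, 50), (1200, 60)]:
        outcome = DeliveryOutcome(impressions=imps, target=1000, clicks=clicks)
        print(imps, round(episode_reward(outcome), 4))
```

In this toy setting, a controller that reacts only to the impressions observed so far would see zero feedback during the first `delay_steps` steps and would tend to over-select, which is the failure mode the abstract attributes to pacing algorithms that rely on immediate feedback.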