Soft Frequency Capping for Improved Ad Click Prediction in Yahoo Gemini Native (2312.05052v1)
Abstract: Yahoo's native advertising (also known as Gemini native) serves billions of ad impressions daily, reaching a yearly run-rate of many hundreds of millions of USD. Driving the Gemini native models that are used to predict both click probability (pCTR) and conversion probability (pCONV) is OFFSET, a feature-enhanced collaborative-filtering (CF) based event prediction algorithm. OFFSET is a one-pass algorithm that updates its model for every new batch of logged data using a stochastic gradient descent (SGD) based approach. Because of sparsity issues, OFFSET represents its users by their features (i.e., it is a user-less model), so rule-based hard frequency capping (HFC) is used to control the number of times a certain user views a certain ad. Moreover, related statistics reveal that user ad fatigue results in a dramatic drop in click-through rate (CTR). Therefore, to improve click prediction accuracy, we propose a soft frequency capping (SFC) approach, where the frequency feature is incorporated into the OFFSET model as a user-ad feature and its weight vector is learned via logistic regression as part of OFFSET training. Online evaluation of the soft frequency capping algorithm via bucket testing showed a significant 7.3% revenue lift. Since then, the frequency-feature-enhanced model has been pushed to production, serving all traffic, and is generating a hefty revenue lift for Yahoo Gemini native. We also report related statistics that reveal, among other things, that while users' gender does not affect ad fatigue, the latter seems to increase with users' age.
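The contrast between hard and soft frequency capping can be illustrated with a small sketch. It is not OFFSET: the collaborative-filtering score is stood in for by a hypothetical `base_logit` input, and the frequency feature is modelled as one-hot buckets with a scalar weight per bucket (the paper learns a weight vector inside OFFSET's latent model, which this sketch does not reproduce). The names `hard_frequency_cap`, `freq_bucket`, `SoftFrequencyModel`, and the parameters `max_bucket`, `lr`, and `l2` are all illustrative assumptions. HFC is a fixed serving rule, while SFC learns how CTR decays with the number of prior views by per-example logistic-loss SGD over each incoming batch.

```python
import math
from dataclasses import dataclass, field


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hard_frequency_cap(views: int, cap: int) -> bool:
    """Rule-based HFC (illustrative): serve only while the user has seen the ad fewer than `cap` times."""
    return views < cap


def freq_bucket(views: int, max_bucket: int = 10) -> int:
    """Hypothetical bucketing: all view counts >= max_bucket share the last bucket."""
    return min(views, max_bucket)


@dataclass
class SoftFrequencyModel:
    """Sketch of SFC as logistic regression on a bucketed user-ad frequency feature.

    `base_logit` is a hypothetical stand-in for the CF score OFFSET would produce.
    """
    max_bucket: int = 10
    lr: float = 0.1     # SGD step size (assumed)
    l2: float = 1e-4    # L2 penalty on frequency weights (assumed)
    bias: float = 0.0
    freq_w: list = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        # One scalar weight per frequency bucket, initialised to zero.
        self.freq_w = [0.0] * (self.max_bucket + 1)

    def predict(self, base_logit: float, views: int) -> float:
        b = freq_bucket(views, self.max_bucket)
        return sigmoid(base_logit + self.bias + self.freq_w[b])

    def sgd_step(self, base_logit: float, views: int, clicked: int) -> None:
        # Gradient of the log-loss with respect to the logit is (p - y).
        g = self.predict(base_logit, views) - clicked
        b = freq_bucket(views, self.max_bucket)
        self.bias -= self.lr * g
        self.freq_w[b] -= self.lr * (g + self.l2 * self.freq_w[b])

    def train_batch(self, batch) -> None:
        # One pass over a new batch of logged (base_logit, views, clicked) events.
        for base_logit, views, clicked in batch:
            self.sgd_step(base_logit, views, clicked)
```

For example, on synthetic events where fresh impressions are clicked half the time and an ad seen five times is never clicked, the learned frequency weights lower pCTR for the fatigued case instead of blocking it outright:

```python
model = SoftFrequencyModel()
batch = [(0.0, 0, 1), (0.0, 0, 0), (0.0, 5, 0), (0.0, 5, 0)]
for _ in range(200):
    model.train_batch(batch)
print(model.predict(0.0, 0) > model.predict(0.0, 5))  # True
```

The design choice is that a soft penalty lets the auction still serve a repeated ad when its bid or relevance is high enough, whereas a hard cap removes it regardless.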