
Online Learning in Contextual Second-Price Pay-Per-Click Auctions (2310.05047v1)

Published 8 Oct 2023 in cs.LG

Abstract: We study online learning in contextual pay-per-click auctions where, at each of the $T$ rounds, the learner receives some context along with a set of ads and needs to estimate their click-through rates (CTRs) in order to run a second-price pay-per-click auction. The learner's goal is to minimize her regret, defined as the gap between her total revenue and that of an oracle strategy that always makes perfect CTR predictions. We first show that $\sqrt{T}$-regret is obtainable via a computationally inefficient algorithm and that it is unavoidable, since our problem is no easier than the classical multi-armed bandit problem. A by-product of our results is a $\sqrt{T}$-regret bound for the simpler non-contextual setting, improving upon a recent work of [Feng et al., 2023] by removing the inverse CTR dependency that could be arbitrarily large. Then, borrowing ideas from recent advances on efficient contextual bandit algorithms, we develop two practically efficient contextual auction algorithms: the first one uses the exponential weight scheme with optimistic square errors and maintains the same $\sqrt{T}$-regret bound, while the second one reduces the problem to online regression via a simple epsilon-greedy strategy, albeit with a worse regret bound. Finally, we conduct experiments on a synthetic dataset to showcase the effectiveness and superior performance of our algorithms.
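
To make the setting in the abstract concrete, the following is a minimal sketch of a single auction round and its regret. It assumes the standard second-price pay-per-click rule (rank ads by bid times estimated CTR; the winner pays, per click, the runner-up's score divided by the winner's estimated CTR). The abstract does not spell out the mechanism, so this rule, the function names, and the numbers are illustrative assumptions rather than the paper's own formulation.

```python
from typing import Sequence, Tuple


def second_price_ppc(bids: Sequence[float],
                     ctr_estimates: Sequence[float]) -> Tuple[int, float]:
    """Run one assumed second-price pay-per-click auction.

    Ads are ranked by score = bid * estimated CTR. Returns the winner's
    index and the price charged per click. Needs at least two ads and
    strictly positive CTR estimates.
    """
    if len(bids) < 2 or len(bids) != len(ctr_estimates):
        raise ValueError("need at least two ads with matching bids and CTRs")
    if min(ctr_estimates) <= 0:
        raise ValueError("CTR estimates must be strictly positive")
    scores = [b * p for b, p in zip(bids, ctr_estimates)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    winner, runner_up = order[0], order[1]
    # The winner pays the smallest per-click price that keeps it ranked first.
    price_per_click = scores[runner_up] / ctr_estimates[winner]
    return winner, price_per_click


def expected_revenue(bids: Sequence[float],
                     ctr_estimates: Sequence[float],
                     true_ctrs: Sequence[float]) -> float:
    """Expected revenue: the winner is charged only when a click occurs."""
    winner, price = second_price_ppc(bids, ctr_estimates)
    return true_ctrs[winner] * price


def per_round_regret(bids: Sequence[float],
                     ctr_estimates: Sequence[float],
                     true_ctrs: Sequence[float]) -> float:
    """Oracle revenue (perfect CTR predictions) minus the learner's revenue."""
    oracle = expected_revenue(bids, true_ctrs, true_ctrs)
    learner = expected_revenue(bids, ctr_estimates, true_ctrs)
    return oracle - learner


if __name__ == "__main__":
    bids = [2.0, 1.0, 1.5]        # hypothetical per-click bids
    true_ctrs = [0.1, 0.4, 0.2]   # hypothetical true CTRs
    estimates = [0.3, 0.2, 0.2]   # hypothetical learner estimates
    print(per_round_regret(bids, estimates, true_ctrs))
```

With these made-up numbers the oracle allocates to the second ad for an expected revenue of about 0.3, while the mis-estimated CTRs allocate to the first ad for about 0.1, so the round contributes roughly 0.2 to the regret. Summing this quantity over the $T$ rounds gives the regret that the abstract bounds.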

References (34)
  1. Taming the monster: A fast and simple algorithm for contextual bandits. In International Conference on Machine Learning, pages 1638–1646. PMLR, 2014.
  2. Truthful auctions for pricing search keywords. In Proceedings of the 7th ACM Conference on Electronic Commerce, pages 1–7, 2006.
  3. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
  4. Characterizing truthful multi-armed bandit mechanisms. SIAM Journal on Computing, 43(1):194, 2014.
  5. Truthful mechanisms with implicit payment computation. Journal of the ACM (JACM), 62(2):1–37, 2015.
  6. Robust auction design in the auto-bidding world. Advances in Neural Information Processing Systems, 34:17777–17788, 2021.
  7. Learning in repeated auctions with budgets: Regret minimization and equilibrium. Management Science, 65(9):3952–3968, 2019.
  8. The best of many worlds: Dual mirror descent for online allocation problems. Operations Research, 71(1):101–119, 2023.
  9. Stochastic bandits with side observations on networks. In The 2014 ACM International Conference on Measurement and Modeling of Computer Systems, pages 289–300, 2014.
  10. The price of truthfulness for pay-per-click auctions. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 99–106, 2009.
  11. Efficient optimal learning for contextual bandits. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pages 169–178, 2011.
  12. Improved online learning algorithms for CTR prediction in ad auctions. In Proceedings of the 40th International Conference on Machine Learning (ICML), 2023.
  13. Beyond UCB: Optimal and efficient contextual bandits with regression oracles. In International Conference on Machine Learning, pages 3199–3210. PMLR, 2020.
  14. Practical contextual bandits with regression oracles. In International Conference on Machine Learning, pages 1539–1548. PMLR, 2018.
  15. Efficient first-order contextual bandits: Prediction, allocation, and triangular discrimination. Advances in Neural Information Processing Systems, 34:18907–18919, 2021.
  16. The statistical complexity of interactive decision making. arXiv preprint arXiv:2112.13487, 2021.
  17. Budget pacing in repeated auctions: Regret and efficiency without convergence. arXiv e-prints, pages arXiv–2205, 2022.
  18. A truthful learning mechanism for contextual multi-slot sponsored search auctions with externalities. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 605–622, 2012.
  19. Learning to bid optimally and efficiently in adversarial first-price auctions. arXiv preprint arXiv:2007.04568, 2020.
  20. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192, 2007.
  21. VCG mechanism design with unknown agent values under stochastic bandit feedback. Journal of Machine Learning Research, 24(53):1–45, 2023.
  22. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4–22, 1985.
  23. The epoch-greedy algorithm for multi-armed bandits with side information. Advances in Neural Information Processing Systems, 20, 2007.
  24. Bandit algorithms. Cambridge University Press, 2020.
  25. Autobidders with budget and ROI constraints: Efficiency, regret, and pacing dynamics. arXiv preprint arXiv:2301.13306, 2023.
  26. Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability. Mathematics of Operations Research, 47(3):1904–1931, 2022.
  27. Learning to bid in repeated first-price auctions with budgets. arXiv preprint arXiv:2304.13477, 2023.
  28. Taking a hint: How to leverage loss predictors in contextual bandits? In Conference on Learning Theory, pages 3583–3634. PMLR, 2020.
  29. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688, 2011.
  30. On the robustness of epoch-greedy in multi-agent contextual bandit mechanisms. arXiv preprint arXiv:2307.07675, 2023.
  31. Upper counterfactual confidence bounds: a new optimism principle for contextual bandits. arXiv preprint arXiv:2007.07876, 2020.
  32. Tong Zhang. Feel-good Thompson sampling for contextual bandits and reinforcement learning. SIAM Journal on Mathematics of Data Science, 4(2):834–857, 2022.
  33. Contextual bandits with smooth regret: Efficient learning in continuous action spaces. In International Conference on Machine Learning, pages 27574–27590. PMLR, 2022.
  34. Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, pages 928–936, 2003.