Optimal cross-learning for contextual bandits with unknown context distributions (2401.01857v1)

Published 3 Jan 2024 in cs.LG and stat.ML

Abstract: We consider the problem of designing contextual bandit algorithms in the "cross-learning" setting of Balseiro et al., where the learner observes the loss for the action they play in all possible contexts, not just the context of the current round. We specifically consider the setting where losses are chosen adversarially and contexts are sampled i.i.d. from an unknown distribution. In this setting, we resolve an open problem of Balseiro et al. by providing an efficient algorithm with a nearly tight (up to logarithmic factors) regret bound of $\widetilde{O}(\sqrt{TK})$ over $T$ rounds with $K$ actions, independent of the number of contexts. As a consequence, we obtain the first nearly tight regret bounds for the problems of learning to bid in first-price auctions (under unknown value distributions) and sleeping bandits with a stochastic action set. At the core of our algorithm is a novel technique for coordinating the execution of a learning algorithm over multiple epochs in such a way as to remove correlations between estimation of the unknown distribution and the actions played by the algorithm. This technique may be of independent interest for other learning problems involving estimation of an unknown context distribution.
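The cross-learning feedback model can be made concrete with a small simulation. The following is a minimal sketch of an EXP3-style learner that keeps separate exponential weights per context and, after playing action $a_t$, updates the loss estimate of $a_t$ in every context by dividing the observed loss by the probability of playing $a_t$ averaged over the context distribution. As we understand it, this resembles the known-distribution approach of Balseiro et al.; it assumes the context distribution `context_probs` is known, which is precisely what this paper does not assume, so it is not the paper's algorithm. All function names, the loss instance, and the step size `eta` are illustrative assumptions.

```python
import math
import random


def exp_weights(cum_est, eta):
    """Exponential-weights distribution over actions from estimated cumulative losses."""
    m = min(cum_est)  # shift by the minimum so every exponent is <= 0 (numerical stability)
    w = [math.exp(-eta * (L - m)) for L in cum_est]
    s = sum(w)
    return [x / s for x in w]


def cross_learning_estimates(loss_t, policies, context_probs, a_t):
    """Importance-weighted loss estimates for every (context, action) pair.

    loss_t[c][a] is the round-t loss. Under cross-learning, playing a_t reveals
    loss_t[c][a_t] for ALL contexts c. Dividing by the marginal probability of
    a_t (averaged over context_probs) makes each estimate unbiased, ASSUMING
    context_probs is the true, known context distribution.
    """
    C, K = len(loss_t), len(loss_t[0])
    marginal = sum(context_probs[c] * policies[c][a_t] for c in range(C))
    est = [[0.0] * K for _ in range(C)]
    for c in range(C):
        est[c][a_t] = loss_t[c][a_t] / marginal
    return est


def run_cross_learning_exp3(losses, context_probs, eta, seed=0):
    """Play len(losses) rounds with contexts drawn i.i.d. from context_probs.

    losses[t][c][a] are fixed in advance (an oblivious adversary, for simplicity).
    Returns the total loss incurred by the learner.
    """
    rng = random.Random(seed)
    C, K = len(losses[0]), len(losses[0][0])
    cum_est = [[0.0] * K for _ in range(C)]
    total = 0.0
    for loss_t in losses:
        # Policies depend only on past estimates, before this round's context is drawn.
        policies = [exp_weights(cum_est[c], eta) for c in range(C)]
        c_t = rng.choices(range(C), weights=context_probs)[0]
        a_t = rng.choices(range(K), weights=policies[c_t])[0]
        total += loss_t[c_t][a_t]
        est = cross_learning_estimates(loss_t, policies, context_probs, a_t)
        for c in range(C):
            for a in range(K):
                cum_est[c][a] += est[c][a]
    return total


if __name__ == "__main__":
    T, K = 2000, 3
    # Hypothetical instance: context 0 favors action 0, context 1 favors action 2.
    losses = [[[0.0, 1.0, 1.0], [1.0, 1.0, 0.0]] for _ in range(T)]
    eta = math.sqrt(2 * math.log(K) / (T * K))
    print(run_cross_learning_exp3(losses, [0.5, 0.5], eta))
```

Replacing `context_probs` with empirical context frequencies would make the denominator depend on the same history that drives the learner's actions; that is the kind of correlation between distribution estimation and actions played that the epoch-coordination technique described above is meant to remove.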

References (18)
  1. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1), 2002.
  2. Learning to bid in contextual first price auctions. In Proceedings of the ACM Web Conference 2023, WWW ’23, page 3489–3497, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9781450394161. doi: 10.1145/3543507.3583427. URL https://doi.org/10.1145/3543507.3583427.
  3. Contextual bandits with cross-learning. Advances in Neural Information Processing Systems, 32, 2019.
  4. Yuyu Chen. Programmatic advertising is preparing for the first-price auction era. https://digiday.com/marketing/programmatic-advertising-readying-first-price-auction-era, October 2017. Accessed: 2020-01-29.
  5. Learning to bid optimally and efficiently in adversarial first-price auctions. arXiv preprint arXiv:2007.04568, 2020a.
  6. Optimal no-regret learning in repeated first-price auctions. arXiv preprint arXiv:2003.09795, 2020b.
  7. Elad Hazan et al. Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4):157–325, 2016.
  8. Hardness of online sleeping combinatorial optimization problems. Advances in Neural Information Processing Systems, 29, 2016.
  9. Learning hurdles for sleeping experts. ACM Transactions on Computation Theory (TOCT), 6(3):1–16, 2014.
  10. Sleeping experts and bandits with stochastic action availability and adversarial rewards. In Artificial Intelligence and Statistics, pages 272–279. PMLR, 2009.
  11. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings., pages 594–605. IEEE, 2003.
  12. Regret bounds for sleeping experts and bandits. Machine Learning, 80(2-3):245–272, 2010.
  13. Bandit algorithms. Cambridge University Press, 2020.
  14. Efficient and robust algorithms for adversarial linear contextual bandits. In Conference on Learning Theory, pages 3049–3068. PMLR, 2020.
  15. Online combinatorial optimization with stochastic decision sets and adversarial losses. Advances in Neural Information Processing Systems, 27, 2014.
  16. Improved sleeping bandits with stochastic action sets and adversarial rewards. In International Conference on Machine Learning, pages 8357–8366. PMLR, 2020.
  17. Meow: A space-efficient nonparametric bid shading algorithm. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 3928–3936, 2021.
  18. Leveraging the hints: Adaptive bidding in repeated first-price auctions. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 21329–21341. Curran Associates, Inc., 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/86419aba4e5eafd2b1009a2e3c540bb0-Paper-Conference.pdf.
Authors (2)
  1. Jon Schneider (50 papers)
  2. Julian Zimmert (30 papers)
Citations (3)