Online Learning with Sublinear Best-Action Queries (2407.16355v1)
Abstract: In online learning, a decision maker repeatedly selects one of a set of actions, with the goal of minimizing the overall loss incurred. Following the recent line of research on algorithms endowed with additional predictive features, we revisit this problem by allowing the decision maker to acquire additional information on the actions to be selected. In particular, we study the power of \emph{best-action queries}, which reveal beforehand the identity of the best action at a given time step. In practice, predictive features may be expensive, so we allow the decision maker to issue at most $k$ such queries. We establish tight bounds on the performance that any algorithm can achieve when given access to $k$ best-action queries under different feedback models. In particular, we prove that in the full feedback model, $k$ queries are enough to achieve an optimal regret of $\Theta\left(\min\left\{\sqrt{T}, \frac{T}{k}\right\}\right)$. This finding highlights the significant multiplicative advantage in the regret rate achievable with even a modest (sublinear) number $k \in \Omega(\sqrt{T})$ of queries. Additionally, we study the challenging setting in which the only available feedback is obtained during the time steps corresponding to the $k$ best-action queries. There, we provide a tight regret rate of $\Theta\left(\min\left\{\frac{T}{\sqrt{k}}, \frac{T^2}{k^2}\right\}\right)$, which improves over the standard $\Theta\left(\frac{T}{\sqrt{k}}\right)$ regret rate for label efficient prediction for $k \in \Omega(T^{2/3})$.
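To make the full-feedback setting concrete, the following is a minimal simulation sketch, not the paper's algorithm: a standard Hedge (exponential weights) learner that, on $k$ evenly spaced rounds, issues a best-action query and plays the revealed best action, and otherwise samples from its weight distribution. The function name `hedge_with_queries` and the evenly spaced query schedule are illustrative assumptions.

```python
import math
import random

def hedge_with_queries(losses, k, eta=None):
    """Hedge (exponential weights) with k best-action queries.

    `losses` is a T x n matrix of per-action losses in [0, 1]. On k evenly
    spaced rounds the learner "queries" an oracle for that round's best
    action and plays it; otherwise it samples from the Hedge distribution.
    Returns the empirical regret against the best fixed action in hindsight.
    Illustrative sketch only; not the algorithm analyzed in the paper.
    """
    T, n = len(losses), len(losses[0])
    eta = eta or math.sqrt(math.log(n) / max(T, 1))  # standard Hedge rate
    w = [1.0] * n
    query_rounds = {round(i * T / k) for i in range(k)} if k > 0 else set()
    total = 0.0
    for t in range(T):
        if t in query_rounds:
            # best-action query: play this round's minimizer directly
            a = min(range(n), key=lambda i: losses[t][i])
        else:
            a = random.choices(range(n), weights=w)[0]
        total += losses[t][a]
        # full feedback: update every action's weight
        w = [wi * math.exp(-eta * l) for wi, l in zip(w, losses[t])]
    best_fixed = min(sum(losses[t][i] for t in range(T)) for i in range(n))
    return total - best_fixed
```

With `k = T` every round is a query, so the learner's loss is the sum of per-round minima and its regret is non-positive; with `k = 0` this reduces to plain Hedge.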
Authors: Matteo Russo, Andrea Celli, Riccardo Colini Baldeschi, Federico Fusco, Daniel Haimovich, Dima Karamshuk, Stefano Leonardi, Niek Tax