Tight Memory-Regret Lower Bounds for Streaming Bandits (2306.07903v1)
Abstract: We investigate the streaming bandits problem, in which the learner aims to minimize regret while handling arms that arrive online under sublinear arm memory. We establish a tight worst-case regret lower bound of $\Omega\left((TB)^{\alpha} K^{1-\alpha}\right)$, where $\alpha = 2^{B} / (2^{B+1}-1)$, for any algorithm with time horizon $T$, number of arms $K$, and number of passes $B$. The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, in comparison to the well-known $\Omega(\sqrt{KT})$ lower bound, an additional double logarithmic factor is unavoidable for any streaming bandits algorithm with sublinear memory. Furthermore, we establish the first instance-dependent lower bound of $\Omega\left(T^{1/(B+1)} \sum_{\Delta_x>0} \frac{\mu^*}{\Delta_x}\right)$ for streaming bandits. These lower bounds are derived through a unique reduction from the regret-minimization setting to the sample complexity analysis for a sequence of $\epsilon$-optimal arm identification tasks, which may be of independent interest. To complement the lower bound, we also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O}\left((TB)^{\alpha} K^{1-\alpha}\right)$ using constant arm memory.
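To get a feel for how the exponent in the worst-case bound depends on the number of passes, the following is a minimal Python sketch that evaluates $\alpha = 2^{B}/(2^{B+1}-1)$ and the rate $(TB)^{\alpha} K^{1-\alpha}$ for a few values of $B$. The function names `alpha` and `regret_rate` and the sample values of $T$ and $K$ are illustrative assumptions, not taken from the paper, and the constant hidden by $\Omega(\cdot)$ is ignored.

```python
import math


def alpha(num_passes: int) -> float:
    """Exponent alpha = 2^B / (2^(B+1) - 1) as stated in the abstract."""
    if num_passes < 1:
        raise ValueError("num_passes must be at least 1")
    return 2 ** num_passes / (2 ** (num_passes + 1) - 1)


def regret_rate(horizon: int, num_arms: int, num_passes: int) -> float:
    """Evaluate (T*B)^alpha * K^(1-alpha); hidden constants are ignored."""
    a = alpha(num_passes)
    return (horizon * num_passes) ** a * num_arms ** (1 - a)


if __name__ == "__main__":
    T, K = 10**6, 100  # hypothetical horizon and number of arms
    print(f"sqrt(K*T) reference rate: {math.sqrt(K * T):.3e}")
    for B in (1, 2, 3, 5, 10):
        print(f"B={B:2d}  alpha={alpha(B):.4f}  rate={regret_rate(T, K, B):.3e}")
```

With a single pass ($B = 1$) the exponent is $\alpha = 2/3$, so the rate is $T^{2/3} K^{1/3}$. As $B$ grows, $\alpha$ decreases toward $1/2$, which is the exponent of the classical $\sqrt{KT}$ bound.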
Authors: Shaoang Li, Lan Zhang, Junhao Wang, Xiang-Yang Li