
Tight Memory-Regret Lower Bounds for Streaming Bandits (2306.07903v1)

Published 13 Jun 2023 in cs.LG

Abstract: In this paper, we investigate the streaming bandits problem, wherein the learner aims to minimize regret while handling arms that arrive online and using sublinear arm memory. We establish the tight worst-case regret lower bound of $\Omega \left( (TB)^{\alpha} K^{1-\alpha}\right)$, $\alpha = 2^{B} / (2^{B+1}-1)$, for any algorithm with a time horizon $T$, number of arms $K$, and number of passes $B$. The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, in comparison to the well-known $\Omega(\sqrt{KT})$ lower bound, an additional double logarithmic factor is unavoidable for any streaming bandits algorithm restricted to sublinear memory. Furthermore, we establish the first instance-dependent lower bound of $\Omega \left(T^{1/(B+1)} \sum_{\Delta_x>0} \frac{\mu^*}{\Delta_x}\right)$ for streaming bandits. These lower bounds are derived through a unique reduction from the regret-minimization setting to the sample complexity analysis for a sequence of $\epsilon$-optimal arms identification tasks, which may be of independent interest. To complement the lower bound, we also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O} \left( (TB)^{\alpha} K^{1 - \alpha}\right)$ using constant arm memory.
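As a rough illustration of how the exponent $\alpha = 2^{B} / (2^{B+1}-1)$ in the worst-case bound behaves, the sketch below evaluates it for a few pass counts and compares the rate $(TB)^{\alpha} K^{1-\alpha}$ with the classical $\sqrt{KT}$ rate. The function names and the sample values of $T$ and $K$ are assumptions chosen for illustration, not taken from the paper, and all constants hidden by $\Omega(\cdot)$ are ignored.

```python
from fractions import Fraction
import math


def alpha(num_passes: int) -> Fraction:
    """Exponent alpha = 2^B / (2^(B+1) - 1) from the abstract, as an exact fraction."""
    if num_passes < 1:
        raise ValueError("num_passes (B) must be at least 1")
    return Fraction(2 ** num_passes, 2 ** (num_passes + 1) - 1)


def regret_rate(horizon: int, num_arms: int, num_passes: int) -> float:
    """Rate (T*B)^alpha * K^(1 - alpha), with the Omega(.) constant dropped."""
    a = float(alpha(num_passes))
    return (horizon * num_passes) ** a * num_arms ** (1 - a)


if __name__ == "__main__":
    # Hypothetical problem size, chosen only for illustration.
    T, K = 10 ** 6, 100
    classical = math.sqrt(K * T)
    for B in (1, 2, 3, 5, 10):
        a = alpha(B)
        ratio = regret_rate(T, K, B) / classical
        print(f"B={B:2d}  alpha={a} (~{float(a):.4f})  rate / sqrt(KT) = {ratio:.2f}")
```

Since $\alpha - \tfrac{1}{2} = \frac{1}{2(2^{B+1}-1)}$, the exponent equals $2/3$ for a single pass and approaches $1/2$ as $B$ grows, while the factor $B^{\alpha}$ inside the rate grows with the number of passes.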

Authors (4)
  1. Shaoang Li (4 papers)
  2. Lan Zhang (108 papers)
  3. Junhao Wang (21 papers)
  4. Xiang-Yang Li (77 papers)
Citations (2)
