An Information-Theoretic Analysis of Nonstationary Bandit Learning (2302.04452v2)
Published 9 Feb 2023 in cs.LG, cs.IT, and math.IT
Abstract: In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes expected reward under the environment state. We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance. We bound limiting per-period regret in terms of the entropy rate of the optimal action process. The bound applies to a wide array of problems studied in the literature and reflects the problem's information structure through its information ratio.
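For intuition, the headline result takes roughly the following shape. This is an illustrative sketch, not the paper's verbatim statement: the symbols $\mathrm{Regret}(T)$ (cumulative expected regret over $T$ periods), $\bar{H}$ (the entropy rate of the optimal action process $A^*_1, A^*_2, \ldots$), and $\bar{\Gamma}$ (a uniform bound on the algorithm's information ratio) are named here for exposition, and the exact constants and conditions are those given in the paper.

% Illustrative form of the limiting per-period regret bound.
% \bar{H} is the entropy rate of the optimal action sequence;
% \bar{\Gamma} uniformly bounds the information ratio.
\[
  \limsup_{T \to \infty} \frac{\mathbb{E}[\mathrm{Regret}(T)]}{T}
  \;\lesssim\; \sqrt{\bar{\Gamma}\,\bar{H}},
  \qquad
  \bar{H} \;=\; \limsup_{T \to \infty} \frac{H(A^*_1, \ldots, A^*_T)}{T}.
\]

Read this way, a slowly varying environment (low entropy rate of the optimal action sequence) permits vanishing per-period regret, while a rapidly changing one forces persistent regret, with the information ratio capturing how efficiently the problem's feedback structure reveals the optimal action.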