Online Resource Allocation: Bandit Feedback and Advice on Time-Varying Demands (2302.04182v2)
Abstract: We consider a general online resource allocation model with bandit feedback and time-varying demands. While online resource allocation has been well studied in the literature, most existing works make the strong assumption that the demand arrival process is stationary. In practical applications, such as online advertising and revenue management, however, this process may be exogenous and non-stationary, like constantly changing internet traffic. Motivated by the recent Online Algorithms with Advice framework [Mitzenmacher and Vassilvitskii, \emph{Commun. ACM} 2022], we explore how online advice can inform policy design. We establish an impossibility result: in our setting, without any advice, every algorithm performs poorly in terms of regret. In contrast, we design a robust online algorithm that leverages online predictions of the total demand volume. Empowered with online advice, our proposed algorithm enjoys both theoretical performance guarantees and promising numerical results compared with other algorithms in the literature. We also provide two explicit examples of time-varying demand scenarios and derive the corresponding theoretical performance guarantees. Finally, we adapt our model to a network revenue management problem and numerically demonstrate that our algorithm still performs competitively against existing baselines.
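To make the setting concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm: a bandit learner (standard UCB1 arm selection) serves unit-size requests under a hard budget, and the "advice" (a prediction of the total demand volume) drives a pacing rule that spreads the budget over the predicted horizon. The function name `paced_ucb_allocation` and the specific pacing rule are our own assumptions for illustration.

```python
import math
import random

def paced_ucb_allocation(arm_means, budget, predicted_volume, horizon, seed=0):
    """Illustrative sketch (hypothetical, not the paper's algorithm):
    UCB1 arm selection with budget pacing driven by a prediction of the
    total demand volume (the 'advice')."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # pulls per arm
    sums = [0.0] * n_arms      # cumulative reward per arm
    spend = 0                  # budget units consumed so far
    total_reward = 0.0
    for t in range(1, horizon + 1):
        # Pacing rule: keep spend proportional to the fraction of the
        # predicted demand seen so far, so the budget lasts the horizon.
        target_spend = budget * min(1.0, t / predicted_volume)
        if spend >= min(budget, target_spend):
            continue  # skip this arrival to conserve budget
        # UCB1: try every arm once, then pick the highest index.
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        spend += 1  # each served request consumes one budget unit
        total_reward += reward
    return total_reward, spend
```

With accurate advice (`predicted_volume` equal to the true horizon), the pacing rule exhausts the budget almost exactly at the end of the horizon; when the advice overestimates demand, the same rule conserves budget, which is the qualitative robustness behavior the paper studies.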
- Bandits with concave rewards and convex knapsacks. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, pages 989–1006, 2014.
- Bandits with global convex constraints and objective. Operations Research, 67(5):1486–1502, 2019.
- An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives. In Conference on Learning Theory, pages 4–18. PMLR, 2016.
- Online facility location with multiple advice. Advances in Neural Information Processing Systems, 34:4661–4673, 2021.
- Customizing ML predictions for online algorithms. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 303–313. PMLR, 13–18 Jul 2020.
- Secretary and online matching problems with machine learned advice. Advances in Neural Information Processing Systems, 33:7933–7944, 2020.
- A nonparametric framework for online stochastic matching with correlated arrivals. arXiv preprint arXiv:2208.02229, 2022.
- Exploration–exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19):1876–1902, 2009.
- Yossi Aviv. A time-series framework for supply-chain inventory management. Operations Research, 51(2):210–227, 2003.
- Dynamic pricing with limited supply, 2015.
- Autoregressive bandits. arXiv preprint arXiv:2212.06251, 2022.
- Bandits with knapsacks. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 207–216. IEEE, 2013.
- Resourceful contextual bandits. In Conference on Learning Theory, pages 1109–1134. PMLR, 2014.
- Fluid approximations for revenue management under high-variance demand. SSRN, 2022. URL https://ssrn.com/abstract=4136445.
- Online resource allocation under horizon uncertainty. arXiv preprint arXiv:2206.13606, 2022.
- The primal-dual method for learning augmented algorithms. Advances in Neural Information Processing Systems, 33:20083–20094, 2020.
- Adaptive distributionally robust optimization. Management Science, 65(2):604–618, 2019.
- Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research, 57(6):1407–1420, 2009.
- Blind network revenue management. Operations Research, 60(6):1537–1550, 2012.
- Stochastic multi-armed-bandit problem with non-stationary rewards. Advances in Neural Information Processing Systems, 27, 2014.
- Non-stationary stochastic optimization. Operations Research, 63(5):1227–1244, 2015.
- Learning to optimize under non-stationarity. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1079–1087. PMLR, 2019.
- Concentration inequalities and martingale inequalities: a survey. Internet Mathematics, 3(1):79–127, 2006.
- Introduction to Algorithms, 3rd Edition. MIT Press, 2009. ISBN 978-0-262-03384-8.
- Follow the leader if you can, hedge if you must. The Journal of Machine Learning Research, 15(1):1281–1316, 2014.
- Learning online algorithms with distributional advice. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 2687–2696. PMLR, 18–24 Jul 2021.
- Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, 2009.
- Online network revenue management using Thompson sampling. Operations Research, 66(6):1586–1602, 2018.
- David A. Freedman. On tail probabilities for martingales. The Annals of Probability, pages 100–118, 1975.
- R.J. Hyndman and G. Athanasopoulos. Forecasting: principles and practice. OTexts, 2021. URL https://otexts.com/fpp3/.
- Adversarial bandits with knapsacks. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 202–219. IEEE, 2019.
- Online optimization: Competing with dynamic comparators. In Artificial Intelligence and Statistics, pages 398–406. PMLR, 2015.
- Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part I 16, pages 795–811. Springer, 2016.
- Online scheduling via learned weights. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1859–1877. SIAM, 2020.
- Time-series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 379(2194), 2021.
- On the choice-based linear programming model for network revenue management. Manufacturing & Service Operations Management, 10(2):288–310, 2008.
- Non-stationary bandits with knapsacks. arXiv preprint arXiv:2205.12427, 2022.
- Competitive caching with machine learned advice. Journal of the ACM (JACM), 68(4):1–25, 2021.
- Bandits with adversarial scaling. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 6511–6521, 2020.
- Consulting service as a tool to support decision-making by rural producers. In The Challenge of Sustainability in Agricultural Systems: Volume 2, pages 417–424. Springer, 2021.
- Adwords and generalized online matching. Journal of the ACM (JACM), 54(5):22–es, 2007.
- Aranyak Mehta et al. Online matching and ad allocation. Foundations and Trends® in Theoretical Computer Science, 8(4):265–368, 2013.
- Michael Mitzenmacher. Scheduling with predictions and the price of misprediction. arXiv preprint arXiv:1902.00732, 2019.
- Algorithms with predictions. Communications of the ACM, 65(7):33–35, 2022.
- Management consulting and international business support for smes: need and obstacles. Education+ Training, 46(8/9):424–432, 2004.
- Francesco Orabona. A modern introduction to online learning. arXiv preprint arXiv:1912.13213, 2019.
- Scale-free algorithms for online linear optimization. In Algorithmic Learning Theory: 26th International Conference, ALT 2015, Banff, AB, Canada, October 4-6, 2015, Proceedings, pages 287–301. Springer, 2015.
- Scale-free online learning. Theoretical Computer Science, 716:50–69, 2018.
- Summer workshop on learning-based algorithms. 2019. URL http://www.mit.edu/~vakilian/ttic-workshop.html.
- Improving online algorithms via ml predictions. In Advances in Neural Information Processing Systems, volume 31, 2018.
- Online learning with predictable sequences. In Conference on Learning Theory, pages 993–1019. PMLR, 2013a.
- Optimization, learning, and games with predictable sequences. Advances in Neural Information Processing Systems, 26, 2013b.
- Unifying the stochastic and the adversarial bandits with knapsack. arXiv preprint arXiv:1811.12253, 2018.
- Domingo Ribeiro Soriano. Quality in the consulting service–evaluation and impact: a survey in Spanish firms. Managing Service Quality: An International Journal, 11(1):40–48, 2001.
- Combinatorial semi-bandits with knapsacks. In International Conference on Artificial Intelligence and Statistics, pages 1760–1770. PMLR, 2018.
- Time Series Analysis and Its Applications: With R Examples. Springer texts in statistics. Springer, 2017. URL https://github.com/nickpoison/tsa4/blob/master/textRcode.md.
- Adaptivity and optimism: An improved exponentiated gradient algorithm. In International Conference on Machine Learning, pages 1593–1601. PMLR, 2014.
- An analysis of bid-price controls for network revenue management. Management Science, 44(11-part-1):1577–1593, 1998.
- Regulating greed over time in multi-armed bandits. J. Mach. Learn. Res., 22:3–1, 2021.
- Taking a hint: How to leverage loss predictors in contextual bandits? In Conference on Learning Theory, pages 3583–3634. PMLR, 2020.
- An approximate dynamic programming approach to network revenue management with customer choice. Transportation Science, 43(3):381–394, 2009.
- When demands evolve larger and noisier: Learning and earning in a growing environment. In International Conference on Machine Learning, pages 11629–11638. PMLR, 2020.