MNL-Bandit with Knapsacks: a near-optimal algorithm (2106.01135v5)
Abstract: We consider a dynamic assortment selection problem in which a seller has a fixed inventory of $N$ substitutable products and faces unknown demand arriving sequentially over $T$ periods. In each period, the seller must decide which assortment of products (satisfying certain constraints) to offer to the customer. The customer's response follows an unknown multinomial logit (MNL) model with parameter $\boldsymbol{v}$. If a customer selects product $i \in [N]$, the seller receives revenue $r_i$. The seller's goal is to maximize the total expected revenue from the $T$ customers given the fixed initial inventory of the $N$ products. We present MNLwK-UCB, a UCB-based algorithm, and characterize its regret under different regimes of inventory size. We show that when the inventory size grows quasi-linearly in time, MNLwK-UCB achieves a $\tilde{O}(N + \sqrt{NT})$ regret bound. We also show that for a smaller inventory (growing as $\sim T^{\alpha}$, $\alpha < 1$), MNLwK-UCB achieves a $\tilde{O}(N(1 + T^{\frac{1 - \alpha}{2}}) + \sqrt{NT})$ regret bound. In particular, over a long time horizon $T$, the rate $\tilde{O}(\sqrt{NT})$ is always achieved regardless of the constraints and the size of the inventory.
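The customer behavior in the abstract follows the standard MNL choice model: given an offered assortment $S$, product $i \in S$ is chosen with probability $v_i / (1 + \sum_{j \in S} v_j)$, where the leading $1$ is the weight of the no-purchase option. A minimal sketch of these choice probabilities and the resulting expected single-period revenue (the function names and dict-based representation here are illustrative, not from the paper):

```python
def mnl_choice_probs(v, assortment):
    """Choice probabilities under an MNL model with parameters v.

    P(i | S) = v_i / (1 + sum_{j in S} v_j); the leading 1 is the
    weight of the no-purchase option. `None` denotes no purchase.
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    probs = {i: v[i] / denom for i in assortment}
    probs[None] = 1.0 / denom
    return probs


def expected_revenue(v, r, assortment):
    """Expected single-period revenue R(S) = sum_{i in S} r_i * P(i | S)."""
    probs = mnl_choice_probs(v, assortment)
    return sum(r[i] * probs[i] for i in assortment)


# Example: two products with equal MNL weights but different prices.
v = {0: 1.0, 1: 1.0}   # hypothetical MNL parameters
r = {0: 2.0, 1: 1.0}   # per-product revenues
print(expected_revenue(v, r, {0, 1}))  # → 1.0 (each outcome has prob 1/3)
```

A UCB-style algorithm such as MNLwK-UCB would replace the true $v$ with optimistic estimates and optimize assortments under the inventory (knapsack) constraints; the sketch above only captures the underlying choice model.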