No-Regret Learning for Stackelberg Equilibrium Computation in Newsvendor Pricing Games (2404.00203v3)
Abstract: We introduce an application of online learning to a Stackelberg game between two learning agents in a dyadic exchange network, a supplier and a retailer, where the parameters of the demand function are unknown. In this game, the supplier is the first-moving leader and must determine the optimal wholesale price of the product. The retailer, as the follower, must then determine both the optimal procurement amount and the selling price of the product. In the perfect-information setting, this is the classical price-setting Newsvendor problem, and we prove the existence of a unique Stackelberg equilibrium when it is extended to a two-player pricing game. In the online learning framework, the parameters of the reward functions of both the follower and the leader must be learned, under the assumption that the follower best responds with optimism under uncertainty. A novel algorithm based on contextual linear bandits with a measurable uncertainty set is used to provide a confidence bound on the parameters of the stochastic demand. Consequently, we provide optimal finite-time bounds on the Stackelberg regret, along with convergence guarantees to an approximate Stackelberg equilibrium.
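To make the leader-follower structure in the abstract concrete, the following is a minimal sketch of the perfect-information game only, not the paper's learning algorithm. It assumes a linear demand model D = a - b*p + eps with eps uniform on [-e, e], no salvage value or shortage penalty, and a grid search over prices. The function names, the parameter values, and the demand model itself are illustrative assumptions and may differ from the paper's formulation.

```python
# Hypothetical sketch of the perfect-information supplier-retailer game.
# Assumed demand model: D = a - b*p + eps, eps ~ Uniform[-e, e].
# All names and numbers below are illustrative, not taken from the paper.

def follower_best_response(w, a, b, e, price_grid):
    """Return (profit, price, quantity) maximising the retailer's expected profit
    for a given wholesale price w, or None if no feasible price exists."""
    best = None
    for p in price_grid:
        if p <= w:
            continue  # the retailer cannot profit when the margin is non-positive
        mu = a - b * p  # expected demand at price p
        if mu - e < 0:
            continue  # keep realised demand non-negative under the assumed noise
        # Newsvendor critical fractile: P(eps <= z) = (p - w) / p,
        # which for uniform noise gives z = e * (1 - 2w/p).
        z = e * (1.0 - 2.0 * w / p)
        q = mu + z  # order quantity
        # Closed form of E[min(q, D)] for uniform noise on [-e, e].
        expected_sales = mu - (e - z) ** 2 / (4.0 * e)
        profit = p * expected_sales - w * q
        if best is None or profit > best[0]:
            best = (profit, p, q)
    return best


def leader_optimum(c, a, b, e, w_grid, price_grid):
    """Return (leader_profit, w, p, q): the wholesale price that maximises the
    supplier's profit, anticipating the retailer's best response."""
    best = None
    for w in w_grid:
        response = follower_best_response(w, a, b, e, price_grid)
        if response is None:
            continue
        _, p, q = response
        leader_profit = (w - c) * q  # supplier earns its margin on every unit ordered
        if best is None or leader_profit > best[0]:
            best = (leader_profit, w, p, q)
    return best


# Illustrative parameters (assumptions).
A, B, NOISE, COST = 100.0, 2.0, 10.0, 5.0
GRID = [0.5 * i for i in range(1, 100)]  # candidate prices 0.5, 1.0, ..., 49.5

equilibrium = leader_optimum(COST, A, B, NOISE, GRID, GRID)
print("leader profit, wholesale price, retail price, order quantity:", equilibrium)
```

In the online setting described in the abstract, a, b and the noise level would not be known to either agent; the sketch only shows the nested optimisation that a learned, optimistic estimate of the demand parameters would be plugged into.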