
No-Regret Learning for Stackelberg Equilibrium Computation in Newsvendor Pricing Games (2404.00203v3)

Published 30 Mar 2024 in cs.CE and cs.MA

Abstract: We introduce the application of online learning in a Stackelberg game pertaining to a system with two learning agents in a dyadic exchange network, consisting of a supplier and a retailer, specifically where the parameters of the demand function are unknown. In this game, the supplier is the first-moving leader and must determine the optimal wholesale price of the product. Subsequently, the retailer, who is the follower, must determine both the optimal procurement amount and the selling price of the product. In the perfect information setting, this is known as the classical price-setting Newsvendor problem, and we prove the existence of a unique Stackelberg equilibrium when extending this to a two-player pricing game. In the framework of online learning, the parameters of the reward function for both the follower and the leader must be learned, under the assumption that the follower will best respond with optimism under uncertainty. A novel algorithm based on contextual linear bandits with a measurable uncertainty set is used to provide a confidence bound on the parameters of the stochastic demand. Consequently, optimal finite-time regret bounds on the Stackelberg regret, along with convergence guarantees to an approximate Stackelberg equilibrium, are provided.
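
To make the leader-follower structure in the abstract concrete, the following is a minimal sketch of the perfect-information game only, not the paper's online learning algorithm. Everything specific in it is an assumption chosen for illustration: a linear mean demand `a - b*p`, additive noise uniform on `[-s, s]`, a supplier unit cost `c`, the numerical defaults, and the use of plain grid search in place of any closed-form or bandit-based method. The retailer's order quantity uses the standard newsvendor critical fractile `(p - w) / p`.

```python
def max_price(a, b, s):
    """Largest price at which demand a - b*p + eps stays non-negative (assumed cap)."""
    return (a - s) / b


def follower_best_response(w, a=100.0, b=2.0, s=10.0, n_grid=400):
    """Retailer's (price, quantity, expected profit) given wholesale price w."""
    p_cap = max_price(a, b, s)
    best = (w, 0.0, 0.0)  # fall back to not trading
    for i in range(1, n_grid + 1):
        p = w + (p_cap - w) * i / n_grid
        mean_demand = a - b * p
        # Critical fractile: order up to the (p - w)/p quantile of Uniform(-s, s).
        z = -s + 2.0 * s * (p - w) / p
        q = mean_demand + z
        # E[min(q, D)] for uniform noise: mean + z - E[(z - eps)^+].
        expected_sales = mean_demand + z - (z + s) ** 2 / (4.0 * s)
        profit = p * expected_sales - w * q
        if profit > best[2]:
            best = (p, q, profit)
    return best


def stackelberg_grid_search(c=5.0, a=100.0, b=2.0, s=10.0, n_grid=200):
    """Supplier picks w anticipating the retailer's best response (grid search)."""
    p_cap = max_price(a, b, s)
    best = None
    for j in range(1, n_grid):
        w = c + (p_cap - c) * j / n_grid
        p, q, follower_profit = follower_best_response(w, a, b, s)
        leader_profit = (w - c) * q
        if best is None or leader_profit > best[3]:
            best = (w, p, q, leader_profit, follower_profit)
    return best


if __name__ == "__main__":
    w, p, q, leader_profit, follower_profit = stackelberg_grid_search()
    print(f"wholesale={w:.2f} retail={p:.2f} order={q:.2f}")
    print(f"leader profit={leader_profit:.2f} follower profit={follower_profit:.2f}")
```

In the online setting described in the abstract, the demand parameters (here `a` and `b`) are unknown and would have to be estimated from observed demand; this sketch assumes they are given.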

