
Stochastic Principal-Agent Problems: Efficient Computation and Learning (2306.03832v3)

Published 6 Jun 2023 in cs.GT, cs.LG, and cs.MA

Abstract: We introduce a stochastic principal-agent model. A principal and an agent interact in a stochastic environment, each privy to observations about the state not available to the other. The principal has the power of commitment, both to elicit information from the agent and to provide signals about her own information. The players communicate with each other and then select actions independently. Each of them receives a payoff based on the state and their joint action, and the environment transitions to a new state. The interaction continues over a finite time horizon. Both players are far-sighted, aiming to maximize their total payoffs over the time horizon. The model encompasses as special cases extensive-form games (EFGs) and stochastic games of incomplete information, partially observable Markov decision processes (POMDPs), as well as other forms of sequential principal-agent interactions, including Bayesian persuasion and automated mechanism design problems. We consider both the computation and learning of the principal's optimal policy. Since the general problem, which subsumes POMDPs, is intractable, we explore algorithmic solutions under hindsight observability, where the state and the interaction history are revealed at the end of each step. Though the problem becomes more amenable under this condition, the number of possible histories remains exponential in the length of the time horizon, making approaches for EFG-based models infeasible. We present an efficient algorithm based on the inducible value sets. The algorithm computes an $\epsilon$-approximate optimal policy in time polynomial in $1/\epsilon$. Additionally, we show an efficient learning algorithm for an episodic reinforcement learning setting where the transition probabilities are unknown. The algorithm guarantees sublinear regret $\tilde{O}(T^{2/3})$ for both players over $T$ episodes.
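
The inducible-value-set idea can be pictured with a small sketch. The Python below is a hypothetical illustration of the general recursion, not the paper's algorithm: it assumes deterministic transitions, omits the agent's incentive and obedience constraints that the real model enforces, and all names (`S`, `A_p`, `A_a`, `T`, `r_p`, `r_a`, `eps_net`) are illustrative assumptions. The key point it shows is that thinning each per-state set of inducible (principal, agent) payoff pairs to an $\epsilon$-net keeps the sets small across the horizon, rather than letting them grow with the exponentially many histories.

```python
# Minimal sketch (assumptions, not the paper's actual algorithm): backward
# induction over eps-nets of "inducible value sets" -- the sets of
# (principal, agent) payoff pairs that some policy can induce from a state.
# Deterministic transitions; the agent's incentive constraints are omitted.
import itertools
import numpy as np

def eps_net(points, eps):
    """Greedily thin a point cloud so kept points are pairwise > eps apart,
    bounding each value set's size independently of the history count."""
    kept = []
    for p in points:
        if all(np.linalg.norm(p - q, ord=np.inf) > eps for q in kept):
            kept.append(p)
    return kept

def inducible_value_sets(S, A_p, A_a, T, r_p, r_a, H, eps):
    """V[h][s]: eps-net of payoff pairs inducible from state s with h steps
    to go. r_p[s][ap][aa], r_a[s][ap][aa] are stage payoffs; T[s][ap][aa]
    is the (deterministic) successor state."""
    V = [{s: [np.zeros(2)] for s in S}]  # 0 steps to go: nothing left to earn
    for h in range(1, H + 1):
        layer = {}
        for s in S:
            cands = [np.array([r_p[s][ap][aa], r_a[s][ap][aa]]) + cont
                     for ap, aa in itertools.product(A_p, A_a)
                     for cont in V[h - 1][T[s][ap][aa]]]
            layer[s] = eps_net(cands, eps)
        V.append(layer)
    return V

# Toy 2-state example: the principal-optimal inducible payoff from state 0.
S, A_p, A_a = [0, 1], [0, 1], [0, 1]
T = {s: {ap: {aa: (s + ap + aa) % 2 for aa in A_a} for ap in A_p} for s in S}
r_p = {s: {ap: {aa: float(ap == aa) for aa in A_a} for ap in A_p} for s in S}
r_a = {s: {ap: {aa: float(aa) for aa in A_a} for ap in A_p} for s in S}
V = inducible_value_sets(S, A_p, A_a, T, r_p, r_a, H=5, eps=0.1)
best = max(V[5][0], key=lambda v: v[0])  # approx. best achievable for the principal
print(best)
```

Because each per-state set is pruned to an $\epsilon$-net before the next backward step, the approximation error accumulates only linearly in the horizon while the work per step stays polynomial in $1/\epsilon$, which mirrors (in a simplified form) why the paper's algorithm achieves an $\epsilon$-approximate optimal policy in time polynomial in $1/\epsilon$.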

Authors (4)
  1. Jiarui Gan
  2. Rupak Majumdar
  3. Debmalya Mandal
  4. Goran Radanovic