
Policy Iteration for Pareto-Optimal Policies in Stochastic Stackelberg Games (2405.06689v1)

Published 7 May 2024 in cs.GT, cs.LG, cs.MA, and math.OC

Abstract: In general-sum stochastic games, a stationary Stackelberg equilibrium (SSE), in which the leader maximizes its return for all initial states while the follower plays the best response to the leader's policy, does not always exist. Existing methods for computing SSEs require strong assumptions to guarantee convergence and to ensure that the limit coincides with an SSE. Moreover, our analysis suggests that the fixed points of these methods can perform poorly when they are not SSEs. We therefore introduce Pareto-optimality as a reasonable alternative to the SSE. We derive a policy improvement theorem for stochastic games with a best-response follower and, based on it, propose an iterative algorithm for finding Pareto-optimal policies. We prove monotone improvement and convergence of the proposed approach, and prove convergence to an SSE in a special case.
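To make the bi-level structure concrete, below is a minimal, hypothetical sketch of leader policy iteration against a best-response follower in a finite tabular general-sum stochastic game. Everything here is an illustrative assumption: the random reward and transition tables (P, R_L, R_F), the helper names (follower_best_response, leader_value), and the naive greedy improvement rule are not the paper's algorithm, which uses a different improvement step with guarantees toward Pareto-optimal policies.

```python
# Hypothetical sketch: naive bi-level policy iteration with a best-response
# follower in a finite general-sum stochastic game. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
S, AL, AF = 4, 2, 2          # number of states, leader actions, follower actions
gamma = 0.9

# Assumed random problem data: transition kernel P[s, aL, aF, s'] and rewards.
P = rng.random((S, AL, AF, S))
P /= P.sum(axis=-1, keepdims=True)   # normalize to a valid kernel
R_L = rng.random((S, AL, AF))        # leader reward table
R_F = rng.random((S, AL, AF))        # follower reward table

def follower_best_response(pi_L, iters=500):
    """Value-iterate the follower's MDP induced by a fixed leader policy."""
    V = np.zeros(S)
    for _ in range(iters):
        # Q[s, aF] given the leader's deterministic action pi_L[s]
        Q = np.array([R_F[s, pi_L[s]] + gamma * P[s, pi_L[s]] @ V
                      for s in range(S)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)          # deterministic follower best response

def leader_value(pi_L, pi_F, iters=500):
    """Leader's state values when both policies are held fixed."""
    V = np.zeros(S)
    for _ in range(iters):
        V = np.array([R_L[s, pi_L[s], pi_F[s]]
                      + gamma * P[s, pi_L[s], pi_F[s]] @ V for s in range(S)])
    return V

pi_L = np.zeros(S, dtype=int)
for _ in range(50):
    pi_F = follower_best_response(pi_L)
    V = leader_value(pi_L, pi_F)
    # Greedy one-step leader improvement holding the current follower
    # response fixed; a simple (assumed) rule that ignores how the
    # follower's best response shifts when the leader's policy changes.
    new_pi = pi_L.copy()
    for s in range(S):
        q = [R_L[s, aL, pi_F[s]] + gamma * P[s, aL, pi_F[s]] @ V
             for aL in range(AL)]
        new_pi[s] = int(np.argmax(q))
    if np.array_equal(new_pi, pi_L):
        break                        # reached a fixed point
    pi_L = new_pi

print("leader policy:", pi_L, "follower response:", follower_best_response(pi_L))
```

Note that this naive greedy scheme is exactly the kind of method whose fixed points, per the abstract, need not be SSEs and may perform poorly; the paper's contribution is an improvement rule with proved monotone improvement and convergence to Pareto-optimal policies.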

