
Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach (2405.02044v1)

Published 3 May 2024 in cs.LG, cs.AI, cs.GT, cs.SY, eess.SY, and math.OC

Abstract: Robust Reinforcement Learning (RRL) is a promising Reinforcement Learning (RL) paradigm aimed at training models that are robust to uncertainty or disturbances, making them more effective for real-world applications. In this paradigm, uncertainty or disturbances are interpreted as actions of a second, adversarial agent, so the problem reduces to seeking agent policies that are robust to any opponent's actions. This paper is the first to propose considering RRL problems within positional differential game theory, which provides theoretically justified intuition for developing a centralized Q-learning approach. Namely, we prove that under Isaacs's condition (sufficiently general for real-world dynamical systems), the same Q-function can be utilized as an approximate solution of both the minimax and maximin Bellman equations. Based on these results, we present the Isaacs Deep Q-Network algorithms and demonstrate their superiority over baseline RRL and Multi-Agent RL algorithms in various environments.
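The key consequence of Isaacs's condition described in the abstract — that the minimax and maximin Bellman equations share the same Q-function — can be illustrated with a minimal tabular sketch of a centralized zero-sum Q-update. This is an illustrative toy, not the paper's Isaacs Deep Q-Network (which uses neural networks, replay buffers, and continuous-time dynamics); all names and values here are hypothetical.

```python
import numpy as np

def centralized_q_update(Q, s, a, b, r, s_next, alpha=0.5, gamma=0.9):
    """One tabular TD step for a centralized zero-sum Q(s, a, b).

    The target uses the maximin value max_a min_b Q(s', a, b);
    under Isaacs's condition this coincides with the minimax value
    min_b max_a Q(s', a, b), so one shared Q-table serves both agents.
    """
    # Maximin over the next state's action matrix (rows: agent a, cols: adversary b).
    maximin = Q[s_next].min(axis=1).max()
    Q[s, a, b] += alpha * (r + gamma * maximin - Q[s, a, b])
    return Q

# Tiny 2-state game with 2x2 joint actions, Q initialized to zero.
Q = np.zeros((2, 2, 2))
Q = centralized_q_update(Q, s=0, a=0, b=1, r=1.0, s_next=1)
print(Q[0, 0, 1])  # 0.5 * (1.0 + 0.9 * 0 - 0) = 0.5
```

When Isaacs's condition holds, swapping the reduction order (`Q[s_next].max(axis=0).min()`) yields the same target in the limit, which is exactly what licenses training a single centralized Q-function for both players.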

Authors (2)
  1. Anton Plaksin (5 papers)
  2. Vitaly Kalev (1 paper)
