
Offline Fictitious Self-Play for Competitive Games (2403.00841v1)

Published 29 Feb 2024 in cs.MA, cs.AI, cs.GT, and cs.LG

Abstract: Offline Reinforcement Learning (RL) has received significant interest for its ability to improve policies from previously collected datasets without online interactions. Despite its success in the single-agent setting, offline multi-agent RL remains a challenge, especially in competitive games. Firstly, because the game structure is unknown, it is impossible to interact with opponents and thus to apply self-play, the dominant learning paradigm for competitive games. Secondly, real-world datasets cannot cover the full state and action space of the game, which creates barriers to identifying a Nash equilibrium (NE). To address these issues, this paper introduces Off-FSP, the first practical model-free offline RL algorithm for competitive games. We start by simulating interactions with various opponents by adjusting the weights of the fixed dataset with importance sampling. This technique allows us to learn best responses to different opponents and to employ the Offline Self-Play learning framework. Within this framework, we further implement Fictitious Self-Play (FSP) to approximate NE. On partially covered real-world datasets, our method shows the potential to approach NE by incorporating any single-agent offline RL method. Experimental results on Leduc Hold'em Poker show that our method significantly improves performance compared with state-of-the-art baselines.
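
To make the pipeline described in the abstract concrete, here is a minimal, single-sided sketch of that loop; it is not the authors' implementation. All names (Transition, reweight_dataset, offline_best_response, off_fsp) are hypothetical, the best-response learner is a tabular stand-in for whichever single-agent offline RL method is plugged in (e.g. CQL), and a real two-player variant would maintain an average strategy per player.

```python
# Hypothetical sketch of the Off-FSP loop described in the abstract.
# All names are illustrative placeholders, not the authors' code.
from dataclasses import dataclass
from typing import Callable, Dict, List

State = str
Action = str
Policy = Callable[[State], Dict[Action, float]]  # state -> action distribution


@dataclass
class Transition:
    state: State          # learning player's observation
    action: Action        # learning player's action
    opp_state: State      # opponent's observation
    opp_action: Action    # opponent's action in the logged game
    reward: float
    weight: float = 1.0   # importance-sampling weight, updated each iteration


def reweight_dataset(dataset: List[Transition],
                     behavior_opp: Policy,
                     target_opp: Policy) -> None:
    """Reweight logged transitions so the fixed dataset mimics play against
    `target_opp` instead of the behavior opponent that generated the data
    (importance sampling over opponent actions)."""
    for t in dataset:
        p_target = target_opp(t.opp_state).get(t.opp_action, 0.0)
        p_behavior = behavior_opp(t.opp_state).get(t.opp_action, 1e-8)
        t.weight = p_target / p_behavior


def offline_best_response(dataset: List[Transition]) -> Policy:
    """Placeholder for any single-agent offline RL method trained on the
    reweighted dataset; here it simply plays the action with the highest
    weighted empirical return per state."""
    returns: Dict[State, Dict[Action, float]] = {}
    for t in dataset:
        returns.setdefault(t.state, {}).setdefault(t.action, 0.0)
        returns[t.state][t.action] += t.weight * t.reward

    def policy(state: State) -> Dict[Action, float]:
        acts = returns.get(state)
        if not acts:
            return {"fold": 1.0}  # arbitrary default for unseen states
        best = max(acts, key=acts.get)
        return {best: 1.0}
    return policy


def mix_policies(policies: List[Policy]) -> Policy:
    """Uniform average of past best responses: the fictitious-play
    average strategy that is meant to approach a Nash equilibrium."""
    def avg(state: State) -> Dict[Action, float]:
        dist: Dict[Action, float] = {}
        for pi in policies:
            for a, p in pi(state).items():
                dist[a] = dist.get(a, 0.0) + p / len(policies)
        return dist
    return avg


def off_fsp(dataset: List[Transition],
            behavior_opp: Policy,
            iterations: int = 10) -> Policy:
    """Offline fictitious self-play: alternately reweight the fixed dataset
    against the current average opponent, fit a new best response, and fold
    it into the average strategy."""
    best_responses: List[Policy] = []
    avg_policy: Policy = behavior_opp  # start from the logged behavior
    for _ in range(iterations):
        reweight_dataset(dataset, behavior_opp, avg_policy)
        best_responses.append(offline_best_response(dataset))
        avg_policy = mix_policies(best_responses)
    return avg_policy
```

The key design choice this sketch illustrates: rather than generating new trajectories, the importance weights reshape how often each logged transition counts, so an off-the-shelf offline RL learner can behave as if it were facing the current average opponent without any online interaction.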

Authors (5)
  1. Jingxiao Chen (9 papers)
  2. Weiji Xie (4 papers)
  3. Weinan Zhang (322 papers)
  4. Ying Wen (75 papers)
  5. Yong Yu (219 papers)
