Conflux-PSRO: Effectively Leveraging Collective Advantages in Policy Space Response Oracles (2410.22776v2)

Published 30 Oct 2024 in cs.GT

Abstract: Policy Space Response Oracles (PSRO), which construct a population of policies, have been demonstrated to be an effective method for approximating Nash Equilibrium (NE) in zero-sum games. Existing studies have attempted to improve diversity in the policy space, primarily by incorporating diversity regularization into the Best Response (BR). However, such regularization causes the BR to deviate from reward maximization, easily yielding a population that favors diversity over performance even when that diversity is unnecessary. Consequently, exploitability is difficult to reduce until the policy space has been thoroughly explored, especially in complex games. In this paper, we propose Conflux-PSRO, which fully exploits the diversity of the population by adaptively selecting and training policies at the state level. Specifically, Conflux-PSRO identifies useful policies from the existing population and employs a routing policy to select the most appropriate ones at each decision point, while simultaneously training them to enhance their effectiveness. Compared to the single-policy BR of traditional PSRO and its diversity-improved variants, the BR generated by Conflux-PSRO not only leverages the specialized expertise of diverse policies but also synergistically enhances overall performance. Our experiments on various environments demonstrate that Conflux-PSRO significantly improves the utility of BRs and reduces exploitability compared to existing methods.
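
The state-level routing idea described in the abstract can be pictured with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes a discrete-action setting and PyTorch, and all names (CandidatePolicy, RouterBR, obs_dim, n_actions) are hypothetical. A router network scores the candidate policies drawn from the existing population at each state and delegates the action to the selected one; the returned log-probabilities could then drive a policy-gradient update of both the router and the chosen candidate.

```python
# Minimal sketch (assumptions, not the paper's code) of a state-level routing
# best response: a router picks which population policy acts at each state.
import torch
import torch.nn as nn


class CandidatePolicy(nn.Module):
    """Stand-in for one policy already in the PSRO population (hypothetical)."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def action_dist(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))


class RouterBR(nn.Module):
    """Composed best response: a routing policy selects, per decision point,
    which candidate policy should act, so the candidates' specialized skills
    are combined instead of training a single monolithic BR."""

    def __init__(self, candidates, obs_dim: int):
        super().__init__()
        self.candidates = nn.ModuleList(candidates)
        self.router = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, len(candidates))
        )

    def act(self, obs: torch.Tensor):
        # Route: choose which population member handles this state.
        route_dist = torch.distributions.Categorical(logits=self.router(obs))
        k = route_dist.sample()
        # Delegate: the selected candidate produces the action.
        act_dist = self.candidates[int(k)].action_dist(obs)
        a = act_dist.sample()
        # Joint log-probability, usable in a policy-gradient update of both
        # the router and the selected candidate.
        return a, route_dist.log_prob(k) + act_dist.log_prob(a)


if __name__ == "__main__":
    obs_dim, n_actions = 8, 4
    population = [CandidatePolicy(obs_dim, n_actions) for _ in range(3)]
    br = RouterBR(population, obs_dim)
    action, logp = br.act(torch.randn(obs_dim))
    print(action.item(), logp.item())
```

In a full PSRO loop, such a composed BR would be trained against the current meta-strategy mixture of opponents and then added to the population; the sketch only covers the forward routing path.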

