Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization (2306.08900v1)

Published 15 Jun 2023 in cs.LG and cs.MA

Abstract: Offline reinforcement learning (RL), which learns policies from offline datasets without environment interaction, has received considerable attention in recent years. Compared with the rich literature on the single-agent case, offline multi-agent RL is still a relatively underexplored area. Most existing methods directly apply offline RL ingredients in the multi-agent setting without fully leveraging the decomposable problem structure, leading to less satisfactory performance on complex tasks. We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization. OMAC adopts a coupled value factorization scheme that decomposes the global value function into local and shared components, and also maintains credit-assignment consistency between the state-value and Q-value functions. Moreover, OMAC performs in-sample learning on the decomposed local state-value functions, which implicitly conducts the max-Q operation at the local level while avoiding the distributional shift caused by evaluating out-of-distribution actions. Through comprehensive evaluations on offline multi-agent StarCraft II micro-management tasks, we demonstrate the superior performance of OMAC over state-of-the-art offline multi-agent RL methods.
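To make the two ingredients of the abstract concrete, below is a minimal PyTorch sketch: a single state-dependent set of non-negative mixing weights applied to both the local Q-values and local state-values (so credit assignment stays consistent between the two decompositions), plus an IQL-style expectile regression loss that trains the local state-value functions purely in-sample, implicitly approximating a local max-Q without querying out-of-distribution actions. All module names, layer sizes, and the choice of expectile regression are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch only: names, sizes, and the expectile loss are
# assumptions in the spirit of OMAC, not the paper's exact architecture.
import torch
import torch.nn as nn


class CoupledFactorization(nn.Module):
    """Mixes per-agent values with shared, non-negative, state-dependent
    weights so credit assignment is consistent between Q_tot and V_tot."""

    def __init__(self, n_agents: int, state_dim: int, hidden: int = 64):
        super().__init__()
        # A single hypernetwork produces the weights used by BOTH mixers.
        self.weight_net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_agents),
        )
        self.bias_net = nn.Linear(state_dim, 1)

    def forward(self, state, local_q, local_v):
        # state: (B, state_dim); local_q, local_v: (B, n_agents)
        w = torch.abs(self.weight_net(state))  # shared weights, w_i(s) >= 0
        b = self.bias_net(state)
        q_tot = (w * local_q).sum(-1, keepdim=True) + b  # Q_tot(s, a)
        v_tot = (w * local_v).sum(-1, keepdim=True) + b  # V_tot(s)
        return q_tot, v_tot


def expectile_value_loss(local_q, local_v, tau: float = 0.7):
    """In-sample expectile regression of V_i toward Q_i (IQL-style):
    with tau > 0.5 this approximates a max over dataset actions, so no
    out-of-distribution action is ever evaluated."""
    diff = local_q.detach() - local_v
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()


if __name__ == "__main__":
    B, N, S = 32, 3, 48  # batch, agents, global-state dim (arbitrary)
    mixer = CoupledFactorization(n_agents=N, state_dim=S)
    q_tot, v_tot = mixer(torch.randn(B, S), torch.randn(B, N), torch.randn(B, N))
    print(q_tot.shape, v_tot.shape)  # torch.Size([32, 1]) twice
```

Sharing the weight hypernetwork between the two mixers is the "coupling": whatever credit is assigned to agent i's Q-value is assigned identically to its state-value. And because tau > 0.5 upweights positive residuals, each V_i is pulled toward the best Q_i seen in the dataset, giving the implicit local max-Q the abstract describes.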

Authors (2)
  1. Xiangsen Wang (3 papers)
  2. Xianyuan Zhan (47 papers)