Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization (2306.08900v1)
Abstract: Offline reinforcement learning (RL), which learns policies from offline datasets without environment interaction, has received considerable attention in recent years. Compared with the rich literature in the single-agent case, offline multi-agent RL is still a relatively underexplored area. Most existing methods directly apply offline RL ingredients in the multi-agent setting without fully leveraging the decomposable problem structure, leading to less satisfactory performance on complex tasks. We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization. OMAC adopts a coupled value factorization scheme that decomposes the global value function into local and shared components, and also maintains credit-assignment consistency between the state-value and Q-value functions. Moreover, OMAC performs in-sample learning on the decomposed local state-value functions, which implicitly conducts the max-Q operation at the local level while avoiding the distributional shift caused by evaluating out-of-distribution actions. In comprehensive evaluations on offline multi-agent StarCraft II micro-management tasks, we demonstrate the superior performance of OMAC over state-of-the-art offline multi-agent RL methods.
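The abstract does not give OMAC's exact equations, but the two mechanisms it names have well-known minimal forms: in-sample state-value learning via expectile regression (which implicitly approximates a max over in-dataset actions, as in implicit Q-learning), and a state-conditioned linear mixing of per-agent local values into a global value (as in value-decomposition methods). The sketch below illustrates both under those assumptions; the function names, the linear mixing form, and the expectile parameter `tau` are illustrative choices, not OMAC's actual architecture.

```python
import numpy as np

def expectile_loss(q, v, tau=0.7):
    # Asymmetric squared loss for in-sample value learning.
    # With tau > 0.5, errors where q > v are weighted more heavily,
    # so v is pushed toward an upper expectile of the Q-values of
    # actions that actually appear in the dataset -- an implicit,
    # in-sample approximation of max_a Q(s, a) that never queries
    # out-of-distribution actions.
    diff = q - v
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return float(np.mean(weight * diff ** 2))

def factorized_value(local_vs, weights, bias):
    # Illustrative linear mixing of per-agent local values into a
    # global value. In factorization methods the weights and bias are
    # produced by a state-conditioned mixing network (with weights
    # kept non-negative for monotonic credit assignment); here they
    # are plain arrays for clarity.
    return float(np.dot(weights, local_vs) + bias)

# Dataset Q-values for in-sample actions at one (local) state:
q_samples = np.array([1.0, 2.0, 4.0])
# A higher expectile penalizes underestimation more, nudging the
# learned V toward the best in-sample action's value:
loss_low = expectile_loss(q_samples, v=2.0, tau=0.5)
loss_high = expectile_loss(q_samples, v=2.0, tau=0.9)

# Global value from three agents' local values, uniform credit:
v_tot = factorized_value(np.array([1.0, 2.0, 3.0]),
                         np.array([1.0, 1.0, 1.0]), 0.0)
```

With `tau = 0.5` the loss reduces to half of the ordinary mean squared error, recovering plain expectation-based value learning; raising `tau` toward 1 is what makes the regression behave like an in-sample max.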
Authors: Xiangsen Wang, Xianyuan Zhan