ConcaveQ: Non-Monotonic Value Function Factorization via Concave Representations in Deep Multi-Agent Reinforcement Learning (2312.15555v1)
Abstract: Value function factorization has achieved great success in multi-agent reinforcement learning by optimizing joint action-value functions through the maximization of factorized per-agent utilities. To ensure the Individual-Global-Max (IGM) property, existing works often restrict factorization to monotonic mixing functions, which are known to limit representational expressiveness. In this paper, we analyze the limitations of monotonic factorization and present ConcaveQ, a novel non-monotonic value function factorization approach that goes beyond monotonic mixing functions by employing neural network representations of concave mixing functions. Leveraging concavity, an iterative action selection scheme is developed to obtain optimal joint actions during training; these are then used to update agents' local policy networks, enabling fully decentralized execution. The effectiveness of ConcaveQ is validated on multi-agent predator-prey environments and StarCraft II micromanagement tasks, where it significantly outperforms state-of-the-art multi-agent reinforcement learning approaches.
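The two ingredients the abstract names, a concave (possibly non-monotonic) mixing of per-agent utilities and an iterative joint-action selection scheme, can be illustrated with a minimal sketch. This is not the paper's network architecture: here the concave mixer is represented as a pointwise minimum over affine functions (a standard concave representation), the weights `W`, `b` and the coordinate-ascent loop are illustrative assumptions, and coordinate ascent is only guaranteed to reach a local optimum in general.

```python
import numpy as np

def concave_mix(q, W, b):
    """Mix per-agent utilities q via a pointwise minimum over affine
    functions. A min of affine maps is concave in q, and negative
    entries in W make the mixer non-monotonic (unlike QMIX-style mixers)."""
    return float(np.min(W @ q + b))

def iterative_action_selection(utilities, W, b, max_sweeps=20):
    """Coordinate ascent on the joint action: each agent in turn picks
    the action maximizing the mixed value with the other agents' actions
    held fixed; stop when a full sweep changes nothing."""
    n_agents, n_actions = utilities.shape
    actions = np.zeros(n_agents, dtype=int)

    def joint_value(acts):
        q = utilities[np.arange(n_agents), acts]  # each agent's chosen utility
        return concave_mix(q, W, b)

    for _ in range(max_sweeps):
        changed = False
        for i in range(n_agents):
            old = actions[i]
            vals = []
            for a in range(n_actions):
                actions[i] = a
                vals.append(joint_value(actions))
            actions[i] = int(np.argmax(vals))
            changed |= actions[i] != old
        if not changed:
            break
    return actions
```

For example, with `W = [[1, -1], [-1, 1]]` and `b = 0` the mixer equals `-|q_0 - q_1|`, which rewards agents whose utilities agree; no monotonic mixer can express this, yet the coordinate-ascent loop above still recovers a maximizing joint action.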
Authors: Huiqun Li, Hanhan Zhou, Yifei Zou, Dongxiao Yu, Tian Lan