TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy Gradient (2312.15667v3)
Abstract: Multi-Agent Policy Gradient (MAPG) has made significant progress in recent years. However, centralized critics in state-of-the-art MAPG methods still face the centralized-decentralized mismatch (CDM) issue, which means that sub-optimal actions by some agents affect other agents' policy learning. Using individual critics for policy updates avoids this issue, but severely limits cooperation among agents. To address this tension, we propose an agent topology framework, which decides whether other agents should be considered in an agent's policy gradient and achieves a compromise between facilitating cooperation and alleviating the CDM issue. The agent topology allows agents to use coalition utility as the learning objective, instead of the global utility provided by centralized critics or the local utility provided by individual critics. Various models are studied for constituting the agent topology. We propose Topology-based multi-Agent Policy gradiEnt (TAPE) for both stochastic and deterministic MAPG methods. We prove the policy improvement theorem for stochastic TAPE and give a theoretical explanation for the improved cooperation among agents. Experimental results on several benchmarks show that the agent topology facilitates agent cooperation and alleviates the CDM issue, improving the performance of TAPE. Finally, multiple ablation studies and a heuristic graph search algorithm are devised to show the efficacy of the agent topology.
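To make the contrast between local, coalition, and global utility concrete, here is a minimal sketch. It assumes, purely for illustration, that an agent's coalition utility is the sum of per-agent utilities over the agents it is connected to in the topology; the abstract does not state the exact form, and the function name `coalition_utilities`, the adjacency-matrix encoding, and the numbers are all hypothetical rather than the authors' implementation.

```python
from typing import List


def coalition_utilities(adjacency: List[List[int]], utilities: List[float]) -> List[float]:
    """For each agent i, sum the utilities of all agents j with adjacency[i][j] != 0.

    Assumption (not from the paper): coalition utility is additive over
    connected agents, and each agent is connected to itself.
    """
    n = len(utilities)
    if len(adjacency) != n or any(len(row) != n for row in adjacency):
        raise ValueError("adjacency must be an n x n matrix")
    return [
        sum(utilities[j] for j in range(n) if adjacency[i][j])
        for i in range(n)
    ]


# Hypothetical per-agent utility estimates for three agents.
utilities = [1.0, -2.0, 0.5]

# Self-loops only: each agent sees just its own (local) utility.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Fully connected: every agent sees the same global sum.
full = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
# Agents 0 and 2 form a coalition; agent 1 is left on its own.
partial = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]

local_u = coalition_utilities(identity, utilities)
global_u = coalition_utilities(full, utilities)
mixed_u = coalition_utilities(partial, utilities)
```

Under this assumed form, the `partial` topology keeps agent 1's poor utility out of the objectives of agents 0 and 2 while still letting those two share a learning signal, which is the kind of compromise the abstract describes.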
Authors: Xingzhou Lou, Junge Zhang, Timothy J. Norman, Kaiqi Huang, Yali Du