Decision-making with Speculative Opponent Models (2211.11940v3)
Abstract: Opponent modelling has proven effective in enhancing the decision-making of a controlled agent by constructing models of opponent agents. However, existing methods often rely on access to the observations and actions of opponents, a requirement that is infeasible when such information is unobservable or difficult to obtain. To address this issue, we introduce Distributional Opponent-aided Multi-agent Actor-Critic (DOMAC), the first speculative opponent modelling algorithm that relies solely on local information (i.e., the controlled agent's observations, actions, and rewards). Specifically, the actor maintains a speculated belief about the opponents using tailored speculative opponent models that predict the opponents' actions from local information alone. Moreover, DOMAC features distributional critic models that estimate the return distribution of the actor's policy, yielding a more fine-grained assessment of the actor's quality, which in turn more effectively guides the training of the speculative opponent models on which the actor depends. Furthermore, we formally derive a policy gradient theorem with the proposed opponent models. Extensive experiments on eight challenging multi-agent benchmark tasks from MPE, Pommerman, and the StarCraft Multi-Agent Challenge (SMAC) demonstrate that DOMAC successfully models opponents' behaviours and outperforms state-of-the-art methods with faster convergence.
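The two components named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the paper's implementation; the class names, the linear parameterisations, and the dimensions are illustrative assumptions. It shows (a) a speculative opponent model that maps only the controlled agent's local observation to a categorical belief over opponent actions, and (b) a quantile-style distributional critic whose quantile estimates average to the usual expected value.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

class SpeculativeOpponentModel:
    """Predicts a distribution over the opponent's actions from the
    controlled agent's LOCAL observation only (no opponent data)."""
    def __init__(self, obs_dim, n_opp_actions, rng):
        # Toy linear layer standing in for a learned network.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                  for _ in range(n_opp_actions)]

    def predict(self, obs):
        logits = [sum(wi * oi for wi, oi in zip(row, obs)) for row in self.w]
        return softmax(logits)  # speculated belief over opponent actions

class QuantileCritic:
    """Distributional critic: returns k quantile estimates of the return;
    their mean recovers the usual scalar value estimate."""
    def __init__(self, obs_dim, k, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                  for _ in range(k)]

    def quantiles(self, obs):
        return [sum(wi * oi for wi, oi in zip(row, obs)) for row in self.w]

    def value(self, obs):
        q = self.quantiles(obs)
        return sum(q) / len(q)

rng = random.Random(0)
obs = [0.5, -1.0, 0.25]                      # local observation only
opp_model = SpeculativeOpponentModel(len(obs), n_opp_actions=4, rng=rng)
belief = opp_model.predict(obs)              # actor conditions on this belief
critic = QuantileCritic(len(obs), k=8, rng=rng)
v = critic.value(obs)                        # mean of the quantile estimates
```

In DOMAC the actor's policy is conditioned on the speculated belief, and the richer per-quantile training signal from the distributional critic is what the paper argues guides the opponent models more effectively than a single scalar value would.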
- Jing Sun
- Shuo Chen
- Cong Zhang
- Yining Ma
- Jie Zhang