Discovering Causality for Efficient Cooperation in Multi-Agent Environments (2306.11846v1)

Published 20 Jun 2023 in cs.AI, cs.LG, cs.MA, and stat.ME

Abstract: In cooperative Multi-Agent Reinforcement Learning (MARL), agents are required to learn behaviours as a team to achieve a common goal. However, while learning a task, some agents may end up learning sub-optimal policies that do not contribute to the team objective. Such agents are called lazy agents: their non-cooperative behaviours may arise from a failure to understand whether they caused the rewards. As a consequence, we observe that the emergence of cooperative behaviours is not necessarily a byproduct of being able to solve a task as a team. In this paper, we investigate applications of causality in MARL and how they can be used to penalise these lazy agents. We observe that causality estimates can be used to improve credit assignment to the agents, and we show how they can be leveraged to improve independent learning in MARL. Furthermore, we investigate how Amortized Causal Discovery can be used to automate causality detection within MARL environments. The results demonstrate that causal relations between individual observations and the team reward can be used to detect and punish lazy agents, making them develop more intelligent behaviours. This results in improvements not only in the overall performance of the team but also in the individual capabilities of the agents. In addition, the results show that Amortized Causal Discovery can be used efficiently to find causal relations in MARL.
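
The core idea, scoring whether an agent's own observations help predict (Granger-cause) the team reward and penalising agents whose score is negligible, can be sketched as follows. This is an illustrative reconstruction, not the paper's exact estimator: the function names (granger_score, shape_rewards), the linear lag model, and the threshold and penalty values are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def granger_score(obs, rewards, lag=2):
    """Granger-style causality score: does adding the agent's past
    observations improve a linear prediction of the team reward over
    using the reward's own history alone?
    obs: array of shape (T, obs_dim); rewards: array of shape (T,)."""
    T = len(rewards)
    y = rewards[lag:]
    # Lagged reward history r_{t-1}..r_{t-lag} for each target r_t.
    r_hist = np.column_stack([rewards[lag - k:T - k] for k in range(1, lag + 1)])
    # Lagged observation history o_{t-1}..o_{t-lag}, concatenated per step.
    o_hist = np.hstack([obs[lag - k:T - k] for k in range(1, lag + 1)])
    restricted = LinearRegression().fit(r_hist, y)
    full_x = np.hstack([r_hist, o_hist])
    full = LinearRegression().fit(full_x, y)
    rss_restricted = np.sum((y - restricted.predict(r_hist)) ** 2)
    rss_full = np.sum((y - full.predict(full_x)) ** 2)
    # Positive when the agent's observations help predict the reward.
    return float(np.log((rss_restricted + 1e-12) / (rss_full + 1e-12)))

def shape_rewards(team_reward, scores, threshold=0.05, penalty=0.5):
    """Penalise agents whose causality score falls below the threshold
    (the 'lazy' agents); the others keep the shared team reward."""
    return [team_reward - penalty if s < threshold else team_reward
            for s in scores]
```

In use, granger_score would be computed per agent over a window of recent timesteps, and shape_rewards would replace the shared team reward during independent learning. For automating this detection, the paper instead builds on Amortized Causal Discovery, which trains a shared encoder to infer causal relations from time-series data; the lag regression above stands in for that machinery purely for illustration.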

Authors (3)
  1. Rafael Pina (7 papers)
  2. Varuna De Silva (15 papers)
  3. Corentin Artaud (6 papers)
Citations (1)