MIXRTs: Toward Interpretable Multi-Agent Reinforcement Learning via Mixing Recurrent Soft Decision Trees (2209.07225v3)
Abstract: While achieving tremendous success in various fields, existing multi-agent reinforcement learning (MARL) methods built on black-box neural networks make decisions in an opaque manner, hindering humans from understanding the learned knowledge and how input observations influence decisions. In contrast, existing interpretable approaches, such as traditional linear models and decision trees, usually suffer from weak expressivity and low accuracy. To address this apparent dichotomy between performance and interpretability, we propose MIXing Recurrent soft decision Trees (MIXRTs), a novel interpretable architecture that represents explicit decision processes via root-to-leaf paths and reflects each agent's contribution to the team. Specifically, we construct a novel soft decision tree that addresses partial observability by leveraging recurrent neural networks, and we show through the tree structure which features influence the decision-making process. Then, based on the value decomposition framework, we linearly assign credit to each agent by explicitly mixing individual action values to estimate the joint action value using only local observations, providing new insights into how agents cooperate to accomplish the task. Theoretical analysis shows that MIXRTs satisfies the structural constraints of additivity and monotonicity in the factorization of joint action values. Evaluations on the challenging Spread and StarCraft II tasks show that MIXRTs achieves competitive performance compared with widely investigated methods while delivering more straightforward explanations of its decision processes. This work explores a promising path toward learning algorithms that combine high performance with interpretability, potentially shedding light on new interpretable paradigms for MARL.
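To make the two components described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' released implementation: a per-agent soft decision tree that carries a GRU hidden state to handle partial observability and exposes root-to-leaf routing probabilities, and a linear mixer whose non-negative weights keep the joint action value additive and monotonic in each agent's individual value. All class names (`RecurrentSoftDecisionTree`, `LinearMonotonicMixer`), dimensions, and the choice to condition the mixing weights on concatenated observations are illustrative assumptions.

```python
import torch
import torch.nn as nn


class RecurrentSoftDecisionTree(nn.Module):
    """Per-agent recurrent soft decision tree (illustrative layout).

    A GRU cell carries a hidden state across timesteps; each inner node routes
    softly (sigmoid gate) on the observation and hidden state, and each leaf
    holds a learnable vector of per-action values. The individual Q-values are
    the leaf values weighted by the probability of reaching each leaf, so every
    root-to-leaf path can be read off directly.
    """

    def __init__(self, obs_dim: int, n_actions: int, depth: int = 3, hidden_dim: int = 64):
        super().__init__()
        self.depth = depth
        self.n_inner = 2 ** depth - 1             # routing nodes, heap order
        self.n_leaves = 2 ** depth
        self.rnn = nn.GRUCell(obs_dim, hidden_dim)
        self.gates = nn.Linear(obs_dim + hidden_dim, self.n_inner)
        self.leaf_values = nn.Parameter(torch.zeros(self.n_leaves, n_actions))

    def forward(self, obs: torch.Tensor, h: torch.Tensor):
        h = self.rnn(obs, h)                       # recurrent state summarises history
        x = torch.cat([obs, h], dim=-1)
        p_right = torch.sigmoid(self.gates(x))     # (batch, n_inner) soft routing
        leaf_probs = []
        for leaf in range(self.n_leaves):
            prob = torch.ones(obs.size(0), device=obs.device)
            node = 0
            for d in reversed(range(self.depth)):  # walk from the root to this leaf
                go_right = (leaf >> d) & 1
                p = p_right[:, node]
                prob = prob * (p if go_right else 1.0 - p)
                node = 2 * node + 1 + go_right
            leaf_probs.append(prob)
        leaf_prob = torch.stack(leaf_probs, dim=1)  # (batch, n_leaves)
        q = leaf_prob @ self.leaf_values            # (batch, n_actions)
        return q, h


class LinearMonotonicMixer(nn.Module):
    """Mixes the agents' chosen-action values into a joint value with
    non-negative weights, so Q_tot is additive in the individual values and
    monotonically increasing in each of them (a sketch of the constraint
    described in the abstract, not the authors' exact mixer)."""

    def __init__(self, n_agents: int, cond_dim: int, hidden_dim: int = 32):
        super().__init__()
        # A small hypernetwork produces per-agent mixing weights from whatever
        # conditioning signal is available (here an arbitrary `cond` vector).
        self.hyper_w = nn.Sequential(
            nn.Linear(cond_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, n_agents)
        )
        self.hyper_b = nn.Linear(cond_dim, 1)

    def forward(self, agent_qs: torch.Tensor, cond: torch.Tensor):
        # agent_qs: (batch, n_agents) values of the actions each agent picked.
        w = torch.abs(self.hyper_w(cond))           # non-negative weights => monotonic mixing
        b = self.hyper_b(cond)
        return (w * agent_qs).sum(dim=-1, keepdim=True) + b   # (batch, 1)


# Tiny usage example with made-up dimensions.
if __name__ == "__main__":
    n_agents, obs_dim, n_actions, hidden_dim = 3, 10, 5, 64
    tree = RecurrentSoftDecisionTree(obs_dim, n_actions, depth=3, hidden_dim=hidden_dim)
    mixer = LinearMonotonicMixer(n_agents, cond_dim=n_agents * obs_dim)

    obs = torch.randn(n_agents, obs_dim)            # one timestep, batched over agents
    h = torch.zeros(n_agents, hidden_dim)
    q, h = tree(obs, h)                             # shared tree across agents
    chosen = q.max(dim=-1).values.unsqueeze(0)      # (1, n_agents) greedy values
    q_tot = mixer(chosen, obs.reshape(1, -1))       # joint action value
    print(q.shape, q_tot.shape)
```

Because the mixing weights are constrained to be non-negative, the greedy joint action obtained by maximising each agent's own tree output also maximises the mixed joint value, which is the usual motivation for this kind of monotonic factorization.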
Authors: Zichuan Liu, Yuanyang Zhu, Zhi Wang, Yang Gao, Chunlin Chen