Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning (2203.16464v3)

Published 30 Mar 2022 in cs.LG and cs.AI

Abstract: Artificial intelligence, particularly through recent advances in deep learning, has achieved exceptional performance on many tasks in fields such as natural language processing and computer vision. Beyond strong evaluation metrics, a high level of interpretability is often required for these models to be reliably deployed, so explanations that offer insight into how a model maps its inputs onto its outputs are much sought after. Unfortunately, the black-box nature of current machine learning models remains an unresolved issue, preventing researchers from producing explanatory descriptions of a model's behavior and final predictions. In this work, we propose a novel framework based on Adversarial Inverse Reinforcement Learning that provides global explanations for decisions made by a Reinforcement Learning model and captures intuitive tendencies the model follows by summarizing its decision-making process.
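
The framework named in the abstract builds on Adversarial Inverse Reinforcement Learning (AIRL), which recovers a reward function by training a discriminator to tell expert transitions from policy transitions. The sketch below is a minimal, illustrative rendering of the standard AIRL discriminator only, not the authors' implementation; the network sizes, class and function names, and the discount value are assumptions made for the example.

```python
# Minimal AIRL discriminator sketch (illustrative; hyperparameters and names are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AIRLDiscriminator(nn.Module):
    """D(s, a, s') = exp(f) / (exp(f) + pi(a|s)), with
    f(s, a, s') = g(s, a) + gamma * h(s') - h(s)."""

    def __init__(self, state_dim, action_dim, hidden=64, gamma=0.99):
        super().__init__()
        self.gamma = gamma
        # g estimates the reward term; h is a state-only shaping term.
        self.g = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        self.h = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def f(self, state, action, next_state):
        reward = self.g(torch.cat([state, action], dim=-1))
        shaping = self.gamma * self.h(next_state) - self.h(state)
        return reward + shaping

    def forward(self, state, action, next_state, log_pi):
        # Since D = exp(f) / (exp(f) + pi), the classification logit is f - log pi.
        return self.f(state, action, next_state) - log_pi


def discriminator_loss(disc, expert_batch, policy_batch):
    """Logistic loss: expert transitions are labelled 1, policy transitions 0."""
    logits_exp = disc(*expert_batch)
    logits_pol = disc(*policy_batch)
    return (F.binary_cross_entropy_with_logits(logits_exp, torch.ones_like(logits_exp))
            + F.binary_cross_entropy_with_logits(logits_pol, torch.zeros_like(logits_pol)))
```

In this formulation the learned g term serves as the recovered reward, which is what makes AIRL a natural starting point for summarizing and explaining an RL agent's decision-making.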

Authors (3)
  1. Sean Xie (4 papers)
  2. Soroush Vosoughi (90 papers)
  3. Saeed Hassanpour (43 papers)
Citations (2)
