Reliable Graph Neural Network Explanations Through Adversarial Training (2106.13427v1)
Abstract: Graph neural network (GNN) explanations have largely relied on post-hoc introspection. While this approach has seen some success, many post-hoc explanation methods have been shown to fail at capturing a model's learned representation. Given this shortcoming, it is worthwhile to consider how a model might be trained so that it is more amenable to post-hoc analysis. Motivated by the success of adversarial training in computer vision at producing models with more reliable representations, we propose a similar training paradigm for GNNs and analyze its impact on a model's explanations. In settings without ground-truth explanation labels, we also introduce a new metric to quantify how well an explanation method utilizes a model's learned representation, and we demonstrate that adversarial training can help extract more domain-relevant insights in chemistry.
- Donald Loveland (18 papers)
- Shusen Liu (29 papers)
- Bhavya Kailkhura (108 papers)
- Anna Hiszpanski (3 papers)
- Yong Han (28 papers)
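
The abstract describes the training paradigm only at a high level. As a rough illustration, the sketch below shows what adversarial training on node features could look like for a GNN, assuming a PyTorch Geometric GCN and a PGD-style inner maximization; the model architecture, attack budget (`epsilon`, `alpha`, `steps`), and helper names are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of adversarial training for a GNN (illustrative, not the
# paper's implementation). Assumes PyTorch Geometric and PGD on node features.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    """A small two-layer GCN node classifier (hypothetical architecture)."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

def pgd_perturb(model, x, edge_index, y, epsilon=0.05, alpha=0.01, steps=10):
    """Inner maximization: find an L-inf bounded feature perturbation
    that maximizes the classification loss (illustrative budget values)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta, edge_index), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # gradient ascent step
            delta.clamp_(-epsilon, epsilon)     # project back into the budget
        delta.grad.zero_()
    return delta.detach()

def train_step(model, optimizer, x, edge_index, y):
    """Outer minimization: update the model on the perturbed inputs."""
    model.train()
    delta = pgd_perturb(model, x, edge_index, y)
    optimizer.zero_grad()  # clear gradients accumulated during the attack
    loss = F.cross_entropy(model(x + delta, edge_index), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The min-max structure mirrors adversarial training from computer vision: the inner loop finds a worst-case perturbation of the node features, and the outer step updates the model to remain accurate under it, which is the property the paper argues makes the learned representation more amenable to post-hoc explanation.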