EDDA: Explanation-driven Data Augmentation to Improve Explanation Faithfulness (2105.14162v3)
Abstract: Recent years have seen the introduction of a range of methods for post-hoc explainability of image classifier predictions. However, these post-hoc explanations may not always be faithful to classifier predictions, which poses a significant challenge when attempting to debug models based on such explanations. To this end, we seek a methodology that improves the faithfulness of an explanation method with respect to model predictions and does not require ground-truth explanations. We achieve this through a novel explanation-driven data augmentation (EDDA) technique that augments the training data with occlusions inferred from model explanations. It rests on a simple motivating principle: *if* the explainer is faithful to the model, *then* occluding regions that are salient for the model prediction should decrease the model's confidence in that prediction, while occluding non-salient regions should leave the prediction unchanged. To verify that the proposed augmentation method has the potential to improve faithfulness, we evaluate EDDA using a variety of datasets and classification models. We demonstrate empirically that our approach leads to a significant increase in faithfulness, which can facilitate better debugging and successful deployment of image classification models in real-world applications.
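The occlusion principle above lends itself to a short illustration. Below is a minimal sketch in PyTorch of how an explanation-driven occlusion step might look, assuming a saliency map (e.g., from Grad-CAM) normalized to [0, 1]; the function name `edda_occlude`, the 0.5 threshold, and the zero fill value are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of explanation-driven occlusion augmentation.
# Assumptions (not from the paper): saliency maps come from an external
# explainer (e.g., Grad-CAM) and are normalized to [0, 1]; the threshold
# and fill value are illustrative choices.
import torch

def edda_occlude(image: torch.Tensor,
                 saliency: torch.Tensor,
                 threshold: float = 0.5,
                 fill: float = 0.0):
    """Return two augmented views of `image` (shape C x H x W).

    - occ_salient: salient pixels occluded; if the explainer is faithful,
      the model's confidence on this view should drop.
    - occ_nonsalient: non-salient pixels occluded; the model's prediction
      on this view should be unchanged.
    """
    salient_mask = (saliency >= threshold).to(image.dtype)  # (H, W)
    salient_mask = salient_mask.unsqueeze(0)  # (1, H, W), broadcast over channels
    occ_salient = image * (1.0 - salient_mask) + fill * salient_mask
    occ_nonsalient = image * salient_mask + fill * (1.0 - salient_mask)
    return occ_salient, occ_nonsalient

# Usage: both occluded views can be added to the training batch so that
# training encourages the desired confidence behavior on each view.
if __name__ == "__main__":
    img = torch.rand(3, 224, 224)   # dummy image
    sal = torch.rand(224, 224)      # dummy saliency map in [0, 1]
    occ_sal, occ_nonsal = edda_occlude(img, sal)
    print(occ_sal.shape, occ_nonsal.shape)
```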
- Ruiwen Li
- Zhibo Zhang
- Jiani Li
- Chiheb Trabelsi
- Scott Sanner
- Jongseong Jang
- Yeonjeong Jeong
- Dongsub Shim