AttributionLab: Faithfulness of Feature Attribution Under Controllable Environments (2310.06514v2)

Published 10 Oct 2023 in cs.LG

Abstract: Feature attribution explains neural network outputs by identifying relevant input features. The attribution has to be faithful, meaning that the attributed features must mirror the input features that influence the output. One recent trend to test faithfulness is to fit a model on designed data with known relevant features and then compare attributions with ground-truth input features. This idea assumes that the model learns to use all and only these designed features, for which there is no guarantee. In this paper, we solve this issue by designing the network and manually setting its weights, along with designing the data. The setup, AttributionLab, serves as a sanity check for faithfulness: if an attribution method is not faithful in a controlled environment, it can be unreliable in the wild. The environment is also a laboratory for controlled experiments with which we can analyze attribution methods and suggest improvements.
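The following is a minimal sketch of the idea described in the abstract, not the authors' actual AttributionLab setup: the weights of a tiny network are set by hand so that only a known subset of input features can influence the output, and an attribution method (here, plain gradient × input as an illustrative choice) is then checked against that ground truth. All names and the network shape are hypothetical.

```python
# Toy sanity check in the spirit of AttributionLab (illustrative only):
# hand-set weights so only designed features affect the output, then test
# whether an attribution method assigns importance to exactly those features.
import torch

# 8 input features; by construction only features 0-3 are relevant.
relevant = torch.tensor([1., 1., 1., 1., 0., 0., 0., 0.])

model = torch.nn.Linear(8, 1, bias=False)
with torch.no_grad():
    # Manually set the weights: non-zero only for the designed relevant features.
    model.weight.copy_(relevant.unsqueeze(0))

x = torch.randn(1, 8, requires_grad=True)
out = model(x)
out.backward()

# Gradient x Input attribution (one simple method to sanity-check).
attribution = (x.grad * x).squeeze().abs()

# Faithfulness check: attribution mass should fall only on the designed features.
mass_on_relevant = attribution[relevant.bool()].sum() / attribution.sum()
print(f"fraction of attribution on ground-truth features: {mass_on_relevant:.2f}")
```

Because the data-generating features and the network's functional dependence are both fixed by construction, any attribution mass on the irrelevant features can be blamed on the attribution method rather than on what the model may or may not have learned.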
