
Labeling Neural Representations with Inverse Recognition (2311.13594v2)

Published 22 Nov 2023 in cs.LG, cs.AI, and stat.ML

Abstract: Deep Neural Networks (DNNs) demonstrate remarkable capabilities in learning complex hierarchical data representations, but the nature of these representations remains largely unknown. Existing global explainability methods, such as Network Dissection, face limitations such as reliance on segmentation masks, lack of statistical significance testing, and high computational demands. We propose Inverse Recognition (INVERT), a scalable approach for connecting learned representations with human-understandable concepts by leveraging their capacity to discriminate between these concepts. In contrast to prior work, INVERT is capable of handling diverse types of neurons, has lower computational complexity, and does not rely on the availability of segmentation masks. Moreover, INVERT provides an interpretable metric that assesses the alignment between a representation and its corresponding explanation and delivers a measure of statistical significance. We demonstrate the applicability of INVERT in various scenarios, including the identification of representations affected by spurious correlations and the interpretation of the hierarchical structure of decision-making within models.
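The core idea — scoring how well a neuron discriminates a concept and attaching a significance level — can be illustrated with a small sketch. This is not the authors' implementation; the data, sample sizes, and function names here are hypothetical. It computes the area under the ROC curve (AUC) from a neuron's activations on concept vs. non-concept examples and estimates a one-sided p-value with a permutation test.

```python
import numpy as np

def auc(pos, neg):
    """Area under the ROC curve: the probability that a random
    concept example activates the neuron more strongly than a
    random non-concept example (ties count half)."""
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    # pairwise-comparison form of the Mann-Whitney U statistic
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def auc_with_pvalue(pos, neg, n_perm=1000, seed=0):
    """One-sided permutation test: how often does a random
    relabeling of the pooled examples reach the observed AUC?"""
    rng = np.random.default_rng(seed)
    observed = auc(pos, neg)
    pooled = np.concatenate([np.asarray(pos, float),
                             np.asarray(neg, float)])
    n_pos = len(pos)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of concept membership
        if auc(pooled[:n_pos], pooled[n_pos:]) >= observed:
            count += 1
    # add-one smoothing keeps the estimate away from exactly zero
    return observed, (count + 1) / (n_perm + 1)

# toy data: activations of one neuron on concept vs. non-concept images
rng = np.random.default_rng(42)
concept_acts = rng.normal(1.0, 1.0, size=50)       # hypothetical
non_concept_acts = rng.normal(0.0, 1.0, size=200)  # hypothetical
score, p = auc_with_pvalue(concept_acts, non_concept_acts)
print(f"AUC = {score:.3f}, p = {p:.3f}")
```

An AUC near 1.0 indicates the neuron ranks concept examples above non-concept ones almost perfectly; 0.5 indicates no discrimination. The permutation test stands in for the kinds of significance statements the paper attaches to its explanations.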

Authors (5)
  1. Kirill Bykov (11 papers)
  2. Laura Kopf (3 papers)
  3. Shinichi Nakajima (44 papers)
  4. Marius Kloft (65 papers)
  5. Marina M. -C. Höhne (22 papers)
Citations (7)