Interpretable Concept-Based Memory Reasoning (2407.15527v2)
Abstract: The lack of transparency in the decision-making processes of deep learning systems presents a significant challenge in modern AI, as it impairs users' ability to rely on and verify these systems. To address this challenge, Concept Bottleneck Models (CBMs) have made significant progress by incorporating human-interpretable concepts into deep learning architectures. This approach allows predictions to be traced back to specific concept patterns that users can understand and potentially intervene on. However, the task predictors of existing CBMs are not fully interpretable, preventing thorough analysis and any form of formal verification of their decision-making process prior to deployment, and thereby raising significant reliability concerns. To bridge this gap, we introduce Concept-based Memory Reasoner (CMR), a novel CBM designed to provide a human-understandable and provably-verifiable task prediction process. Our approach is to model each task prediction as a neural selection mechanism over a memory of learnable logic rules, followed by a symbolic evaluation of the selected rule. The presence of an explicit memory and the symbolic evaluation allow domain experts to inspect and formally verify the validity of certain global properties of interest for the task prediction process. Experimental results demonstrate that CMR achieves better accuracy-interpretability trade-offs than state-of-the-art CBMs, discovers logic rules consistent with ground truths, allows for rule interventions, and enables pre-deployment verification.
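To make the "neural selection over a rule memory, followed by symbolic evaluation" idea concrete, below is a minimal sketch in PyTorch. It assumes a simple soft parameterisation in which each rule assigns every concept one of three roles (positive literal, negative literal, irrelevant); the module names, sizes, and this particular relaxation are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch of the CMR idea from the abstract (not the authors' code):
# a neural selector chooses a rule from a learnable rule memory, and the chosen
# rule is evaluated symbolically on the predicted concepts.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CMRSketch(nn.Module):
    def __init__(self, input_dim: int, n_concepts: int, n_rules: int, emb_dim: int = 64):
        super().__init__()
        # Neural concept predictor: x -> concept probabilities in [0, 1]
        self.concept_net = nn.Sequential(
            nn.Linear(input_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, n_concepts)
        )
        # Neural rule selector: x -> distribution over the rules in memory
        self.selector = nn.Sequential(
            nn.Linear(input_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, n_rules)
        )
        # Learnable rule memory: for each rule and concept, logits over the
        # three roles {positive literal, negative literal, irrelevant}
        self.rule_memory = nn.Parameter(torch.randn(n_rules, n_concepts, 3))

    def evaluate_rules(self, concepts: torch.Tensor) -> torch.Tensor:
        """Soft symbolic evaluation: a rule is satisfied when its positive
        literals are true, its negative literals are false, and irrelevant
        concepts are ignored."""
        roles = F.softmax(self.rule_memory, dim=-1)            # (R, C, 3)
        pos, neg, irr = roles[..., 0], roles[..., 1], roles[..., 2]
        c = concepts.unsqueeze(1)                              # (B, 1, C)
        # Per-concept truth value of each literal under the soft role assignment
        literal = pos * c + neg * (1.0 - c) + irr              # (B, R, C)
        return literal.prod(dim=-1)                            # (B, R) rule satisfaction

    def forward(self, x: torch.Tensor):
        concepts = torch.sigmoid(self.concept_net(x))          # (B, C)
        selection = F.softmax(self.selector(x), dim=-1)        # (B, R)
        rule_values = self.evaluate_rules(concepts)            # (B, R)
        # Task prediction = expected truth value of the selected rule
        y = (selection * rule_values).sum(dim=-1)              # (B,)
        return y, concepts, selection


# Example usage: because the rules live in an explicit memory, an expert can
# read them off (argmax over roles) and check global properties before deployment.
model = CMRSketch(input_dim=10, n_concepts=4, n_rules=3)
y, concepts, selection = model(torch.randn(8, 10))
learned_rules = model.rule_memory.argmax(dim=-1)  # 0=positive, 1=negative, 2=irrelevant
```

In this sketch the prediction is interpretable because it decomposes into a discrete choice (which rule was selected) and a symbolic step (whether the concepts satisfy that rule), which is what makes inspection and formal verification of the rule memory possible prior to deployment.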