- The paper introduces a novel categorical framework using string diagrams to clarify AI model interpretability.
- It formalizes compositional-interpretable (CI) models, a class that encompasses both intrinsically interpretable models and structured models such as causal models.
- It describes diagrammatic explanation techniques that apply across a range of AI systems, including quantum models.
A Categorical Theory of Explainability in AI
The paper explores the increasingly critical issue of explainability in AI through a novel approach grounded in category theory. This work addresses the interpretability problem in ML models, which has become a significant concern in sensitive domains such as finance, law, and health. The authors introduce a theoretical framework that leverages the mathematical language of category theory to analyze AI models and their interpretability through compositional constructs.
The problem of explainability stems primarily from the black-box nature of many modern ML models, such as neural networks (NNs). These models are highly effective in practice but lack transparency in how they achieve their results. The emergent field of eXplainable AI (XAI) has developed numerous post-hoc techniques to elucidate the behavior of these models. However, as highlighted in this paper, such methods often produce approximate and somewhat limited explanations. There is a growing argument for the use of intrinsically interpretable models, which inherently possess a structure understandable to humans without requiring post-hoc methods.
Category Theory as a Framework
Category theory, with its emphasis on processes and their composition, is proposed as a powerful tool for defining and analyzing AI models. The authors present AI models through "string diagrams," a graphical calculus associated with category theory. This approach applies to various AI models, including linear models, NNs, transformers, and causal models. The string diagrams generalize existing graphical approaches such as decision trees, computational graphs for NNs, and DAGs for causal models, enabling formal reasoning about the models themselves.
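To give a rough feel for this compositional reading, the sketch below models processes as boxes with typed input and output wires and composes them sequentially and in parallel. It is a minimal illustration, not the paper's formalism or any particular library's API; the class name `Box`, the wire labels, and the toy `embed`/`layer` boxes are assumptions made here for concreteness.

```python
# A minimal sketch of process composition in the style of string diagrams.
# Names (Box, the wire labels, embed, layer) are illustrative, not from the paper.
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A process with typed input wires (dom) and output wires (cod)."""
    name: str
    dom: tuple
    cod: tuple

    def __rshift__(self, other: "Box") -> "Box":
        # Sequential composition: this box's output wires feed the next box's input.
        assert self.cod == other.dom, "wire types must match for sequential composition"
        return Box(f"{self.name} ; {other.name}", self.dom, other.cod)

    def __matmul__(self, other: "Box") -> "Box":
        # Parallel composition: place two processes side by side.
        return Box(f"{self.name} ⊗ {other.name}",
                   self.dom + other.dom, self.cod + other.cod)


# A tiny "model" built compositionally: embed two inputs in parallel,
# then combine them with a single layer.
embed = Box("embed", ("token",), ("vec",))
layer = Box("layer", ("vec", "vec"), ("vec",))
model = (embed @ embed) >> layer
print(model)  # Box(name='embed ⊗ embed ; layer', dom=('token', 'token'), cod=('vec',))
```

The point of the sketch is only that the model's overall structure is an algebraic expression over boxes and wires, which is the kind of object the paper reasons about diagrammatically.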
Key Contributions
The paper makes several central contributions:
- Presentation of Compositional Models: The authors offer a formal basis to compare different AI models and study their interpretability using compositional methods.
- Clarification of XAI Concepts: The paper refines several concepts within XAI, including the distinction between models that are interpretable versus those that afford approximate explanations, notions of intrinsic interpretability, and the relation between model structure and real-world phenomena.
- Introduction of CI Models: They define "compositional-interpretable" (CI) models, generalizing the concept of intrinsically interpretable models to include those with rich, interpretable compositional structures, such as causal models and DisCoCirc models in NLP.
- Diagrammatic Explanations: Illustration of how CI models support several forms of explanation, including no-influence arguments, diagram surgery, and rewrite explanations, each backed by guarantees expressed as diagrammatic equations (a toy sketch follows this list).
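To make two of these explanation styles concrete, here is a minimal sketch of a toy causal model in which a do-style intervention plays the role of diagram surgery and a reachability check plays the role of a no-influence argument. The variable names and the helpers `evaluate`, `do`, and `influences` are assumptions for illustration, not constructions taken from the paper.

```python
# A toy causal model: each variable is computed by a mechanism from its parents.
# All names here are illustrative assumptions, not the paper's own examples.
model = {
    "treatment": (lambda: 1, []),
    "severity":  (lambda: 2, []),
    "recovery":  (lambda t, s: t + s, ["treatment", "severity"]),
}


def evaluate(model, var):
    """Evaluate a variable by composing its mechanism with its parents' values."""
    mech, parents = model[var]
    return mech(*(evaluate(model, p) for p in parents))


def do(model, var, value):
    """Diagram surgery: cut the wires into `var` and plug in a constant."""
    surgered = dict(model)
    surgered[var] = (lambda: value, [])
    return surgered


def influences(model, cause, effect):
    """No-influence argument: `cause` can only matter for `effect`
    if there is a directed path of wires from `cause` to `effect`."""
    if cause == effect:
        return True
    _, parents = model[effect]
    return any(influences(model, cause, p) for p in parents)


print(evaluate(model, "recovery"))                       # 3
print(evaluate(do(model, "treatment", 0), "recovery"))   # 2: effect of the intervention
print(influences(model, "severity", "treatment"))        # False: a guaranteed no-influence claim
```

The appeal of such explanations, as the paper emphasizes, is that they are exact consequences of the model's structure rather than post-hoc approximations.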
Practical and Theoretical Implications
The implications of this research are both practical and theoretical. Practically, the categorical framework provides a unified language to describe and compare various AI models, which can ease the burden of interpreting complex systems in high-stakes applications. Theoretically, this work encourages the development of new, intrinsically interpretable models grounded in the rich structure of category theory.
The compositional perspective could significantly impact future AI research by promoting the design of models that are not only effective but also comprehensible. Additionally, the categorical treatment accommodates both classical and quantum models, suggesting that this framework could be essential for interpreting future quantum AI models. The categorical approach retains its robustness even when applied to the abstract structures of quantum processes, where interpretability remains a significant challenge.
Future Directions
Future research could chart the space of compositional models, investigating how compositional structure can be learned from data. This work opens the door to new methods for training models that inherently possess interpretable structure. Furthermore, exploring richer forms of compositionality in AI models is a promising path, potentially enhancing the interpretability and transparency of next-generation AI systems.
In conclusion, this paper contributes significantly to the ongoing discourse on AI interpretability by proposing a categorical framework for model description and analysis. Its approach offers a promising avenue both for practitioners aiming to build interpretable AI systems and for researchers seeking to understand the theoretical underpinnings of explainability in AI.