Introduction to DiConStruct
The field of explainable AI (XAI) seeks to elucidate the inner workings of complex ML systems by providing human-interpretable explanations for model predictions. A central objective in this pursuit is to produce explanations that are both concept-based, and therefore semantically meaningful to humans, and causal, enabling reasoning about the relationships between concepts rather than mere associations.
Methodology of DiConStruct
The paper under discussion introduces DiConStruct, an explanation method that stands out by being both concept-based and causal. It defines local explanations via structural causal models (SCMs) over concept attributions, and it is trained as a distillation model that approximates the black-box ML model. Because the explainer is a separate surrogate, it adds explanatory power without any impact on the predictive task's performance. DiConStruct operates by constructing a surrogate model trained on a dataset labeled with human-defined concepts, which in turn requires a directed acyclic graph (DAG) encoding the presumed causal relations among the concepts and the model output. The DAG can be obtained either from expert knowledge or from causal discovery algorithms.
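To make the mechanics concrete, below is a minimal sketch of a DAG-constrained, distillation-style concept explainer in PyTorch. It is not the authors' implementation: the binary concepts, the per-node MLPs, the example DAG (smoke, fire, y_hat), and the joint distillation/concept loss are all illustrative assumptions; the sketch only shows the general shape of a surrogate whose nodes are evaluated in the causal order of a concept DAG while its output node mimics the black-box score.

```python
# Illustrative sketch only (not the DiConStruct implementation).
# Assumptions: binary concepts, a probabilistic black-box output, and one
# small MLP per node whose inputs are its DAG parents plus the raw features.
import torch
import torch.nn as nn

class NodeModel(nn.Module):
    """Predicts one node (a concept or the final score) from its DAG parents
    and the instance features."""
    def __init__(self, n_features: int, n_parents: int, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features + n_parents, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, x, parent_values):
        return self.net(torch.cat([x, parent_values], dim=-1)).squeeze(-1)

class DAGSurrogate(nn.Module):
    """Surrogate explainer: evaluates nodes in topological order of the DAG,
    so every concept is produced from its causal parents."""
    def __init__(self, n_features, dag, order):
        super().__init__()
        self.dag, self.order = dag, order
        self.nodes = nn.ModuleDict({
            name: NodeModel(n_features, len(parents))
            for name, parents in dag.items()
        })

    def forward(self, x):
        values = {}
        for name in self.order:
            parents = self.dag[name]
            pv = (torch.stack([values[p] for p in parents], dim=-1)
                  if parents else x.new_zeros(x.shape[0], 0))
            values[name] = self.nodes[name](x, pv)
        return values  # concept scores plus the distilled black-box score

# Hypothetical DAG: two concepts feed the output node that mimics the black box.
dag = {"smoke": [], "fire": ["smoke"], "y_hat": ["smoke", "fire"]}
order = ["smoke", "fire", "y_hat"]
model = DAGSurrogate(n_features=10, dag=dag, order=order)

# Distillation objective: match the black-box score and the concept labels.
x = torch.randn(32, 10)
bb_scores = torch.rand(32)                        # black-box outputs (teacher signal)
concepts = torch.randint(0, 2, (32, 2)).float()   # human-annotated concept labels
out = model(x)
loss = (nn.functional.binary_cross_entropy(out["y_hat"], bb_scores)
        + nn.functional.binary_cross_entropy(out["smoke"], concepts[:, 0])
        + nn.functional.binary_cross_entropy(out["fire"], concepts[:, 1]))
loss.backward()
```

The key design point the sketch tries to convey is that the black-box model itself is never modified: the surrogate is trained against the black box's outputs, so the explanation machinery cannot degrade the original predictive performance.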
Performance and Validation
Validated on an image dataset and a tabular dataset, DiConStruct achieves high fidelity in approximating the black-box models, surpassing several concept-explainability baselines. It does so without the commonly observed trade-off between explainability and predictive performance. Notably, the explainer exhibits strong concept-attribution diversity, meaning that attributions vary across instances, which is crucial for the local relevance of its explanations.
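The two evaluation notions mentioned above can be illustrated with simple stand-in metrics. The functions below are hypothetical and likely differ from the paper's exact definitions: fidelity is sketched as one minus the mean absolute error between black-box and surrogate scores, and attribution diversity as the mean pairwise distance between per-instance attribution vectors.

```python
# Illustrative metrics only; the paper's exact definitions may differ.
import numpy as np

def fidelity(bb_scores: np.ndarray, surrogate_scores: np.ndarray) -> float:
    """1 minus mean absolute error between black-box and surrogate scores."""
    return 1.0 - float(np.mean(np.abs(bb_scores - surrogate_scores)))

def attribution_diversity(attributions: np.ndarray) -> float:
    """Mean pairwise distance between per-instance attribution vectors.
    attributions has shape (n_instances, n_concepts)."""
    n = attributions.shape[0]
    diffs = attributions[:, None, :] - attributions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return float(dists.sum() / (n * (n - 1)))

# Toy usage with random values standing in for real model outputs.
rng = np.random.default_rng(0)
bb = rng.random(100)
sur = np.clip(bb + rng.normal(0, 0.05, 100), 0, 1)
attrs = rng.random((100, 3))
print(f"fidelity={fidelity(bb, sur):.3f}, "
      f"diversity={attribution_diversity(attrs):.3f}")
```

A high fidelity score indicates the surrogate tracks the black box closely; a low diversity score would suggest the explainer collapses to a near-global explanation instead of genuinely local ones.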
Significance and Future Work
DiConStruct pushes the boundaries of XAI by introducing causality into concept-based explanations. The significance of combining these properties lies in producing explanations that go beyond mere associations in the data to uncover potential causal pathways. This is vital in contexts where understanding the causal underpinnings of model decisions is paramount, such as in regulated industries.
Future enhancements could include extending to multi-class concepts, integrating latent variables for more comprehensive coverage, evaluating concept leakage, and contrasting DiConStruct's causal explanations with other explanatory frameworks. Although the explainer's performance is contingent on the completeness and relevance of the concept set, DiConStruct represents a significant stride in XAI, offering a method that is interpretable, causally informed, and operational at the local explanation level.