Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations
The paper, authored by Mothilal, Sharma, and Tan, presents a novel approach for providing post-hoc explanations of ML classifiers via diverse counterfactual examples. This work addresses key limitations in existing explanation methodologies, focusing on generating counterfactuals that are not only feasible but also diverse, thus enhancing their actionability and interpretability.
Core Thesis and Methodology
The primary objective of this paper is to generate counterfactual explanations that help users understand and act upon algorithmic decisions. Traditional counterfactual methods often fail to consider whether the suggested changes are feasible or diverse, even though, for a given user, only actionable changes are useful.
To this end, the authors propose a framework leveraging determinantal point processes (DPPs) for generating a set of diverse counterfactual examples. The optimization framework aims to balance three critical aspects:
- Validity: Ensuring that the counterfactual examples indeed result in a different outcome.
- Proximity: Ensuring that the changes suggested are minimal, making them plausible for users.
- Diversity: Ensuring that the counterfactual set covers a broad range of actionable changes (the DPP-based diversity measure is sketched after this list).
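Concretely, the diversity of a candidate set is measured with a determinantal point process kernel built from pairwise distances between the counterfactuals; in lightly adapted notation:

```latex
\mathrm{dpp\_diversity}(c_1,\dots,c_k) = \det(K),
\qquad K_{i,j} = \frac{1}{1 + \mathrm{dist}(c_i, c_j)}
```

The determinant is largest when the counterfactuals are far apart from one another, so maximizing it pushes the set toward distinct changes rather than near-duplicates.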
Formulation and Optimization
The paper frames counterfactual generation as an optimization problem over a set of k counterfactual examples. The loss function combines three terms (the full objective is shown after this list):
- Prediction Difference Term: Ensures that the output for the counterfactual examples differs from the original prediction.
- Proximity Term: Penalizes large deviations from the original input to ensure feasibility.
- Diversity Term: Uses the DPP-based measure above to reward sets of counterfactuals that are spread apart.
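Putting the three terms together, the combined objective has roughly the following form (notation lightly adapted from the paper; λ1 and λ2 are hyperparameters weighting proximity and diversity):

```latex
C(x) = \arg\min_{c_1,\dots,c_k} \;
\frac{1}{k}\sum_{i=1}^{k} \mathrm{yloss}\!\left(f(c_i), y\right)
+ \frac{\lambda_1}{k}\sum_{i=1}^{k} \mathrm{dist}(c_i, x)
- \lambda_2\, \mathrm{dpp\_diversity}(c_1,\dots,c_k)
```

Here f is the classifier, y the desired outcome, yloss a loss that pushes f(c_i) toward y, and dist a feature-wise distance (continuous features are scaled by their median absolute deviation, categorical features by counting mismatches).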
The optimization employs gradient descent methods, and the resulting counterfactuals are filtered using post-hoc causal constraints to enforce real-world feasibility.
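A minimal PyTorch sketch of such a gradient-based search is given below. It is illustrative only: `model` is assumed to be a differentiable binary classifier returning P(y=1) over already-encoded numeric features, the function and hyperparameter names are invented for the example, and the paper's feature scaling, categorical handling, and post-hoc feasibility filtering are omitted.

```python
# Illustrative gradient-based search for k diverse counterfactuals.
import torch

def dpp_diversity(cfs):
    # Determinant of the kernel K_ij = 1 / (1 + dist(c_i, c_j)).
    dists = torch.cdist(cfs, cfs, p=1)
    kernel = 1.0 / (1.0 + dists)
    return torch.det(kernel)

def generate_counterfactuals(model, x, k=4, desired_class=1,
                             lambda_prox=0.5, lambda_div=1.0,
                             steps=1000, lr=0.05):
    # Initialize the k candidates as small random perturbations of the input.
    cfs = (x.repeat(k, 1) + 0.1 * torch.randn(k, x.shape[-1])).requires_grad_(True)
    optimizer = torch.optim.Adam([cfs], lr=lr)
    sign = 1.0 if desired_class == 1 else -1.0
    for _ in range(steps):
        optimizer.zero_grad()
        probs = model(cfs).squeeze(-1)
        # Prediction term: hinge-style loss pushing each candidate toward the desired class.
        pred_loss = torch.clamp(1.0 - sign * torch.logit(probs, eps=1e-6), min=0).mean()
        # Proximity term: mean L1 distance to the original input.
        prox_loss = (cfs - x).abs().mean()
        # Diversity term: DPP determinant over the candidate set (maximized, hence subtracted).
        loss = pred_loss + lambda_prox * prox_loss - lambda_div * dpp_diversity(cfs)
        loss.backward()
        optimizer.step()
    return cfs.detach()
```

The relative weights (lambda_prox and lambda_div in this sketch) govern the proximity/diversity trade-off examined in the evaluation below.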
Empirical Evaluation and Results
The proposed method, termed DiverseCF, is empirically evaluated on four datasets: COMPAS, Adult-Income, German-Credit, and LendingClub. For each dataset, DiverseCF is compared against several baselines: a method that generates a single counterfactual (SingleCF), a variant that optimizes each counterfactual independently from a random initialization (RandomInitCF), and prior work based on mixed-integer programming (MixedIntegerCF).
Key results include:
- Validity: DiverseCF consistently produces a higher fraction of valid counterfactuals than the baselines across all datasets, maintaining near-100% validity even as the number of requested counterfactuals grows.
- Diversity: DiverseCF significantly outperforms the baselines in generating diverse counterfactuals, in both continuous and categorical feature spaces, and its advantage grows as the desired number of counterfactuals (k) increases.
- Proximity: Although optimizing for diversity can reduce proximity, DiverseCF still generates counterfactuals that stay close enough to the original input for the suggested changes to remain actionable. An illustrative sketch of how these three metrics can be computed follows this list.
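For reference, the three criteria can be computed roughly as follows for continuous features. The function and its signature are hypothetical, and the paper's exact metric definitions differ in detail (e.g. MAD scaling and separate categorical variants).

```python
# Illustrative NumPy sketch of validity, proximity, and diversity for a set
# of counterfactuals `cfs` generated for a single input `x`.
import numpy as np

def evaluate_counterfactuals(predict_class, x, cfs, original_class):
    # Validity: fraction of counterfactuals whose predicted class differs
    # from the class predicted for the original input.
    validity = float(np.mean(predict_class(cfs) != original_class))

    # Proximity: negative mean L1 distance to the original input
    # (higher means the counterfactuals stay closer to x).
    proximity = -float(np.mean(np.abs(cfs - x).sum(axis=1)))

    # Diversity: mean pairwise L1 distance within the counterfactual set.
    k = len(cfs)
    pairwise = [np.abs(cfs[i] - cfs[j]).sum()
                for i in range(k) for j in range(i + 1, k)]
    diversity = float(np.mean(pairwise)) if pairwise else 0.0

    return {"validity": validity, "proximity": proximity, "diversity": diversity}
```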
Additionally, DiverseCF is demonstrated to approximate the local decision boundary of ML models effectively, often outperforming established local explanation methods such as LIME when evaluated using a 1-nearest neighbor classifier on simulated test data.
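The idea behind this check can be sketched as follows: fit a 1-nearest-neighbor classifier on the explanation (the original instance plus its counterfactuals, labeled with the original and opposite class respectively), then measure how often it agrees with the black-box model on points sampled near the input. The uniform sampling scheme, radius, and helper names below are assumptions for illustration, not the paper's exact protocol.

```python
# Hedged sketch of a 1-NN local-approximation check for binary classification.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def local_fidelity(model_predict, x, cfs, original_class, n_samples=1000, radius=0.5):
    # Label the original instance with its own class and each counterfactual
    # with the opposite class, then fit a 1-NN classifier on these points.
    X_expl = np.vstack([x[None, :], cfs])
    y_expl = np.concatenate([[original_class], [1 - original_class] * len(cfs)])
    knn = KNeighborsClassifier(n_neighbors=1).fit(X_expl, y_expl)

    # Sample test points around x and compare the 1-NN surrogate to the model.
    samples = x + np.random.uniform(-radius, radius, size=(n_samples, x.shape[0]))
    return float(np.mean(knn.predict(samples) == model_predict(samples)))
```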
Theoretical and Practical Implications
The theoretical contributions of this paper are manifold. By focusing on the inherent trade-offs between diversity and proximity, the authors provide deeper insights into optimizing counterfactual explanations. Moreover, the paper's empirical validation on multiple datasets furnishes strong evidence of the practical utility of diverse and feasible counterfactuals.
Practical implications are extensive, spanning from enhanced user interpretability to model debugging and fairness evaluations. For instance, by exposing biases (e.g., racial biases in COMPAS predictions), DiverseCF can serve as a powerful tool for model developers and auditors to identify and rectify prejudicial model behavior. Additionally, the ability to provide diverse actionable changes can significantly empower users, particularly in high-stakes domains like finance and healthcare.
Future Directions
The paper outlines several promising avenues for future research: handling fully black-box models without gradient access, incorporating causal constraints directly into the counterfactual generation process rather than as a post-hoc filter, and user-facing interfaces that let individuals specify personalized feature scaling and constraints. Furthermore, empirical validation through user studies would substantiate the practical benefits of diverse counterfactuals, potentially leading to more sophisticated and user-friendly ML explanation frameworks.
In conclusion, this paper makes substantial advancements in the field of ML interpretability by introducing a robust framework for generating diverse and actionable counterfactual explanations. The blend of theoretical rigor and practical validation marks a significant step forward in making ML systems more transparent, fair, and user-interpretable.