- The paper introduces Explanatory Interactive Learning (XIL), a framework using human feedback on model explanations to ensure deep neural networks learn from relevant features, not confounding factors.
- XIL implements human interaction through dataset augmentation with counterexamples (CE) and a "right for the right reasons" (RRR) loss function to guide model learning toward domain-appropriate logic.
- Empirical results show XIL-trained models demonstrate improved trustworthiness by focusing on pertinent features and avoiding spurious correlations on benchmark and real-world plant phenotyping datasets.
Explanatory Interactive Learning to Enhance Scientific Trust in Machine Learning Models
The paper "Making deep neural networks right for the right scientific reasons by interacting with their explanations" addresses an essential issue prevalent in the field of machine learning: the so-called "Clever Hans" behavior where models may achieve high accuracy by exploiting confounding factors rather than learning the relevant features of data. The authors introduce a novel learning framework called Explanatory Interactive Learning (XIL), which incorporates human feedback into the model training process to mitigate this issue, particularly in the context of plant phenotyping.
Deep neural networks (DNNs) have demonstrated remarkable performance across diverse domains. However, their opaqueness often leads to skepticism among domain experts due to results that may seem implausibly accurate, as evidenced by the authors through an example in plant phenotyping. The work highlights how DNNs might leverage dataset artifacts, subsequently eroding trust when the model's predictions do not align with domain knowledge.
Contributions and Methodological Developments
The key contribution of this paper is XIL, a paradigm in which human experts interact with the model's explanations during training, ensuring that its decision-making aligns with domain-appropriate reasoning. This interaction is implemented through two methods, both sketched in code after the list below: augmenting the dataset with counterexamples (CE) and applying a "right for the right reasons" (RRR) loss that constrains the model's decision strategy to biologically plausible features.
- Counterexamples (CE): The expert flags input regions that the model wrongly relies on, and the training set is augmented with copies of those examples in which the flagged region is altered (for instance, randomized or masked) while the original label is kept, directly contradicting the model's faulty strategy. Because the correction lives entirely in the data, the method is model-agnostic, broadening its applicability.
- Right for the Right Reasons (RRR) Loss: This augments the standard classification loss with a penalty on the model's input gradients within regions that experts have marked as irrelevant, so that only relevant features can drive the prediction. Constraining decisions to match domain-specific expectations in this way fosters trust in the resulting model.
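To make the first mechanism concrete, the following is a minimal sketch of counterexample generation, assuming image tensors and a binary mask that marks the confounding region; the function name, the noise-based correction, and the `n_copies` parameter are illustrative choices rather than the paper's exact procedure, which considers several ways of altering the flagged region.

```python
import torch

def make_counterexamples(images, labels, confounder_mask, n_copies=1):
    """Create label-preserving copies with the confounding region replaced.

    confounder_mask has the same shape as images and is 1 on pixels the
    expert flagged as confounding (e.g. the agar background), 0 elsewhere.
    """
    ce_images, ce_labels = [], []
    for _ in range(n_copies):
        noise = torch.rand_like(images)
        corrected = images * (1 - confounder_mask) + noise * confounder_mask
        ce_images.append(corrected)
        ce_labels.append(labels)
    # Appending these copies to the training set contradicts any decision
    # strategy that relies on the masked region.
    return torch.cat(ce_images), torch.cat(ce_labels)
```

The RRR loss can be sketched in a similarly compact form. The version below follows the formulation of Ross et al. (2017) that the paper builds on, assuming a differentiable PyTorch classifier; the penalty weight `lam` and the convention that `mask` equals 1 on expert-marked irrelevant pixels are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def rrr_loss(model, x, y, mask, lam=10.0):
    """Cross-entropy plus a "right for the right reasons" penalty.

    mask has the same shape as x and is 1 wherever the expert marked the
    input as irrelevant for the decision.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    answer_loss = F.cross_entropy(logits, y)

    # Explanation term: input gradients of the summed log-probabilities.
    log_prob_sum = F.log_softmax(logits, dim=1).sum()
    grads, = torch.autograd.grad(log_prob_sum, x, create_graph=True)

    # Penalize explanation mass that falls on expert-masked regions.
    reason_loss = ((mask * grads) ** 2).sum()
    return answer_loss + lam * reason_loss
```

Unlike CE, the RRR penalty acts directly on the model's explanation (its input gradients), so it requires gradient access to the model but no additional training examples.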
Empirical Evaluation and Results
The research empirically validates the XIL framework on benchmark datasets (such as PASCAL VOC 2007) and on a real-world plant phenotyping dataset of hyperspectral and RGB images of sugar beet leaves categorized as healthy or diseased. Models trained without expert interaction often latched onto confounding factors unrelated to the plants, such as the agar visible in the image background, to separate the class labels. With the XIL approach, the models learned to disregard such irrelevant features.
The paper reports substantial improvements in model trustworthiness, as judged by machine learning practitioners and domain experts alike. XIL-trained models based their decisions on pertinent features rather than spurious correlations and remained reliable when evaluated on non-confounded test data.
Implications and Future Directions
The implications of this paper are significant, especially in high-stakes environments dependent on reliable machine learning models for decision-making. By involving domain experts in an interactive learning loop, the XIL framework promises a means of improving model transparency and efficacy in complex real-world scenarios.
Future research may focus on refining the interaction mechanisms used in XIL, optimizing query strategies to reduce the amount of expert intervention required, and extending XIL beyond image classification to other modalities such as text and multi-modal data. Another open question is how to keep expert feedback productive without inadvertently introducing new biases. Improving explanation generation, potentially using what the model has already learned to guide further interaction, is another important direction.
In conclusion, by addressing the critical issue of trust in autonomous decision-making systems, this work offers a compelling direction for making machine learning systems more interpretable, ensuring that models not only predict accurately but do so for reasons that align with human expertise and understanding.