- The paper proposes a novel input marginalization method utilizing BERT's masked language modeling to address the out-of-distribution (OOD) problem in interpreting NLP models.
- The method weights plausible candidate tokens by their likelihood under BERT's MLM, and its interpretations are validated quantitatively with measures such as AUC_rep.
- The approach enhances the transparency and trustworthiness of complex NLP models by providing a more reliable framework for understanding their decision-making processes.
Interpretation of NLP Models Through Input Marginalization
The paper "Interpretation of NLP Models Through Input Marginalization," presented by Siwon Kim et al., addresses a notable challenge in the interpretability of deep neural networks (DNNs) applied to NLP. As NLP models become increasingly complex and integral to high-stakes applications, understanding their decision-making processes is crucial. The authors propose an innovative approach to mitigate the out-of-distribution (OOD) problem inherent in existing interpretation methods, which involves marginalization of input tokens rather than their replacement with predefined values.
Overview and Methodology
The primary objective of the paper is to enhance the interpretability of NLP models, specifically those trained for sentiment analysis and natural language inference tasks, by introducing a novel method of input marginalization. Traditional interpretation techniques often involve analyzing how a model's prediction changes when each token in a sentence is removed or replaced. However, replacing tokens with a fixed value such as zero can lead to sentences that deviate from the training distribution, causing misleading interpretations due to OOD issues.
To address this, the authors marginalize each input token out, weighting plausible candidate tokens by their likelihoods. These likelihoods are modeled with the masked language modeling (MLM) head of BERT (Bidirectional Encoder Representations from Transformers). By marginalizing rather than erasing, the method captures each token's contribution to the model's prediction more faithfully, avoiding the distortion caused by OOD inputs.
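The sketch below illustrates this idea under stated assumptions: it uses Hugging Face `transformers`, the generic `bert-base-uncased` MLM, and a placeholder sentiment-classifier checkpoint, and it truncates the marginalization to the top-k MLM candidates for tractability. The attribution is the log-odds (weight-of-evidence) difference between the original and marginalized predictions; this follows the spirit of the paper but is not the authors' exact implementation.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()
# Placeholder: any BERT-based classifier that shares this tokenizer.
clf = BertForSequenceClassification.from_pretrained("path/to/sentiment-classifier").eval()

def log_odds(p, eps=1e-8):
    # Weight-of-evidence style score: log2 odds of the target class.
    return torch.log2(p + eps) - torch.log2(1.0 - p + eps)

@torch.no_grad()
def marginalized_attributions(sentence, target_label, top_k=20):
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"]
    orig_prob = clf(**enc).logits.softmax(-1)[0, target_label]
    scores = []
    for pos in range(1, input_ids.size(1) - 1):           # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[0, pos] = tokenizer.mask_token_id
        mlm_probs = mlm(input_ids=masked).logits.softmax(-1)[0, pos]
        cand_probs, cand_ids = mlm_probs.topk(top_k)      # keep only likely candidates
        marg_prob = 0.0
        for p_tok, cand in zip(cand_probs, cand_ids):
            replaced = input_ids.clone()
            replaced[0, pos] = cand
            pred = clf(input_ids=replaced,
                       attention_mask=enc["attention_mask"]).logits.softmax(-1)[0, target_label]
            marg_prob = marg_prob + p_tok * pred          # MLM likelihood x model prediction
        marg_prob = marg_prob / cand_probs.sum()          # renormalize over kept candidates
        scores.append((log_odds(orig_prob) - log_odds(marg_prob)).item())
    return scores
```

Restricting the sum to a small set of high-likelihood candidates and renormalizing is one way to keep the computation tractable; tokens with very low MLM likelihood contribute little to the weighted sum anyway.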
Key Contributions
- Identification of the OOD Problem: The paper highlights the OOD problem that token-erasure schemes introduce when interpreting NLP models, an issue not previously addressed in the NLP interpretability literature.
- Novel Interpretation Method: It introduces input marginalization using BERT's MLM to overcome the OOD issue, presenting a more reliable mechanism to measure token contribution.
- Quantitative Analysis: The authors apply their method to several NLP models and verify interpretation accuracy with quantitative measures such as AUC_rep, which tracks how quickly the prediction probability declines as tokens are successively replaced with MLM-sampled candidates in order of their attribution scores (a sketch of this deletion-style metric follows this list).
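As a rough illustration of such a deletion-style metric, the sketch below replaces tokens in decreasing order of attribution, records the prediction probability after each replacement, and integrates the resulting curve. The helpers `classify` and `sample_mlm` are hypothetical stand-ins for the target classifier and the MLM sampler, and the trapezoidal integration is one reasonable reduction of the curve to a single number rather than the paper's exact definition.

```python
def auc_rep(input_ids, attributions, classify, sample_mlm):
    """Deletion-style AUC: replace tokens from most to least important and
    integrate the prediction-probability curve (smaller area = better attribution)."""
    order = sorted(range(len(attributions)), key=lambda i: attributions[i], reverse=True)
    ids = list(input_ids)
    probs = [classify(ids)]                      # probability before any replacement
    for pos in order:
        ids[pos] = sample_mlm(ids, pos)          # substitute an MLM-sampled candidate
        probs.append(classify(ids))
    # Trapezoidal area under the curve over the fraction of tokens replaced.
    step = 1.0 / (len(probs) - 1)
    return sum(0.5 * (probs[i] + probs[i - 1]) * step for i in range(1, len(probs)))
```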
Implications and Future Directions
The proposed input marginalization method provides a robust framework for interpreting complex NLP models. Practically, it enhances the transparency and trustworthiness of models used in sensitive applications, such as healthcare or legal systems, by elucidating the rationale behind their predictions. Theoretically, it advances the discussion of effective interpretation techniques by emphasizing that interpretations should stay aligned with the training data distribution.
Future research may focus on refining the token-likelihood modeling further, potentially incorporating more advanced language models or models fine-tuned for specific task contexts. Extending the approach to other NLP tasks, or pairing it with models such as XLNet or ELECTRA, could also broaden its applicability.
Conclusion
The paper tackles a critical aspect of machine learning interpretability by proposing and validating a method that alleviates the OOD complications in interpreting NLP model predictions. As machine learning models continue to play pivotal roles in decision-making systems, the insights and methodology contributed by this research serve as a foundation for developing interpretable, reliable AI applications.