
Learning to Faithfully Rationalize by Construction (2005.00115v1)

Published 30 Apr 2020 in cs.CL, cs.AI, and cs.LG

Abstract: In many settings it is important for one to be able to understand why a model made a particular prediction. In NLP this often entails extracting snippets of an input text 'responsible for' corresponding model output; when such a snippet comprises tokens that indeed informed the model's prediction, it is a faithful explanation. In some settings, faithfulness may be critical to ensure transparency. Lei et al. (2016) proposed a model to produce faithful rationales for neural text classification by defining independent snippet extraction and prediction modules. However, the discrete selection over input tokens performed by this method complicates training, leading to high variance and requiring careful hyperparameter tuning. We propose a simpler variant of this approach that provides faithful explanations by construction. In our scheme, named FRESH, arbitrary feature importance scores (e.g., gradients from a trained model) are used to induce binary labels over token inputs, which an extractor can be trained to predict. An independent classifier module is then trained exclusively on snippets provided by the extractor; these snippets thus constitute faithful explanations, even if the classifier is arbitrarily complex. In both automatic and manual evaluations we find that variants of this simple framework yield predictive performance superior to 'end-to-end' approaches, while being more general and easier to train. Code is available at https://github.com/successar/FRESH

An Overview of "Learning to Faithfully Rationalize by Construction"

The paper "Learning to Faithfully Rationalize by Construction" addresses a pivotal concern in NLP: the need for model transparency and understanding the rationales behind predictions made by neural models. This concern arises especially in the field of deep learning, where complex models often act as 'black boxes,' making it difficult to ascertain which parts of the input data significantly influence the output predictions.

Motivation and Background

Understanding the decision-making process of neural models is crucial for numerous applications, particularly those where model decisions carry weighty consequences, such as healthcare or finance. An explanation is faithful when it identifies the input features that genuinely informed the model's prediction. Previous work in this area, notably by Lei et al. (2016), attempted to extract such rationales with joint models that combine extraction and prediction components. However, these methods face challenges due to the discrete nature of token selection, which complicates training, results in high variance, and demands careful hyperparameter tuning.

Proposed Framework: FRESH

The authors introduce FRESH (Faithful Rationale Extraction from Saliency tHresholding), which presents a structured approach to circumvent the limitations of earlier models. FRESH decouples rationale extraction from prediction, thereby simplifying training and ensuring that the constructed rationales are inherently faithful. The process entails the following steps:

  1. Feature Scoring: An arbitrary mechanism generates feature importance scores. These can come from gradients, attention weights, or other signals produced by a trained support model.
  2. Discretization and Extraction: These scores are converted into binary labels to identify rationales. The extraction can be done using a straightforward heuristic or a learned module.
  3. Independent Classifier Training: The final step involves training a classifier exclusively on the extracted rationales. This ensures the classifier's predictions only stem from the extracted, arguably faithful, subset of the input text.

The authors test this framework across various datasets and demonstrate that FRESH offers competitive predictive accuracy compared to traditional end-to-end methods, while being less complex to train and providing arguably more transparent rationales.
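
To make the three-step pipeline above concrete, here is a minimal sketch of a FRESH-style workflow. It is not the authors' implementation: the random saliency scorer, the top-k% thresholding heuristic, and the count-based classifier are all illustrative stand-ins for the trained support model (e.g., BERT gradients or attention) and the neural classifier used in the paper.

```python
# Minimal, illustrative sketch of the FRESH pipeline (not the authors' code).
# The saliency scorer, thresholding rule, and classifier below are stand-ins;
# in the paper, scores come from a trained support model and the final
# classifier is itself a neural model trained only on extracted rationales.
from collections import Counter, defaultdict

import numpy as np

rng = np.random.default_rng(0)


def saliency_scores(tokens):
    """Step 1 (stand-in): assign an importance score to every input token.
    Here the scores are random; in practice they come from a trained model."""
    return rng.random(len(tokens))


def extract_rationale(tokens, scores, keep_fraction=0.2):
    """Step 2: discretize scores into a rationale by keeping the top-k% tokens.
    (The paper also considers contiguous-span variants and a learned extractor.)"""
    k = max(1, int(round(keep_fraction * len(tokens))))
    top = set(np.argsort(scores)[-k:])
    return [tok for i, tok in enumerate(tokens) if i in top]


class CountClassifier:
    """Step 3 (stand-in): a classifier trained *only* on rationale tokens,
    so its predictions stem from the extracted snippets by construction."""

    def fit(self, rationales, labels):
        self.counts = defaultdict(Counter)
        for toks, y in zip(rationales, labels):
            self.counts[y].update(toks)
        return self

    def predict(self, rationale):
        return max(self.counts,
                   key=lambda y: sum(self.counts[y][t] for t in rationale))


# Toy usage: extract rationales, then train the classifier on rationales alone.
docs = [["the", "movie", "was", "great", "and", "fun"],
        ["a", "dull", "and", "boring", "plot"]]
labels = ["pos", "neg"]
rationales = [extract_rationale(d, saliency_scores(d)) for d in docs]
clf = CountClassifier().fit(rationales, labels)
print(clf.predict(rationales[0]))  # predicts from the rationale tokens only
```

Because the classifier never sees tokens outside the rationale, the extracted snippet is a faithful explanation of its prediction regardless of how complex the classifier is; this is the property FRESH obtains by construction.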

Evaluation and Results

The performance of the FRESH framework was evaluated against several benchmark datasets, including the Stanford Sentiment Treebank (SST), AG News, and MultiRC, among others. The results indicate that FRESH can achieve a predictive performance that is on par with, or in some cases better than, more integrated predictive models. Notably, the framework is shown to avoid the pitfalls of high variance and sensitivity to hyperparameter tuning that afflict approaches relying heavily on reinforcement learning techniques.

Practical and Theoretical Implications

Practically, FRESH offers a scalable and versatile framework for making neural model predictions interpretable without significant compromises on accuracy. Because training is decoupled, an arbitrarily complex classifier can be used for prediction, trained solely on the rationale snippets. From a theoretical standpoint, FRESH illustrates how separating extraction from prediction yields architectures whose components are distinct yet complementary, enhancing both the transparency and understandability of model decisions.

Future Directions

This work opens several avenues for future research. The implications of using different feature scoring methods, the impact of incorporating human rationales as additional supervision, and further exploration into striking a balance between rationale conciseness and comprehensiveness are promising directions. Additionally, extending the framework to more challenging datasets or tasks could further validate and refine the model's generalizability and robustness.

In summary, "Learning to Faithfully Rationalize by Construction" represents a significant step toward enhancing the interpretability of neural models in NLP, offering a methodologically straightforward yet effective framework for rationale extraction and prediction.

Authors (4)
  1. Sarthak Jain (33 papers)
  2. Sarah Wiegreffe (20 papers)
  3. Yuval Pinter (41 papers)
  4. Byron C. Wallace (82 papers)
Citations (147)