Interpretable Neural Predictions with Differentiable Binary Variables (1905.08160v2)

Published 20 May 2019 in cs.CL

Abstract: The success of neural networks comes hand in hand with a desire for more interpretability. We focus on text classifiers and make them more interpretable by having them provide a justification, a rationale, for their predictions. We approach this problem by jointly training two neural network models: a latent model that selects a rationale (i.e. a short and informative part of the input text), and a classifier that learns from the words in the rationale alone. Previous work proposed to assign binary latent masks to input positions and to promote short selections via sparsity-inducing penalties such as L0 regularisation. We propose a latent model that mixes discrete and continuous behaviour allowing at the same time for binary selections and gradient-based training without REINFORCE. In our formulation, we can tractably compute the expected value of penalties such as L0, which allows us to directly optimise the model towards a pre-specified text selection rate. We show that our approach is competitive with previous work on rationale extraction, and explore further uses in attention mechanisms.

Authors (3)
  1. Jasmijn Bastings (19 papers)
  2. Wilker Aziz (32 papers)
  3. Ivan Titov (108 papers)
Citations (199)

Summary

Interpretable Neural Predictions with Differentiable Binary Variables

The paper "Interpretable Neural Predictions with Differentiable Binary Variables" addresses the challenge of improving interpretability in neural network-based text classifiers by providing rationales for their predictions. The authors develop a framework that adopts a unique approach to rationale extraction by leveraging a latent variable model that allows differentiable binary variables, enabling gradient-based training without relying on the REINFORCE algorithm.

Overview

The paper introduces a method built from two interconnected neural networks: a rationale extractor that selects a subset of the input text as the rationale, and a text classifier that bases its prediction solely on that rationale. The primary aim is to generate concise yet sufficient rationales that transparently inform the classification decision.
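
As a rough illustration of this two-component setup, the sketch below pairs an extractor that predicts per-token parameters for the latent selection variables with a classifier that only sees embeddings gated by the sampled selections. The PyTorch class names, BiLSTM encoders, and dimensions are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RationaleExtractor(nn.Module):
    """Predicts per-token shape parameters (a, b) for the latent selection variables."""
    def __init__(self, vocab_size, emb_dim=100, hidden=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.to_params = nn.Linear(2 * hidden, 2)  # one (a, b) pair per token

    def forward(self, tokens):
        h, _ = self.rnn(self.emb(tokens))
        a, b = F.softplus(self.to_params(h)).unbind(-1)  # positive shape parameters
        return a, b

class RationaleClassifier(nn.Module):
    """Classifies from the rationale alone: token embeddings are gated by z in [0, 1]."""
    def __init__(self, vocab_size, num_classes, emb_dim=100, hidden=100):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, tokens, z):
        x = self.emb(tokens) * z.unsqueeze(-1)  # zero out unselected tokens
        h, _ = self.rnn(x)
        return self.out(h.mean(dim=1))
```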

A central contribution is a new distribution, termed "HardKuma" (a stretched and rectified Kumaraswamy distribution), whose latent variables mix discrete and continuous behaviour: a sample can be exactly 0 or 1, yet it still admits reparameterised gradients. This construction makes expectations of sparsity-inducing penalties such as the L0 norm tractable in closed form, so the text selection process can be optimised directly towards a pre-specified selection rate.
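The sketch below shows one way such a stretched-and-rectified variable can be sampled, together with the closed-form probability of a non-zero selection that makes the expected L0 computable. The stretch interval (l, r) = (-0.1, 1.1) and the helper names are assumptions for illustration, not the paper's exact parameterisation.

```python
import torch

def kuma_icdf(u, a, b):
    """Inverse CDF of Kumaraswamy(a, b) on (0, 1)."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)

def kuma_cdf(x, a, b):
    """CDF of Kumaraswamy(a, b) on (0, 1)."""
    return 1.0 - (1.0 - x ** a) ** b

def hardkuma_sample(a, b, l=-0.1, r=1.1):
    """Reparameterised sample: stretch a Kumaraswamy draw to (l, r), then clamp to [0, 1]."""
    u = torch.rand_like(a)
    k = kuma_icdf(u, a, b)           # continuous draw in (0, 1)
    t = l + (r - l) * k              # stretched to (l, r)
    return torch.clamp(t, 0.0, 1.0)  # mass below 0 collapses to exactly 0, above 1 to exactly 1

def prob_nonzero(a, b, l=-0.1, r=1.1):
    """P(z != 0): the per-token term of the expected L0 penalty, available in closed form."""
    x0 = (0.0 - l) / (r - l)
    return 1.0 - kuma_cdf(x0, a, b)
```

Summing prob_nonzero over token positions gives the expected L0 of a sentence; because this expectation is available in closed form, it can be penalised or constrained directly during training.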

Contributions

The authors contribute substantially to the field through the following innovations:

  1. Differentiable Rationale Extraction: The framework couples differentiable binary selection variables with a tractable objective that specifies how much text should be extracted, encouraging rationales that keep informative words and drop superfluous ones (see the sketch after this list).
  2. HardKuma Distribution: The HardKuma distribution places probability mass on the exact binary outcomes 0 and 1 while supporting reparameterised gradient estimates, allowing neural models to handle discrete selections in a differentiable manner.
  3. Application to Attention Mechanisms: The paper suggests that the HardKuma distribution’s potential use goes beyond rationale extraction, presenting its applicability in attention mechanisms.
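
To make the first contribution concrete, the sketch below shows how the closed-form expected L0 might be tied to a pre-specified selection rate through a Lagrangian-style relaxation; the target rate, multiplier update, and function names are illustrative, not the paper's exact training recipe.

```python
import torch

def selection_rate_constraint(p_nonzero, lengths, target_rate):
    """p_nonzero: (batch, seq) with P(z_i != 0); lengths: (batch,) token counts."""
    expected_rate = p_nonzero.sum(dim=1) / lengths   # expected fraction of tokens selected
    return (expected_rate - target_rate).mean()      # positive when too much text is selected

# Illustrative training step (lam is a non-negative Lagrange multiplier):
# constraint = selection_rate_constraint(p_nonzero, lengths, target_rate=0.15)
# loss = classification_loss + lam * constraint
# loss.backward(); optimizer.step()
# lam = torch.clamp(lam + eta_lam * constraint.detach(), min=0.0)  # ascent on the multiplier
```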

Empirical Results

The authors provide empirical evidence that their approach is competitive with previous work on rationale extraction. The experiments cover multi-aspect sentiment analysis and natural language inference. Significant findings include:

  • In multi-aspect sentiment analysis, the proposed model selected rationales with high precision and produced effective classification results using only a fraction of the text.
  • In natural language inference, the model maintained competitive performance even when only a restricted portion of the input text was used for the decision.

Implications and Future Directions

The integration of differentiable rationale extraction can significantly improve AI interpretability, potentially leading to more robust and trustworthy models. The ability to offer insight into a classifier's decision-making process is crucial for applications demanding transparency, such as legal or clinical settings.

Given the success of the HardKuma framework, future research could explore its application to larger and more complex datasets, particularly those requiring nuanced interpretability. Further tuning of the HardKuma distribution's parameters could yield finer control over rationale compactness and quality, and integrating the approach with other model architectures and NLP problems could broaden its use.

In conclusion, this paper provides a methodical and technically sound solution to the problem of transparency in neural network models, advancing the field by offering tools to extract interpretable and informative rationales effectively.