Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness? (2004.03685v3)

Published 7 Apr 2020 in cs.CL and cs.LG

Abstract: With the growing popularity of deep-learning based NLP models, comes a need for interpretable systems. But what is interpretability, and what constitutes a high-quality interpretation? In this opinion piece we reflect on the current state of interpretability evaluation research. We call for more clearly differentiating between different desired criteria an interpretation should satisfy, and focus on the faithfulness criteria. We survey the literature with respect to faithfulness evaluation, and arrange the current approaches around three assumptions, providing an explicit form to how faithfulness is "defined" by the community. We provide concrete guidelines on how evaluation of interpretation methods should and should not be conducted. Finally, we claim that the current binary definition for faithfulness sets a potentially unrealistic bar for being considered faithful. We call for discarding the binary notion of faithfulness in favor of a more graded one, which we believe will be of greater practical utility.

Citations (517)

Summary

  • The paper surveys and categorizes existing approaches to evaluating the faithfulness of NLP model interpretations.
  • It critiques the binary notion of faithfulness and advocates for a graded framework that better captures model reasoning.
  • The study emphasizes the need for reliable interpretability in critical applications such as health and law.

Towards Faithfully Interpretable NLP Systems: Defining and Evaluating Faithfulness

The paper, "Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?" by Jacovi and Goldberg offers a critical examination of interpretability within NLP models, focusing specifically on the concept of faithfulness. As deep-learning models in NLP continue to expand in applicability, interpretability becomes crucial, particularly in domains such as health and law. Nevertheless, defining and evaluating the quality of interpretations remains a challenging and often inconsistent endeavor.

Key Themes and Contributions

The authors provide a structured critique of current practices in interpreting NLP models, arguing that criteria such as readability, plausibility, and faithfulness are frequently conflated, and advocating a more explicit differentiation between them. The paper focuses in particular on faithfulness, defined as the accuracy with which an interpretation reflects the model's actual reasoning process.

The paper makes three main contributions:

  1. Survey and Categorization of Existing Approaches: The authors organize current faithfulness evaluation methods around three core assumptions: the model assumption, the prediction assumption, and the linearity assumption, standardizing the discussion and analysis of faithfulness (an illustrative sketch of one such test follows this list).
  2. Critique of Binary Faithfulness Evaluation: The paper argues against treating faithfulness as a binary property. Highlighting the complexity and non-deterministic nature of deep-learning models, the authors suggest that such a strict viewpoint is not only impractical but also limiting.
  3. Call for Graded Criteria: The authors propose a shift towards a graded evaluation framework, which they argue offers greater practical utility. Such a framework would allow researchers to assess faithfulness comparatively across different models, tasks, and even within subspaces of the same model.
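
To make the flavor of these evaluations concrete, the sketch below shows an erasure-style test of the kind the paper discusses under the linearity assumption: the tokens an explanation marks as most important are masked, and the resulting drop in model confidence is measured. This is a common instantiation from the surveyed literature, not the authors' own method; `predict_proba`, `attributions`, and the `[MASK]` convention are hypothetical placeholders.

```python
import numpy as np

def erasure_faithfulness(tokens, attributions, predict_proba,
                         mask_token="[MASK]", k=3):
    """Drop in model confidence when the k tokens an explanation marks
    as most important are masked out.

    A large drop suggests the explanation highlights inputs the model
    actually relies on; a negligible drop suggests it does not.
    `predict_proba` is a hypothetical callable mapping a token list to
    the probability of the originally predicted class.
    """
    base = predict_proba(tokens)
    top_k = set(np.argsort(attributions)[-k:])   # indices of most "important" tokens
    masked = [mask_token if i in top_k else t for i, t in enumerate(tokens)]
    return base - predict_proba(masked)          # a continuous quantity, not a verdict
```

Note that the result is already continuous; turning it into a binary faithful/unfaithful verdict requires an arbitrary threshold, which is precisely the kind of move the paper critiques.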

Implications and Future Directions

The implications of this research are both practical and theoretical. Practically, it calls for a robust framework to distinguish between plausibility and faithfulness, which is crucial for developing explanations that can be trusted in sensitive applications. Misinterpretations could have severe repercussions, particularly when models inform decision-making with real-world consequences.

Theoretically, the work challenges the broader AI research community to refine the conceptual foundations of interpretability. By moving away from a binary assessment to a more granular understanding of faithfulness, there is the potential to develop more sophisticated evaluation strategies and improve the reliability of NLP model interpretations.

Looking ahead, research will need to devise concrete methodologies for measuring graded faithfulness. These methodologies should consider both model-specific and instance-specific attributes, fostering a nuanced understanding that can guide the development of more reliable interpretation methods; one possible shape of such a measure is sketched below.
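
As a minimal sketch of what a graded measure could look like, assuming the `erasure_faithfulness` helper above and a hypothetical `explain` callable that returns one attribution score per token, per-instance scores can be averaged into a comparative score:

```python
def graded_faithfulness(dataset, explain, predict_proba, k=3):
    """Average per-instance erasure scores over a dataset, yielding a
    graded score on which interpretation methods (or models) can be
    ranked rather than declared faithful/unfaithful outright.
    `explain` is a hypothetical callable: token list -> one attribution
    score per token.
    """
    scores = [
        erasure_faithfulness(tokens, explain(tokens), predict_proba, k=k)
        for tokens in dataset
    ]
    return float(np.mean(scores))
```

Such averages could then be compared across interpretation methods, tasks, or subspaces of the same model, in the comparative spirit the authors advocate, rather than used to certify any single method as faithful.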

Conclusion

Jacovi and Goldberg’s paper underscores the pressing need for refinement and clarity in the evaluation of interpretability within NLP systems. By focusing on faithfulness, the authors delineate a path towards more meaningful interpretations of NLP models. Their work highlights the necessity for progressive frameworks that embrace the complexity of model interpretability, ultimately advancing the research community's ability to create models that are not just interpretable, but truly trustworthy.