- The paper presents a novel approach that builds faithfulness measurement directly into masked language models (MLMs) via masked fine-tuning, eliminating the costly retraining that existing erasure-based metrics require.
- It adapts in-distribution statistical tests from computer vision to verify that token masking does not push inputs out-of-distribution, so that performance drops genuinely reflect token importance.
- Experimental results on 16 datasets demonstrate that occlusion-based measures like Leave-One-Out yield more faithful explanations than gradient-based methods.
Faithfulness Measurable Masked Language Models: A Technical Overview
The paper "Faithfulness Measurable Masked LLMs" investigates a critical aspect of NLP: the faithfulness of explanations provided by importance measures in masked LLMs (MLMs). These importance measures are essential for understanding which input tokens influence the model's predictions. Although appealing, these measures often lack reliability, leading to explanations that can appear persuasive yet fail to accurately reflect the model's reasoning.
Motivations and Existing Challenges
One prominent way to gauge the faithfulness of importance measures is erasure: removing supposedly important tokens should degrade model performance. This approach faces significant challenges, however. Masking tokens can produce out-of-distribution (OOD) inputs, skewing the model's predictive behavior. Existing solutions retrain the model on masked inputs, which is computationally expensive and yields a retrained model whose behavior may no longer match that of the model being explained.
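To make the erasure idea concrete, here is a minimal, self-contained sketch; `model_confidence` and `importance` are toy stand-ins for a real classifier and a real importance measure, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = rng.integers(0, 1000, size=20)            # a fake tokenized input

def model_confidence(token_ids, mask):
    """Toy stand-in for a classifier's confidence in its original
    prediction: confidence drops as 'informative' tokens get masked."""
    informative = token_ids % 3 == 0               # pretend every 3rd token matters
    visible = informative & ~mask
    return visible.sum() / max(informative.sum(), 1)

importance = np.where(tokens % 3 == 0, 1.0, 0.1)   # a fake importance measure

# Erasure metric: mask the top-k most important tokens; for a faithful
# measure, confidence should degrade quickly as k grows.
for k in (0, 2, 5, 10):
    mask = np.zeros(len(tokens), dtype=bool)
    mask[np.argsort(-importance)[:k]] = True
    print(f"top-{k:>2} tokens masked -> confidence {model_confidence(tokens, mask):.2f}")
```

The OOD problem arises because a real model, unlike this toy, was never trained on inputs with tokens removed, so its confidence can shift for reasons unrelated to token importance.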
Proposed Solution: Faithfulness Measurable Models
This paper introduces inherently faithfulness measurable models (FMMs), which build faithfulness testing into the model by design. Unlike model-agnostic methods, FMMs are fine-tuned so that masked inputs fall within their learned distribution, ensuring that subsequent analysis of their explanations remains valid and reliable.
Methodological Approach
- Masked Fine-Tuning: The FMM is fine-tuned on inputs in which a randomly sampled percentage of tokens is masked. Masking thereby becomes part of the model's learned distribution, supporting accurate faithfulness metrics without any retraining (see the first sketch after this list).
- In-Distribution Validation: The paper adapts statistical tests from computer vision to validate that masked inputs remain in-distribution. This step ensures that any performance degradation caused by token masking reflects the importance measure's faithfulness rather than distribution shift (second sketch below).
- Faithfulness Evaluation: The researchers use a repeated-masking protocol to evaluate various importance measures. The protocol iteratively masks the tokens deemed most important and compares the model's classification performance against a baseline in which random tokens are masked (third sketch below).
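A minimal sketch of masked fine-tuning in PyTorch. The uniform per-example masking ratio and the hard-coded `<mask>` token id are illustrative assumptions, not the paper's exact recipe:

```python
import torch

MASK_TOKEN_ID = 50264  # RoBERTa's <mask> id; verify against your tokenizer

def mask_for_finetuning(input_ids, attention_mask, special_tokens_mask):
    """Mask a randomly sampled fraction of the non-special tokens in each
    example, so masked inputs become part of the learned distribution."""
    input_ids = input_ids.clone()
    # One masking ratio per example, drawn uniformly from [0, 1].
    ratios = torch.rand(input_ids.size(0), 1)
    # Tokens eligible for masking: attended-to and not special (<s>, </s>, pad).
    maskable = attention_mask.bool() & ~special_tokens_mask.bool()
    # Bernoulli-mask each eligible token at its example's ratio.
    chosen = (torch.rand(input_ids.shape, dtype=torch.float) < ratios) & maskable
    input_ids[chosen] = MASK_TOKEN_ID
    return input_ids
```

The downstream fine-tuning loss (e.g., classification cross-entropy) is then computed on these partially masked batches as usual.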
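The paper adapts a specific statistical OOD test from computer vision; as a simple stand-in that conveys the same idea, a two-sample permutation test on any scalar model statistic can compare masked and unmasked inputs:

```python
import numpy as np

def permutation_p_value(stat_unmasked, stat_masked, n_perm=10_000, seed=0):
    """Permutation test on a scalar per-input statistic (e.g., maximum
    softmax probability). A high p-value means masked inputs are
    statistically indistinguishable from unmasked ones: in-distribution."""
    rng = np.random.default_rng(seed)
    observed = abs(stat_unmasked.mean() - stat_masked.mean())
    pooled = np.concatenate([stat_unmasked, stat_masked])
    n = len(stat_unmasked)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        hits += abs(pooled[:n].mean() - pooled[n:].mean()) >= observed
    return (hits + 1) / (n_perm + 1)
```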
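Finally, a sketch of the repeated-masking evaluation itself. `predict_fn` is a hypothetical callable returning the model's performance on a masked input, and the percentage grid is illustrative:

```python
import numpy as np

def faithfulness_curves(predict_fn, tokens, importance,
                        percentages=(0, 20, 40, 60, 80, 100), seed=0):
    """Mask the top-ranked tokens at increasing percentages and record
    performance; a faithful importance measure should degrade the model
    faster than the random-order baseline."""
    rng = np.random.default_rng(seed)
    orders = {
        "importance": np.argsort(-importance),    # most important first
        "random": rng.permutation(len(tokens)),   # baseline ordering
    }
    curves = {}
    for name, order in orders.items():
        scores = []
        for pct in percentages:
            k = round(len(tokens) * pct / 100)
            mask = np.zeros(len(tokens), dtype=bool)
            mask[order[:k]] = True
            scores.append(predict_fn(tokens, mask))
        curves[name] = scores
    return curves
```

One natural summary of the result is the gap between the two curves: the larger the gap, the more faithful the importance measure.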
Experimental Insights
The paper applies this framework to sixteen datasets using RoBERTa-base and RoBERTa-large. The results show that models fine-tuned with the proposed methodology maintain classification performance on unmasked inputs and remain robust on masked inputs. Importantly, within the FMM framework, occlusion-based importance measures such as Leave-One-Out (LOO) tend to yield more faithful explanations than gradient-based methods.
Implications and Future Directions
The introduction of FMMs has significant implications for deploying NLP models in real-world applications where interpretability and reliability are paramount. By integrating faithfulness assessment directly into the model, FMMs eliminate the need for computationally intensive retraining of surrogate models, improving the practicality and scalability of NLP interpretability research.
Theoretically, the approach offers new evidence that over-parameterized models like MLMs can absorb the distribution shift introduced by token masking into their learned distribution. Practically, it lets practitioners rely on importance measures for model auditing and debugging with greater confidence, supporting broader efforts toward trustworthy AI.
This research paves the way for extending the concept of faithfulness measurable models to different architectures beyond MLMs. Future work could explore its adaptation to other tasks, such as generation tasks in LLMs, and evaluate its utility in legal and ethical contexts where explanation transparency is crucial.