- The paper presents a novel approach that builds faithfulness measurement directly into masked language models (MLMs) via masked fine-tuning, eliminating the costly retraining that existing erasure-based metrics require.
- It adapts in-distribution statistical tests from computer vision to verify that token masking does not push inputs out-of-distribution, so that performance drops genuinely reflect token importance.
- Experimental results on 16 datasets demonstrate that occlusion-based measures like Leave-One-Out yield more faithful explanations than gradient-based methods.
Faithfulness Measurable Masked Language Models: A Technical Overview
The paper "Faithfulness Measurable Masked LLMs" investigates a critical aspect of NLP: the faithfulness of explanations provided by importance measures in masked LLMs (MLMs). These importance measures are essential for understanding which input tokens influence the model's predictions. Although appealing, these measures often lack reliability, leading to explanations that can appear persuasive yet fail to accurately reflect the model's reasoning.
Motivations and Existing Challenges
One prominent way to gauge the faithfulness of importance measures is erasure: removing supposedly important tokens should degrade model performance. This approach faces significant challenges, however. Masking tokens can produce out-of-distribution (OOD) inputs, skewing the model's predictive behavior. Existing solutions retrain the model on masked inputs, which is computationally expensive and yields a retrained model whose behavior may no longer match that of the model being explained.
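To make the erasure idea concrete, here is a minimal, self-contained sketch; `model_confidence` and `importance` are toy stand-ins for a real classifier and a real importance measure, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = rng.integers(0, 1000, size=20)            # a fake tokenized input

def model_confidence(token_ids, mask):
    """Toy stand-in for a classifier's confidence in its original
    prediction: confidence drops as 'informative' tokens get masked."""
    informative = token_ids % 3 == 0               # pretend every 3rd token matters
    visible = informative & ~mask
    return visible.sum() / max(informative.sum(), 1)

importance = np.where(tokens % 3 == 0, 1.0, 0.1)   # a fake importance measure

# Erasure metric: mask the top-k most important tokens; for a faithful
# measure, confidence should degrade quickly as k grows.
for k in (0, 2, 5, 10):
    mask = np.zeros(len(tokens), dtype=bool)
    mask[np.argsort(-importance)[:k]] = True
    print(f"top-{k:>2} tokens masked -> confidence {model_confidence(tokens, mask):.2f}")
```

The OOD problem arises because a real model, unlike this toy, was never trained on inputs with tokens removed, so its confidence can shift for reasons unrelated to token importance.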
Proposed Solution: Faithfulness Measurable Models
This paper introduces inherently faithfulness measurable models (FMMs), which build faithfulness testing into the model by design. Unlike model-agnostic methods, FMMs are fine-tuned so that masked inputs fall within their learned distribution, ensuring that subsequent analysis of their explanations remains valid and reliable.
Methodological Approach
- Masked Fine-Tuning: The FMM is fine-tuned on inputs in which a randomly sampled percentage of tokens is masked. Masking thereby becomes part of the model's learned distribution, supporting accurate faithfulness metrics without any retraining (see the first sketch after this list).
- In-Distribution Validation: The paper adapts statistical tests from computer vision to validate that masked inputs remain in-distribution. This step ensures that any performance degradation caused by token masking reflects the importance measure's faithfulness rather than distribution shift (second sketch below).
- Faithfulness Evaluation: The researchers use a repeated-masking protocol to evaluate various importance measures. The protocol iteratively masks the tokens deemed most important and compares the model's classification performance against a baseline in which random tokens are masked (third sketch below).
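A minimal sketch of masked fine-tuning in PyTorch. The uniform per-example masking ratio and the hard-coded `<mask>` token id are illustrative assumptions, not the paper's exact recipe:

```python
import torch

MASK_TOKEN_ID = 50264  # RoBERTa's <mask> id; verify against your tokenizer

def mask_for_finetuning(input_ids, attention_mask, special_tokens_mask):
    """Mask a randomly sampled fraction of the non-special tokens in each
    example, so masked inputs become part of the learned distribution."""
    input_ids = input_ids.clone()
    # One masking ratio per example, drawn uniformly from [0, 1].
    ratios = torch.rand(input_ids.size(0), 1)
    # Tokens eligible for masking: attended-to and not special (<s>, </s>, pad).
    maskable = attention_mask.bool() & ~special_tokens_mask.bool()
    # Bernoulli-mask each eligible token at its example's ratio.
    chosen = (torch.rand(input_ids.shape, dtype=torch.float) < ratios) & maskable
    input_ids[chosen] = MASK_TOKEN_ID
    return input_ids
```

The downstream fine-tuning loss (e.g., classification cross-entropy) is then computed on these partially masked batches as usual.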
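The paper adapts a specific statistical OOD test from computer vision; as a simple stand-in that conveys the same idea, a two-sample permutation test on any scalar model statistic can compare masked and unmasked inputs:

```python
import numpy as np

def permutation_p_value(stat_unmasked, stat_masked, n_perm=10_000, seed=0):
    """Permutation test on a scalar per-input statistic (e.g., maximum
    softmax probability). A high p-value means masked inputs are
    statistically indistinguishable from unmasked ones: in-distribution."""
    rng = np.random.default_rng(seed)
    observed = abs(stat_unmasked.mean() - stat_masked.mean())
    pooled = np.concatenate([stat_unmasked, stat_masked])
    n = len(stat_unmasked)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        hits += abs(pooled[:n].mean() - pooled[n:].mean()) >= observed
    return (hits + 1) / (n_perm + 1)
```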
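Finally, a sketch of the repeated-masking evaluation itself. `predict_fn` is a hypothetical callable returning the model's performance on a masked input, and the percentage grid is illustrative:

```python
import numpy as np

def faithfulness_curves(predict_fn, tokens, importance,
                        percentages=(0, 20, 40, 60, 80, 100), seed=0):
    """Mask the top-ranked tokens at increasing percentages and record
    performance; a faithful importance measure should degrade the model
    faster than the random-order baseline."""
    rng = np.random.default_rng(seed)
    orders = {
        "importance": np.argsort(-importance),    # most important first
        "random": rng.permutation(len(tokens)),   # baseline ordering
    }
    curves = {}
    for name, order in orders.items():
        scores = []
        for pct in percentages:
            k = round(len(tokens) * pct / 100)
            mask = np.zeros(len(tokens), dtype=bool)
            mask[order[:k]] = True
            scores.append(predict_fn(tokens, mask))
        curves[name] = scores
    return curves
```

One natural summary of the result is the gap between the two curves: the larger the gap, the more faithful the importance measure.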
Experimental Insights
The paper applies this framework to sixteen datasets using RoBERTa-base and RoBERTa-large. The results show that models fine-tuned with the proposed methodology maintain classification performance on unmasked inputs and remain robust on masked inputs. Importantly, within the FMM framework, occlusion-based importance measures such as Leave-One-Out (LOO) tend to yield more faithful explanations than gradient-based methods.
Implications and Future Directions
The introduction of FMMs has significant implications for deploying NLP models in real-world applications where interpretability and reliability are paramount. By integrating faithfulness assessment directly into the model, FMMs eliminate the need for computationally intensive retraining of surrogate models, improving the practicality and scalability of NLP interpretability research.
Theoretically, the approach offers new evidence that over-parameterized models like MLMs can absorb the distribution shift introduced by token masking into their learned distribution. Practically, it lets practitioners rely on importance measures for model auditing and debugging with greater confidence, supporting broader efforts toward trustworthy AI.
This research paves the way for extending the concept of faithfulness measurable models to different architectures beyond MLMs. Future work could explore its adaptation to other tasks, such as generation tasks in LLMs, and evaluate its utility in legal and ethical contexts where explanation transparency is crucial.