EvalxNLP: An Evaluation Framework for NLP Explainability
The paper "EvalxNLP: A Framework for Benchmarking Post-Hoc Explainability Methods on NLP Models" introduces the EvalxNLP framework, which addresses the increasingly critical need for effective evaluation of post-hoc explainability methods applied to NLP models. This is particularly relevant given the opacity of transformer-based models in high-stakes domains such as healthcare and finance, where model interpretability is crucial for trust and accountability.
EvalxNLP supports eight feature attribution methods, spanning gradient-based and perturbation-based approaches, and thus provides a broad basis for evaluating and benchmarking explanations of transformer models. Gradient-based approaches such as Integrated Gradients, Saliency, and DeepLIFT are implemented using Captum, while perturbation-based methods like LIME and SHAP are integrated directly or via existing libraries. These methods are assessed along three pivotal properties: faithfulness, plausibility, and complexity. Faithfulness metrics include soft sufficiency, soft comprehensiveness, FAD N-AUC, and AUC-TP, which quantify how accurately explanations reflect model behavior. Plausibility is evaluated through metrics such as the IOU-F1 score and AUPRC, while complexity is measured via Shannon entropy and sparsity metrics.
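To make this pipeline concrete, the sketch below shows one way a gradient-based attribution could be computed and scored for complexity, assuming a Hugging Face sentiment classifier and Captum's LayerIntegratedGradients. The model checkpoint, the all-[PAD] baseline, and the entropy computation are illustrative assumptions for this example, not EvalxNLP's actual API.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from captum.attr import LayerIntegratedGradients

# Hypothetical setup: a public SST-2 sentiment classifier stands in for the
# models benchmarked in the paper.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def forward_fn(input_ids, attention_mask):
    return model(input_ids=input_ids, attention_mask=attention_mask).logits

text = "A thoughtful, quietly moving film."
enc = tokenizer(text, return_tensors="pt")

# Integrated Gradients over the embedding layer, with an all-[PAD] baseline.
lig = LayerIntegratedGradients(forward_fn, model.distilbert.embeddings)
baseline_ids = torch.full_like(enc["input_ids"], tokenizer.pad_token_id)
attributions = lig.attribute(
    inputs=enc["input_ids"],
    baselines=baseline_ids,
    additional_forward_args=(enc["attention_mask"],),
    target=1,  # index of the "positive" class for this particular checkpoint
)

# Collapse the embedding dimension into one importance score per token.
token_scores = attributions.sum(dim=-1).squeeze(0)
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
print(list(zip(tokens, token_scores.tolist())))

# Entropy-style complexity: normalize absolute scores into a distribution;
# lower entropy means the explanation concentrates on fewer tokens.
probs = token_scores.abs() / token_scores.abs().sum()
complexity = -(probs * torch.log(probs + 1e-12)).sum()
print(f"Shannon-entropy complexity: {complexity.item():.3f}")
```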
The framework further enhances comprehensibility by generating natural language explanations through an LLM-based module. This integration addresses the difficulty lay users face in interpreting raw importance scores by providing textual summaries of feature attributions and evaluation metrics. Its value is underscored by the human evaluation component, in which user feedback indicated high satisfaction with the framework's usability and interpretability, particularly among participants with less NLP experience.
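As a rough illustration of how such a module might bridge raw scores and lay users, the hypothetical helper below assembles token attributions into a prompt that any chat-capable LLM could turn into a plain-language summary. The function name, prompt wording, and top-k cutoff are assumptions for this sketch; the paper's own module may be structured differently.

```python
def attribution_summary_prompt(text, tokens, scores, predicted_label, top_k=5):
    """Format raw token-importance scores into a prompt requesting a lay-friendly summary.

    `tokens` and `scores` are parallel lists, e.g. the output of a feature
    attribution method such as the Integrated Gradients sketch above.
    """
    ranked = sorted(zip(tokens, scores), key=lambda pair: abs(pair[1]), reverse=True)[:top_k]
    score_lines = "\n".join(f"  {tok}: {score:+.3f}" for tok, score in ranked)
    return (
        f"A sentiment model classified the following text as '{predicted_label}'.\n"
        f"Text: {text}\n"
        f"Most influential tokens and their importance scores:\n{score_lines}\n"
        "In two sentences aimed at a non-expert, explain why the model likely "
        "made this prediction."
    )

# The resulting prompt can be sent to any chat-capable LLM; the paper does not
# tie the module to a specific provider, so the call itself is omitted here.
print(attribution_summary_prompt(
    "A thoughtful, quietly moving film.",
    ["thoughtful", "quietly", "moving", "film"],
    [0.41, 0.18, 0.35, 0.05],
    "positive",
))
```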
EvalxNLP's case study on sentiment analysis with the Movie Reviews dataset exemplifies its capability to benchmark feature attribution methods on text classification tasks. The results show that the explanation methods differ in effectiveness, with DeepLIFT achieving the strongest faithfulness and SHAP aligning best with human intuition, although no single method dominated across all metrics. This underscores the need to select explainability techniques according to the evaluation criteria most relevant to the user's context.
Despite the framework's accomplishments, limitations persist, most notably its restriction to text classification tasks and feature attribution methods. Future developments could include support for a wider range of NLP tasks and the incorporation of non-feature-attribution methods, such as the approach of \cite{slalom-2025}, along with broader robustness metrics like sensitivity \cite{sensitivity-infidelity-2019}.
EvalxNLP represents a significant contribution to the set of tools available for explainable AI (XAI) in NLP, democratizing access to explainability evaluation and encouraging systematic improvements in model interpretability. Its extensibility allows researchers and developers to continually refine and adapt it, ensuring its continued relevance in advancing transparent and trustworthy AI systems.