Evaluation measure for faithfulness in QED Task 4
Develop an evaluation measure that quantifies how faithfully model-generated explanations reflect the model's underlying reasoning process in QED Task 4, where a system predicts a long answer span, a short answer span, and a structured QED explanation for a given question–document pair.
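To make the object under evaluation concrete, the following is a minimal sketch of what a Task 4 prediction might look like as a data structure. All class names, field names, and the character-offset span convention are hypothetical and are not taken from the QED release; the explanation fields are a simplifying assumption about its structure. The check shown is purely structural and does not measure faithfulness, which remains the open question.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumption: a span is a (start, end) pair of character offsets, end exclusive.
Span = Tuple[int, int]


@dataclass
class QEDExplanation:
    """Hypothetical container for a structured explanation."""
    # Document span of the sentence selected as supporting the answer.
    sentence: Span
    # Pairs of (question span, document span) asserted to refer to the same thing.
    referential_equalities: List[Tuple[Span, Span]] = field(default_factory=list)


@dataclass
class Task4Prediction:
    """Hypothetical record of the three outputs named in the problem statement."""
    long_answer: Span
    short_answer: Span
    explanation: QEDExplanation


def contains(outer: Span, inner: Span) -> bool:
    """Return True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def is_well_formed(pred: Task4Prediction) -> bool:
    """Structural sanity check only; it says nothing about faithfulness.

    Assumes the short answer and the selected sentence should both fall
    inside the predicted long answer.
    """
    return (
        contains(pred.long_answer, pred.short_answer)
        and contains(pred.long_answer, pred.explanation.sentence)
    )
```

A faithfulness measure would need to go beyond such structural checks and relate the explanation to how the model actually arrived at its answers.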
References
This will require an evaluation measure for faithfulness, which is an open question beyond the scope of this paper.
— QED: A Framework and Dataset for Explanations in Question Answering
(2009.06354 - Lamm et al., 2020) in Section 5.1, Task 4 (Four Tasks)