Local Explanations and Self-Explanations for Assessing Faithfulness in black-box LLMs (2409.13764v1)

Published 18 Sep 2024 in cs.CL and cs.AI

Abstract: This paper introduces a novel task for assessing the faithfulness of LLMs using local perturbations and self-explanations. LLMs often require additional context to answer certain questions correctly. To identify that context, we propose a new, efficient explainability technique inspired by the commonly used leave-one-out approach. With it, we identify the parts of the input that are sufficient and necessary for the LLM to generate correct answers, and these parts serve as explanations. We then propose a faithfulness metric that compares these crucial parts with the model's self-explanations. We validate the approach on the Natural Questions dataset, demonstrating its effectiveness in explaining model decisions and assessing faithfulness.
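
To make the described pipeline concrete, below is a minimal sketch of a leave-one-out style check: drop one context sentence at a time, re-query the model, flag as necessary any sentence whose removal breaks the answer, and compare that set with the sentences the model cites in its self-explanation. The `query_llm` callable, the substring-based correctness check, and the Jaccard-style overlap score are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a leave-one-out perturbation check and a simple
# overlap-based faithfulness score. `query_llm`, the substring correctness
# test, and the Jaccard comparison are assumptions for illustration only.
from typing import Callable, List, Set


def necessary_sentences(
    question: str,
    context_sentences: List[str],
    gold_answer: str,
    query_llm: Callable[[str, str], str],
) -> Set[int]:
    """Indices of sentences whose removal makes the model answer incorrectly."""
    full_context = " ".join(context_sentences)
    # Only meaningful if the model is correct with the full context.
    if gold_answer.lower() not in query_llm(question, full_context).lower():
        return set()

    necessary = set()
    for i in range(len(context_sentences)):
        # Leave one sentence out and re-query the model.
        reduced = " ".join(s for j, s in enumerate(context_sentences) if j != i)
        answer = query_llm(question, reduced)
        if gold_answer.lower() not in answer.lower():
            necessary.add(i)  # removing sentence i breaks the answer
    return necessary


def faithfulness_score(necessary: Set[int], self_explained: Set[int]) -> float:
    """Jaccard overlap between perturbation-derived and self-explained sentences."""
    if not necessary and not self_explained:
        return 1.0
    return len(necessary & self_explained) / len(necessary | self_explained)
```

In practice, the sentence indices cited in a self-explanation would have to be extracted from the model's free-text rationale; the paper's metric presumably handles that mapping, which is omitted here.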
