Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 70 tok/s

Gemini 2.5 Pro 55 tok/s Pro

GPT-5 Medium 14 tok/s Pro

GPT-5 High 14 tok/s Pro

GPT-4o 72 tok/s Pro

Kimi K2 191 tok/s Pro

GPT OSS 120B 449 tok/s Pro

Claude Sonnet 4 37 tok/s Pro

2000 character limit reached

With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness (2305.16819v1)

Published 26 May 2023 in cs.CL

Abstract: Conditional LLMs still generate unfaithful output that is not supported by their input. These unfaithful generations jeopardize trust in real-world applications such as summarization or human-machine interaction, motivating a need for automatic faithfulness metrics. To implement such metrics, NLI models seem attractive, since they solve a strongly related task that comes with a wealth of prior research and data. But recent research suggests that NLI models require costly additional machinery to perform reliably across datasets, e.g., by running inference on a cartesian product of input and generated sentences, or supporting them with a question-generation/answering step. In this work we show that pure NLI models can outperform more complex metrics when combining task-adaptive data augmentation with robust inference procedures. We propose: (1) Augmenting NLI training data to adapt NL inferences to the specificities of faithfulness prediction in dialogue; (2) Making use of both entailment and contradiction probabilities in NLI, and (3) Using Monte-Carlo dropout during inference. Applied to the TRUE benchmark, which combines faithfulness datasets across diverse domains and tasks, our approach strongly improves a vanilla NLI model and significantly outperforms previous work, while showing favourable computational cost.

Citations (8)

View on Semantic Scholar