
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents (2404.10774v2)

Published 16 Apr 2024 in cs.CL and cs.AI

Abstract: Recognizing if LLM output can be grounded in evidence is central to many tasks in NLP: retrieval-augmented generation, summarization, document-grounded dialogue, and more. Current approaches to this kind of fact-checking are based on verifying each piece of a model generation against potential evidence using an LLM. However, this process can be very computationally expensive, requiring many calls to a model to check a single response. In this work, we show how to build small fact-checking models that have GPT-4-level performance but for 400x lower cost. We do this by constructing synthetic training data with GPT-4, which involves creating realistic yet challenging instances of factual errors via a structured generation procedure. Training on this data teaches models to check each fact in the claim and recognize synthesis of information across sentences. For evaluation, we unify datasets from recent work on fact-checking and grounding LLM generations into a new benchmark, LLM-AggreFact. Our best system MiniCheck-FT5 (770M parameters) outperforms all systems of comparable size and reaches GPT-4 accuracy. We release LLM-AggreFact, code for data synthesis, and models.

Efficient and Effective Fact-Checking for Grounding LLM Generations

Introduction

LLMs generate fluent, contextually relevant text across many tasks, including document summarization and dialogue generation. Nevertheless, they often produce content that appears plausible but is not supported by the available evidence, a phenomenon known as "hallucination." Detecting such unsupported content in a scalable and cost-effective manner remains an open challenge in NLP.

This work introduces a methodology that sharply reduces the computational and financial overhead of LLM-based fact-checking without sacrificing performance. By constructing a synthetic dataset that captures complex, realistic factual errors and using it to train much smaller models, the authors build a system, MiniCheck, that matches GPT-4 accuracy at roughly 400 times lower cost.

The MiniCheck Fact-Checking Models

MiniCheck, the proposed system, addresses the limitations of prior fact-checking approaches primarily through its training data. Its core idea is a structured procedure for generating synthetic training examples that contain realistic but challenging factual errors. The resulting data reflects the kinds of mistakes LLMs actually make, from subtle misstatements to outright factual errors, including claims whose verification requires synthesizing information across multiple sentences of the grounding document.
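
As a rough, hedged illustration of this style of error synthesis (not the paper's actual prompts or pipeline), the sketch below asks GPT-4 to corrupt a supported claim so that it is no longer grounded in its document, yielding paired supported/unsupported training examples; the prompt wording and model name are assumptions for illustration.

```python
# Hedged sketch: create an "unsupported" variant of a grounded claim.
# The prompt and pairing scheme are illustrative, not the paper's exact procedure.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def make_unsupported_claim(document: str, supported_claim: str) -> str:
    """Ask the LLM to introduce a subtle factual error into a supported claim."""
    prompt = (
        "Document:\n" + document + "\n\n"
        "Claim (fully supported by the document):\n" + supported_claim + "\n\n"
        "Rewrite the claim so that it is no longer supported by the document. "
        "Keep the wording plausible, change only one or two facts, and prefer a fact "
        "that requires combining information from multiple sentences to verify."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()


# Each document then yields a (claim, supported) and a (corrupted claim, unsupported)
# pair for training the small fact-checking model.
```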

MiniCheck's best-performing model is built on the Flan-T5 architecture and fine-tuned on the synthetic dataset together with standard entailment data. This combination lets the model handle the nuances of LLM-generated claims while retaining the broader entailment-detection ability required for effective fact-checking.
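
A minimal sketch of how such a fine-tuned Flan-T5 checker might be applied at inference time follows; the checkpoint name, the "Document: ... Claim: ..." input template, and the yes/no output labels are assumptions for illustration rather than the released model's exact interface.

```python
# Hedged sketch: score a (document, claim) pair with a Flan-T5-style fact-checker.
# Checkpoint name, input template, and label tokens are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "your-org/minicheck-style-flan-t5-large"  # hypothetical checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
model.eval()


def is_claim_supported(document: str, claim: str) -> bool:
    """Return True if the checker judges the claim to be grounded in the document."""
    text = f"Document: {document}\nClaim: {claim}\nIs the claim supported by the document?"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=4)
    answer = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip().lower()
    return answer.startswith("yes")


print(is_claim_supported("The meeting was moved to Tuesday at 3pm.",
                         "The meeting now takes place on Tuesday afternoon."))
```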

LLM-AggreFact: A New Factual Evaluation Benchmark

To benchmark the proficiency of fact-checking models, including MiniCheck, the paper introduces LLM-AggreFact — a comprehensive dataset amalgamating various tasks that necessitate evidence grounding. This benchmark encompasses a diverse array of domains from healthcare to news, alongside a mixture of closed-book and grounded generation settings, offering a rigorous testing ground for fact-checking systems.

Evaluation on LLM-AggreFact shows that MiniCheck outperforms previous systems of comparable size by a significant margin in balanced accuracy. In particular, MiniCheck-FT5 (770M parameters) reaches accuracy comparable to GPT-4 while being far cheaper and faster to run.
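
Balanced accuracy, the benchmark's headline metric, averages recall over the supported and unsupported classes. A small sketch of computing it per dataset and macro-averaging across subsets follows; the subset names and labels are placeholders for illustration, not LLM-AggreFact results.

```python
# Hedged sketch: per-dataset balanced accuracy plus a macro-average across subsets.
# Subset names and gold/predicted labels are placeholders, not real benchmark data.
from sklearn.metrics import balanced_accuracy_score

predictions = {
    "subset_a": ([1, 0, 1, 1], [1, 0, 0, 1]),   # (gold, predicted) support labels
    "subset_b": ([0, 1, 1, 0], [0, 1, 1, 1]),
    "subset_c": ([1, 1, 0, 0], [1, 0, 0, 0]),
}

per_dataset = {
    name: balanced_accuracy_score(gold, pred)
    for name, (gold, pred) in predictions.items()
}
macro_average = sum(per_dataset.values()) / len(per_dataset)

for name, score in per_dataset.items():
    print(f"{name}: {score:.3f}")
print(f"Average balanced accuracy: {macro_average:.3f}")
```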

Implications and Future Directions

The findings presented carry both practical and theoretical implications for the development and deployment of LLMs. Practically, MiniCheck offers a viable solution for integrating robust fact-checking mechanisms into LLM applications without incurring prohibitive costs. Theoretically, the use of synthetic data for training fact-checkers opens new avenues for model training, particularly in scenarios where error types are complex and diverse.
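
As a hedged illustration of such an integration, a retrieval-augmented pipeline could flag unsupported sentences in a generated answer before surfacing it. The sketch below reuses the illustrative is_claim_supported() helper from above (here imported from a hypothetical my_checker module), and the sentence splitting is deliberately naive.

```python
# Hedged sketch: sentence-level grounding check for a RAG-style answer.
# my_checker is a hypothetical module wrapping the is_claim_supported() sketch above.
import re

from my_checker import is_claim_supported


def flag_unsupported_sentences(retrieved_docs: list[str], answer: str) -> list[str]:
    """Return the sentences of the answer that the checker deems ungrounded."""
    evidence = "\n\n".join(retrieved_docs)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if not is_claim_supported(evidence, s)]


retrieved_docs = ["The meeting was moved to Tuesday at 3pm in Room 204."]
generated_answer = "The meeting is on Tuesday afternoon. It will be held in Room 310."

for sentence in flag_unsupported_sentences(retrieved_docs, generated_answer):
    print("Not grounded in the retrieved evidence:", sentence)
```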

Speculatively, as LLMs continue to evolve, efficient and effective fact-checking will only become more critical. Future research may extend the MiniCheck approach to multilingual settings, tackle multi-document reasoning for more comprehensive fact-checking, and further optimize the trade-off between model size, accuracy, and operational cost.

Conclusion

Through careful synthetic data generation and comprehensive benchmarking, this work advances the state of fact-checking for LLM-generated content. MiniCheck demonstrates that accurate fact-checking does not require prohibitive computational cost, offering a practical path for researchers and practitioners aiming to improve the reliability of LLM outputs across a wide range of applications.

Authors (3)
  1. Liyan Tang (12 papers)
  2. Philippe Laban (40 papers)
  3. Greg Durrett (117 papers)
Citations (46)