Teaching language models to support answers with verified quotes (2203.11147v1)

Published 21 Mar 2022 in cs.CL and cs.LG

Abstract: Recent LLMs often answer factual questions correctly. But users can't trust any given claim a model makes without fact-checking, because LLMs can hallucinate convincing nonsense. In this work we use reinforcement learning from human preferences (RLHP) to train "open-book" QA models that generate answers whilst also citing specific evidence for their claims, which aids in the appraisal of correctness. Supporting evidence is drawn from multiple documents found via a search engine, or from a single user-provided document. Our 280 billion parameter model, GopherCite, is able to produce answers with high quality supporting evidence and abstain from answering when unsure. We measure the performance of GopherCite by conducting human evaluation of answers to questions in a subset of the NaturalQuestions and ELI5 datasets. The model's response is found to be high-quality 80% of the time on this Natural Questions subset, and 67% of the time on the ELI5 subset. Abstaining from the third of questions for which it is most unsure improves performance to 90% and 80% respectively, approaching human baselines. However, analysis on the adversarial TruthfulQA dataset shows why citation is only one part of an overall strategy for safety and trustworthiness: not all claims supported by evidence are true.

Authors (11)
  1. Jacob Menick (13 papers)
  2. Maja Trebacz (9 papers)
  3. Vladimir Mikulik (14 papers)
  4. John Aslanides (16 papers)
  5. Francis Song (10 papers)
  6. Martin Chadwick (6 papers)
  7. Mia Glaese (4 papers)
  8. Susannah Young (5 papers)
  9. Lucy Campbell-Gillingham (5 papers)
  10. Geoffrey Irving (31 papers)
  11. Nat McAleese (11 papers)
Citations (225)

Summary

Teaching LLMs to Support Answers with Verified Quotes

This paper addresses a central reliability challenge for LLMs answering factual questions: models can generate unverified or incorrect information, commonly referred to as "hallucinations," which limits their trustworthy deployment. The authors frame the task as "Self-Supported Question Answering" (SQA), in which a model not only answers a question but also backs its claims with verbatim quotes drawn from evidence documents, either retrieved via a search engine or supplied by the user. The approach is implemented by using reinforcement learning from human preferences (RLHP) to steer models towards outputs that human raters judge plausible and supported.
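Because the supporting evidence is a verbatim quote, one part of "supportedness" can be checked mechanically: the quoted span must actually occur in the cited document. The following is a minimal illustrative sketch of such a check in Python; the `verify_quote` helper and the toy document are hypothetical, and whether a quote semantically supports the answer is still left to human raters.

```python
import re


def verify_quote(quote: str, source_document: str) -> bool:
    """Return True if `quote` appears verbatim in `source_document`.

    GopherCite's evidence is a verbatim extract, so its presence in the
    source can be verified by string matching; judging whether the quote
    actually supports the answer still requires human evaluation.
    """
    # Normalise whitespace so line breaks and extra spaces don't cause misses.
    def norm(text: str) -> str:
        return re.sub(r"\s+", " ", text).strip()

    return norm(quote) in norm(source_document)


# Toy usage with an invented document.
doc = "Lake Baikal, in Siberia, is the deepest lake in the world at 1,642 metres."
print(verify_quote("the deepest lake\nin the world", doc))  # True
```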

Core Contributions

  1. Development of the GopherCite Model: The paper introduces GopherCite, a 280 billion parameter model that generates answers accompanied by verbatim quotes from the evidence documents it is given. GopherCite produces well-supported claims and abstains from answering when its confidence is low.
  2. Empirical Evaluation: The effectiveness of GopherCite is demonstrated through rigorous human evaluations on datasets such as NaturalQuestions and ELI5. GopherCite's ability to produce high-quality supported answers reaches 80% on the NaturalQuestions subset and 67% on the ELI5 subset. When abstention is incorporated for uncertain answers, performance increases to 90% and 80% respectively.
  3. Reinforcement Learning From Human Preferences (RLHP): The RLHP framework substantially improves performance by using human preference judgments to steer generation towards answers that raters consider plausible and well supported; the resulting reward model can also be used to rerank sampled answers and to decide when to abstain (see the sketch after this list).
  4. Implications and Future Directions: The research underscores the potential of integrating evidence-based methodologies within LLMs for more reliable AI systems. However, the paper also cautions that citing sources is insufficient for ensuring truthfulness, highlighting the necessity for further enhancements in source trustworthiness and model alignment strategies.
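At inference time, the paper's recipe samples several candidate answer/quote pairs, reranks them with the preference-trained reward model, and declines to answer when even the best candidate scores poorly. Below is a minimal sketch under those assumptions; the `Candidate` structure, function name, and threshold are illustrative, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    answer: str
    quote: str
    reward: float  # score assigned by a preference-trained reward model


def answer_or_abstain(candidates: List[Candidate],
                      threshold: float) -> Optional[Candidate]:
    """Pick the highest-reward candidate, or abstain (return None) if it scores too low.

    The threshold trades coverage for quality: raising it makes the model
    decline more often (e.g. on the least confident third of questions)
    in exchange for higher-quality answers on the rest.
    """
    best = max(candidates, key=lambda c: c.reward)
    return best if best.reward >= threshold else None
```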

Strong Numerical Results

The paper reports the following key quantitative results:

  • An 80% success rate in producing supported answers on the NaturalQuestions subset.
  • A 67% success rate on the ELI5 subset.
  • Abstaining from the least confident third of questions raises these figures to 90% and 80% respectively, indicating that the model can identify its own uncertainty effectively (see the sketch below).
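Numbers of this form follow the standard selective-prediction recipe: rank questions by the confidence score of the chosen answer, answer only the most confident fraction, and measure quality on that subset. A small illustrative sketch follows; the function name and inputs are hypothetical.

```python
from typing import Sequence


def quality_at_coverage(scores: Sequence[float],
                        is_high_quality: Sequence[bool],
                        coverage: float = 2 / 3) -> float:
    """Quality on the `coverage` fraction of questions answered most confidently.

    `scores` are per-question confidence scores (e.g. reward-model scores of
    the selected answer) and `is_high_quality` are human quality judgments.
    """
    ranked = sorted(zip(scores, is_high_quality), key=lambda pair: -pair[0])
    kept = ranked[: max(1, int(len(ranked) * coverage))]
    return sum(ok for _, ok in kept) / len(kept)
```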

Implications for AI Development

The implications of this research extend beyond immediate improvements in question answering tasks. It signals a paradigm shift towards accountability in AI systems, where models can provide transparent evidence for their assertions, facilitating more trustworthy interactions in real-world applications. The integration of verifiable claims highlights a pathway for developing AI systems that users can rely upon with less skepticism and need for manual fact-checking.

Speculation on Future Developments

The future of AI could witness a more generalized adoption of RLHP approaches in developing AI models across diverse domains where trustworthiness and accuracy are paramount. Further research might focus on refining the fidelity of source verification processes and exploring ways to integrate complex reasoning abilities that enable models to evaluate and synthesize conflicting information from multiple sources.

Conclusion

This paper demonstrates the efficacy of GopherCite in providing supported answers through the application of RLHP, thereby improving model reliability. The emphasis on evidence-backed responses is a foundational step towards more credible and trustworthy AI-driven tools. The work invites further research into LLM transparency and accountability, marking a shift towards robust AI systems that can back their outputs with verifiable evidence.