
Improving Factuality with Explicit Working Memory (2412.18069v1)

Published 24 Dec 2024 in cs.CL

Abstract: LLMs can generate factually inaccurate content, a problem known as hallucination. Recent works have built upon retrieval-augmented generation to improve factuality through iterative prompting, but these methods are limited by the traditional RAG design. To address these challenges, we introduce Ewe (Explicit Working Memory), a novel approach that enhances factuality in long-form text generation by integrating a working memory that receives real-time feedback from external resources. The memory is refreshed based on online fact-checking and retrieval feedback, allowing Ewe to rectify false claims during the generation process and ensure more accurate and reliable outputs. Our experiments demonstrate that Ewe outperforms strong baselines on four fact-seeking long-form generation datasets, increasing the factuality metric, VeriScore, by 2 to 10 points absolute without sacrificing the helpfulness of the responses. Further analysis reveals that the design of rules for memory updates, configurations of memory units, and the quality of the retrieval datastore are crucial factors influencing model performance.

Improving Factuality with Explicit Working Memory

The paper "Improving Factuality with Explicit Working Memory", authored by researchers from Meta FAIR, addresses a significant challenge in deploying LLMs: hallucination, where generated text contains factually inaccurate information. The paper introduces Ewe (Explicit Working Memory), a novel framework that enhances the factuality of long-form text generation by integrating an active working memory that receives real-time feedback from external resources.

Methodological Advances

The central innovation of this research is the Ewe framework, which incorporates a working memory that continuously monitors and updates the factual accuracy of the generated text. The framework distinguishes itself from traditional retrieval-augmented generation (RAG) systems by supporting real-time fact-checking and memory refreshes during generation. Key aspects of Ewe include:

  • Memory Structure: Ewe's working memory stores latent representations of passages retrieved from trustworthy sources relevant to the input prompt. The memory is dynamic: it is updated based on feedback from retrieval processes and online fact-checking.
  • Generation Process: During generation, Ewe periodically pauses to inject corrections and refresh the contents of its working memory based on findings from external fact-checking and retrieval. The design of the memory-update rules, the configuration of memory units, and the quality of the retrieval datastore all prove pivotal to overall model performance.
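The generate/pause/check/refresh loop described above can be sketched as follows. This is a minimal, hypothetical illustration of the control flow only: the stub model, checker, and retriever passed in below are toy stand-ins, and the function names and interfaces are assumptions, not the paper's actual API (Ewe operates on latent memory representations, not strings).

```python
# Hypothetical sketch of an Ewe-style generation loop: generate a chunk,
# pause to fact-check it against retrieved evidence, and refresh the
# working memory (then regenerate) when a claim is unsupported.

def generate_with_working_memory(prompt, llm, retriever, checker,
                                 max_chunks=1):
    """Generate text chunk by chunk with a fact-checked working memory."""
    memory = retriever(prompt, None)  # seed memory with passages for the prompt
    chunks = []
    while len(chunks) < max_chunks:
        # Generate the next chunk conditioned on prompt, memory, prior text.
        chunk = llm(prompt, memory, chunks)
        # Pause: verify the chunk's claims against the evidence in memory.
        if checker(chunk, memory):
            chunks.append(chunk)  # claims supported: keep the chunk
        else:
            # Claims unsupported: refresh memory with evidence targeting
            # the failed chunk, then regenerate instead of keeping it.
            memory = retriever(prompt, chunk)
    return " ".join(chunks)

# Toy components demonstrating the loop (illustrative only).
EVIDENCE = {"Paris is the capital of France."}

def toy_retriever(prompt, failed_chunk):
    # Return targeted evidence only after a chunk fails checking.
    return EVIDENCE if failed_chunk else set()

def toy_llm(prompt, memory, prior_chunks):
    # Hallucinate without evidence; answer correctly once memory is filled.
    return ("Paris is the capital of France." if memory
            else "Lyon is the capital of France.")

def toy_checker(chunk, memory):
    # Supported only if the chunk matches a passage in memory.
    return chunk in memory
```

With these stubs, the first generated chunk fails the check, the memory is refreshed, and the regenerated chunk passes, mirroring how Ewe rectifies false claims mid-generation.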

Empirical Validation

The researchers provide comprehensive empirical evidence that Ewe outperforms strong baseline models on four datasets oriented toward fact-seeking long-form generation, improving the factuality metric VeriScore by 2 to 10 points absolute without diminishing the helpfulness of the generated responses.

Implications and Future Directions

The implications of this paper are substantial for both the theoretical development of LLMs and their practical applications. By mitigating hallucinations, Ewe substantially increases the reliability of AI-generated text, enhancing its applicability in domains requiring high factual correctness, such as legal, educational, and technical writing.

Theoretically, this paper paves the way for future research into more sophisticated memory management strategies in LLMs. Future research could explore more granular approaches to memory update rules and the integration of more complex auxiliary models for fact-checking and data retrieval. Additionally, further exploration into the scalability of such memory-augmented systems and their potential to improve other performance metrics in LLMs could yield valuable insights.

In conclusion, "Improving Factuality with Explicit Working Memory" presents a compelling advancement in the ongoing endeavor to enhance the factual accuracy of LLMs. The incorporation of a dynamic working memory driven by online fact-checking into the text generation process is a meaningful step toward addressing one of the key limitations of current LLMs, potentially transforming their utility in real-world applications.

Authors (8)
  1. Mingda Chen (25 papers)
  2. Yang Li (1140 papers)
  3. Karthik Padthe (4 papers)
  4. Rulin Shao (20 papers)
  5. Alicia Sun (5 papers)
  6. Luke Zettlemoyer (225 papers)
  7. Gargi Ghosh (4 papers)
  8. Wen-tau Yih (84 papers)