
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback (2302.12813v3)

Published 24 Feb 2023 in cs.CL and cs.AI

Abstract: LLMs, such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e.g., task-oriented dialog and question answering. However, applying LLMs to real-world, mission-critical applications remains challenging mainly due to their tendency to generate hallucinations and their inability to use external knowledge. This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules. Our system makes the LLM generate responses grounded in external knowledge, e.g., stored in task-specific databases. It also iteratively revises LLM prompts to improve model responses using feedback generated by utility functions, e.g., the factuality score of a LLM-generated response. The effectiveness of LLM-Augmenter is empirically validated on two types of scenarios, task-oriented dialog and open-domain question answering. LLM-Augmenter significantly reduces ChatGPT's hallucinations without sacrificing the fluency and informativeness of its responses. We make the source code and models publicly available.

An Overview of "Check Your Facts and Try Again: Improving LLMs with External Knowledge and Automated Feedback"

The paper presents a system designed to enhance the reliability of LLMs by grounding their responses in external knowledge and iterative feedback. The work primarily addresses hallucinations, i.e., factually incorrect assertions made by LLMs, by proposing LLM-Augmenter: a framework of plug-and-play modules that steer a black-box LLM, such as ChatGPT, toward factual and coherent outputs.

Methodological Insights

The authors introduce an architecture comprising several components: Working Memory, Policy, Action Executor, and Utility. Each component serves a distinct purpose in maintaining the dialog state, managing evidence retrieval, and enhancing response quality.

  • Working Memory captures the conversation state, including user queries and fetched evidence, ensuring the system's responses are contextually relevant.
  • Policy selects the next system action, e.g., acquiring evidence from external knowledge, calling the LLM to generate a candidate response, or sending a response to the user once it passes verification.
  • Action Executor executes tasks via its two facets: Knowledge Consolidator and Prompt Engine. The consolidator retrieves relevant data from databases or online resources, while the engine crafts prompts to elicit improved LLM outputs.
  • Utility provides feedback on the candidate responses’ factual correctness and suggests prompt modifications to enhance quality.

The system combines external knowledge retrieval with iterative validation cycles to refine what the LLM presents, balancing fluency against factual accuracy.
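To make the control flow concrete, the following is a minimal sketch of how these modules could interact in a single feedback loop. Every function name here (retrieve_evidence, call_llm, factuality_score) and the toy token-overlap utility are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an LLM-Augmenter-style feedback loop.
# All names and the toy utility below are hypothetical stand-ins.

def retrieve_evidence(query: str) -> list[str]:
    # Stand-in for the Knowledge Consolidator: fetch task-specific facts.
    knowledge_base = {
        "capital of australia": ["Canberra is the capital of Australia."],
    }
    return knowledge_base.get(query.lower(), [])

def call_llm(prompt: str) -> str:
    # Stand-in for a black-box LLM call (e.g., a chat-completion API).
    return "Canberra is the capital of Australia."

def factuality_score(response: str, evidence: list[str]) -> float:
    # Toy utility: fraction of response tokens that appear in the evidence.
    if not evidence:
        return 0.0
    supported = set(" ".join(evidence).lower().split())
    tokens = response.lower().split()
    return sum(t in supported for t in tokens) / max(len(tokens), 1)

def llm_augmenter(query: str, threshold: float = 0.5, max_turns: int = 3) -> str:
    evidence = retrieve_evidence(query)      # Action Executor: consolidate
    feedback = ""                            # Working Memory (simplified)
    candidate = ""
    for _ in range(max_turns):               # Policy: revise or stop
        prompt = f"Evidence: {evidence}\nFeedback: {feedback}\nQuestion: {query}"
        candidate = call_llm(prompt)         # Prompt Engine drives the LLM
        if factuality_score(candidate, evidence) >= threshold:  # Utility
            return candidate                 # grounded enough: respond
        feedback = "The answer was not grounded in the evidence; revise it."
    return candidate                         # best effort after max_turns

print(llm_augmenter("capital of Australia"))
```

The design point this preserves is that the LLM itself is never modified: all correction happens by rewriting the prompt with retrieved evidence and utility feedback, which is what makes the modules plug-and-play around a black-box model.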

Experimental Evaluation

The empirical validation focuses on two applications: task-oriented dialog and open-domain question answering. Testing on the DSTC7 News Chat and DSTC11 Customer Service datasets shows considerable gains in response utility (Knowledge F1) when the framework augments ChatGPT, without sacrificing fluency or coherence. The framework also yields substantial improvements on tasks that demand integrating context from disparate information sources, as evidenced by its performance on OTT-QA, a benchmark featuring multi-hop reasoning over tables and text passages.
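For reference, Knowledge F1 is typically computed as the token-level F1 overlap between the generated response and the knowledge it should be grounded in. A minimal sketch, with whitespace tokenization as a simplifying assumption:

```python
from collections import Counter

def knowledge_f1(response: str, knowledge: str) -> float:
    # Token-level F1 overlap between a response and its gold knowledge,
    # in the spirit of Knowledge F1 (KF1). Whitespace tokenization is a
    # simplifying assumption, not the paper's exact preprocessing.
    resp = response.lower().split()
    know = knowledge.lower().split()
    overlap = sum((Counter(resp) & Counter(know)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(resp)
    recall = overlap / len(know)
    return 2 * precision * recall / (precision + recall)

print(knowledge_f1(
    "Canberra is the capital of Australia.",
    "Canberra is the capital city of Australia.",
))  # ~0.92: high token overlap with the grounding knowledge
```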

The empirical results also underscore the importance of feedback mechanisms in reducing hallucination rates. Automated feedback loops let the LLM iteratively revise its own output, which then aligns more closely with verifiable knowledge.

Implications and Future Directions

This research holds substantial implications for the deployment of LLMs in real-world applications requiring precision, such as customer service, information retrieval, and automated content generation. By allowing LLMs to tap into dynamic, external knowledge repositories without losing their fluency or relevance, this framework paves the way for future work in scaling LLM capabilities to support a broader spectrum of knowledge-intensive tasks.

Potential avenues for further exploration include expanding the utility functions used in feedback loops to cover dimensions such as ethical compliance and domain-specific accuracy (one possible pluggable design is sketched below). Additionally, refining the reinforcement learning approach for the Policy module promises gains in efficiency and adaptability, fostering an even more robust interaction between LLMs and auxiliary knowledge systems.
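As one illustration of that direction, the sketch below shows a hypothetical pluggable utility interface that blends several scorers into a single weighted feedback signal; the names, scorers, and weighting scheme are assumptions for illustration, not the paper's design.

```python
from typing import Callable

Scorer = Callable[[str], float]  # maps a candidate response to [0, 1]

def make_utility(scorers: dict[str, tuple[Scorer, float]]) -> Scorer:
    # Combine named (scorer, weight) pairs into one weighted-average utility.
    total = sum(weight for _, weight in scorers.values())
    def utility(response: str) -> float:
        return sum(fn(response) * w for fn, w in scorers.values()) / total
    return utility

# Toy scorers standing in for factuality and policy-compliance checks.
utility = make_utility({
    "factuality": (lambda r: 1.0 if "Canberra" in r else 0.0, 2.0),
    "compliance": (lambda r: 0.0 if "guaranteed" in r.lower() else 1.0, 1.0),
})
print(utility("Canberra is the capital of Australia."))  # 1.0
```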

In summary, the paper presents a practical approach to grounding LLM responses in verifiable facts through a hybrid methodology that integrates structured feedback and external knowledge. This mitigates one of the critical limitations of LLMs and strengthens their applicability in domain-specific contexts requiring strict adherence to factual correctness.

Authors (11)
  1. Baolin Peng (72 papers)
  2. Michel Galley (50 papers)
  3. Pengcheng He (60 papers)
  4. Hao Cheng (190 papers)
  5. Yujia Xie (29 papers)
  6. Yu Hu (75 papers)
  7. Qiuyuan Huang (23 papers)
  8. Lars Liden (12 papers)
  9. Zhou Yu (206 papers)
  10. Weizhu Chen (128 papers)
  11. Jianfeng Gao (344 papers)
Citations (318)