The Effect of Scaling, Retrieval Augmentation and Form on the Factual Consistency of Language Models (2311.01307v1)

Published 2 Nov 2023 in cs.CL

Abstract: LLMs make natural interfaces to factual knowledge, but their usefulness is limited by their tendency to deliver inconsistent answers to semantically equivalent questions. For example, a model might predict both "Anne Redpath passed away in Edinburgh." and "Anne Redpath's life ended in London." In this work, we identify potential causes of inconsistency and evaluate the effectiveness of two mitigation strategies: up-scaling and augmenting the LM with a retrieval corpus. Our results on the LLaMA and Atlas models show that both strategies reduce inconsistency while retrieval augmentation is considerably more efficient. We further consider and disentangle the consistency contributions of different components of Atlas. For all LMs evaluated we find that syntactical form and other evaluation task artifacts impact consistency. Taken together, our results provide a better understanding of the factors affecting the factual consistency of LLMs.
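To make the notion of inconsistency concrete, here is a minimal sketch of one way it could be scored: collect a model's answers to several paraphrases of the same question and measure the fraction of answer pairs that agree. The function name `pairwise_consistency`, the exact-match comparison, and the example predictions are illustrative assumptions, not the paper's evaluation code.

```python
from itertools import combinations


def pairwise_consistency(predictions):
    """Return the fraction of unordered prediction pairs that are identical.

    `predictions` holds one answer per paraphrase of the same query.
    Hypothetical helper: exact string match stands in for whatever
    answer-equivalence test a real evaluation would use.
    """
    pairs = list(combinations(predictions, 2))
    if not pairs:
        # Fewer than two predictions: nothing can disagree.
        return 1.0
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


# Illustrative answers to three paraphrases of "Where did Anne Redpath die?"
preds = ["Edinburgh", "London", "Edinburgh"]
score = pairwise_consistency(preds)
```

A score of 1.0 means the model gave the same answer to every paraphrase. The score says nothing about whether that answer is correct, so consistency and accuracy need to be tracked separately.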

Authors (5)
  1. Lovisa Hagström (8 papers)
  2. Denitsa Saynova (4 papers)
  3. Tobias Norlund (6 papers)
  4. Moa Johansson (20 papers)
  5. Richard Johansson (18 papers)
Citations (7)