
Navigating Uncertainty: Optimizing API Dependency for Hallucination Reduction in Closed-Book Question Answering (2401.01780v1)

Published 3 Jan 2024 in cs.CL and cs.IR

Abstract: While LLMs are able to accumulate and recall knowledge, they are still prone to hallucination, especially when faced with factual questions: LLMs cannot rely solely on the knowledge stored in their parameters to guarantee truthful and correct answers. Augmenting these models with the ability to search external information sources, such as the web, is a promising approach to ground answers in retrieved information. However, searching a large collection of documents introduces additional computational and time costs. The optimal behavior would be to query external resources only when the LLM is not confident in its answer. In this paper, we propose a new LLM able to self-estimate whether it can answer directly or needs to request an external tool. We investigate a supervised approach by introducing a hallucination masking mechanism in which labels are generated using a closed-book question-answering task. In addition, we propose to leverage parameter-efficient fine-tuning techniques to train our model on a small amount of data. Our model directly provides answers for $78.2\%$ of the known queries and opts to search for $77.2\%$ of the unknown ones, so the API is used only $62\%$ of the time.
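The mechanism described in the abstract can be sketched in code. The snippet below is a minimal, hypothetical illustration, not the authors' released implementation: it labels training questions as "known" or "unknown" by running closed-book QA and comparing the model's answer to the gold answer, masks hallucinated answers with a search token, and then routes at inference time between answering directly and calling the external search API. Names such as `SEARCH_TOKEN`, `label_queries`, and `route` are assumptions introduced for illustration.

```python
# Minimal sketch of the label-generation and routing idea (assumptions, not the paper's code).
from typing import List, Tuple

SEARCH_TOKEN = "<search>"  # hypothetical special token meaning "call the external API"

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().strip().split())

def label_queries(model_answer_fn, qa_pairs: List[Tuple[str, str]]) -> List[dict]:
    """Build supervision labels from closed-book QA: a question is 'known'
    if the model's closed-book answer matches the gold answer; otherwise the
    target is masked to the search token (the hallucination-masking idea)."""
    examples = []
    for question, gold in qa_pairs:
        prediction = model_answer_fn(question)       # closed-book generation
        known = normalize(prediction) == normalize(gold)
        target = gold if known else SEARCH_TOKEN     # mask answers the model gets wrong
        examples.append({"question": question, "target": target, "known": known})
    return examples

def route(model_answer_fn, search_api_fn, question: str) -> str:
    """At inference time, answer directly when the model is confident;
    fall back to the external search API only when it emits the search token."""
    prediction = model_answer_fn(question)
    if prediction.strip() == SEARCH_TOKEN:
        return search_api_fn(question)               # pay the retrieval cost only here
    return prediction                                # answer from parametric knowledge
```

For the parameter-efficient fine-tuning step, a standard LoRA setup via the Hugging Face peft library (which the paper cites) would look roughly like the following; the base model and hyperparameters are illustrative assumptions, not values taken from the paper.

```python
# Illustrative LoRA fine-tuning setup with the peft library (hyperparameters are assumptions).
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForSeq2SeqLM

base = AutoModelForSeq2SeqLM.from_pretrained("t5-base")  # illustrative base model
config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```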

Authors (4)
  1. Pierre Erbacher (9 papers)
  2. Louis Falissard (1 paper)
  3. Vincent Guigue (18 papers)
  4. Laure Soulier (39 papers)
Citations (4)