
CLUE: Concept-Level Uncertainty Estimation for Large Language Models

Published 4 Sep 2024 in cs.CL and cs.LG (arXiv:2409.03021v1)

Abstract: LLMs have demonstrated remarkable proficiency in various natural language generation (NLG) tasks. Previous studies suggest that LLMs' generation process involves uncertainty. However, existing approaches to uncertainty estimation mainly focus on sequence-level uncertainty, overlooking individual pieces of information within sequences. These methods fall short in separately assessing the uncertainty of each component in a sequence. In response, we propose a novel framework for Concept-Level Uncertainty Estimation (CLUE) for LLMs. We leverage LLMs to convert output sequences into concept-level representations, breaking down sequences into individual concepts and measuring the uncertainty of each concept separately. We conduct experiments to demonstrate that CLUE can provide more interpretable uncertainty estimation results compared with sentence-level uncertainty, and could be a useful tool for various tasks such as hallucination detection and story generation.
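The abstract's core idea — break a generated sequence into individual concepts and score each one's uncertainty separately, rather than assigning one score to the whole sequence — can be sketched with a simple consistency-style estimator. This is an illustrative sketch, not the paper's exact method: it assumes the concepts have already been extracted (the paper uses an LLM for that step) and stands in a toy substring check for the entailment judge that would decide whether a sampled output supports a concept.

```python
from typing import Callable, Dict, List

def concept_uncertainty(
    concepts: List[str],
    samples: List[str],
    supports: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Score each concept as 1 minus the fraction of sampled
    outputs that support it: concepts the model states
    consistently across samples get low uncertainty."""
    scores = {}
    for concept in concepts:
        hits = sum(supports(sample, concept) for sample in samples)
        scores[concept] = 1.0 - hits / len(samples)
    return scores

# Toy support check: a case-insensitive substring match stands in
# for the LLM/NLI entailment judgment a real pipeline would use.
def toy_supports(sample: str, concept: str) -> bool:
    return concept.lower() in sample.lower()

samples = [
    "Paris is the capital of France. It lies on the Seine.",
    "Paris is the capital of France.",
    "Paris, on the Loire, is the capital of France.",
]
concepts = ["capital of France", "Seine"]
print(concept_uncertainty(concepts, samples, toy_supports))
# "capital of France" appears in all 3 samples -> uncertainty 0.0;
# "Seine" appears in only 1 of 3 -> uncertainty ~0.667
```

A per-concept breakdown like this is what makes the estimate usable for hallucination detection: the low-consistency concept ("Seine" vs. "Loire" above) is flagged while the rest of the sentence is not.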
