
Graph-based Uncertainty Metrics for Long-form Language Model Outputs (2410.20783v1)

Published 28 Oct 2024 in cs.CL, cs.AI, and cs.LG

Abstract: Recent advancements in LLMs have significantly improved text generation capabilities, but these systems are still known to hallucinate, and granular uncertainty estimation for long-form LLM generations remains challenging. In this work, we propose Graph Uncertainty, which represents the relationship between LLM generations and the claims within them as a bipartite graph and estimates claim-level uncertainty with a family of graph centrality metrics. Under this view, existing uncertainty estimation methods based on the concept of self-consistency can be seen as using degree centrality as an uncertainty measure, and we show that more sophisticated alternatives such as closeness centrality provide consistent gains in claim-level uncertainty estimation. Moreover, we present uncertainty-aware decoding techniques that leverage both the graph structure and the uncertainty estimates to improve the factuality of LLM generations by preserving only the most reliable claims. Compared to existing methods, our graph-based uncertainty metrics yield an average relative gain of 6.8% in AUPRC across various long-form generation settings, and our end-to-end system provides consistent 2-4% gains in factuality over existing decoding techniques while significantly improving the informativeness of generated responses.
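The core idea in the abstract can be sketched concretely. Below is a minimal toy illustration, not the paper's implementation: sampled generations and extracted claims form the two sides of a bipartite graph, with an edge wherever a generation supports a claim (in the paper this support relation comes from an entailment check; here it is hard-coded). Degree centrality then recovers the self-consistency baseline (fraction of samples supporting a claim), while closeness centrality is the alternative the paper reports gains with. All node names and the support relation are invented toy data.

```python
from collections import deque

# Toy bipartite support relation: generations g0..g2, claims c0..c1.
# c0 is supported by every sample; c1 by only one (hypothetical data).
generations = ["g0", "g1", "g2"]
claims = ["c0", "c1"]
edges = [("g0", "c0"), ("g1", "c0"), ("g2", "c0"), ("g0", "c1")]

# Adjacency map for the bipartite graph.
adj = {n: set() for n in generations + claims}
for g, c in edges:
    adj[g].add(c)
    adj[c].add(g)

def degree_centrality(claim):
    """Self-consistency baseline: fraction of samples supporting the claim."""
    return len(adj[claim]) / len(generations)

def closeness_centrality(node):
    """(n - 1) / sum of shortest-path distances to all other nodes (BFS)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(d for n, d in dist.items() if n != node)
    return (len(adj) - 1) / total if total else 0.0

degree = {c: degree_centrality(c) for c in claims}
closeness = {c: closeness_centrality(c) for c in claims}
print(degree)     # c0: 1.0, c1: ~0.333
print(closeness)  # c0 ranks above c1
```

A decoding step in the spirit of the paper would then keep only claims whose centrality score clears a threshold, trading off factuality against informativeness.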
