Efficient Non-Parametric Uncertainty Quantification for Black-Box Large Language Models and Decision Planning (2402.00251v1)

Published 1 Feb 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Step-by-step decision planning with LLMs is gaining attention in AI agent development. This paper focuses on decision planning with uncertainty estimation to address the hallucination problem in LLMs. Existing approaches are either white-box or computationally demanding, limiting the use of black-box proprietary LLMs within budgets. The paper's first contribution is a non-parametric uncertainty quantification method for LLMs, efficiently estimating point-wise dependencies between input and decision on the fly with a single inference, without access to token logits. This estimator informs the statistical interpretation of decision trustworthiness. The second contribution outlines a systematic design for a decision-making agent, generating actions like "turn on the bathroom light" based on user prompts such as "take a bath". Users will be asked to provide preferences when more than one action has high estimated point-wise dependencies. In conclusion, our uncertainty estimation and decision-making agent design offer a cost-efficient approach for AI agent development.
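
The agent design described in the abstract can be summarized as a simple decision loop: score each candidate action by its estimated point-wise dependency on the user prompt, act when exactly one action is trustworthy, and ask the user for a preference when several are. The sketch below illustrates that flow only; dependency_score, the 0.8 threshold, and all other names are hypothetical placeholders, and the paper's single-inference, logit-free estimator is not reproduced here.

from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Decision:
    action: Optional[str]        # chosen action, or None if the user must decide
    needs_user_preference: bool  # True when several actions are similarly trustworthy
    candidates: List[str]        # actions whose dependency estimate passed the threshold

def plan_step(
    prompt: str,
    candidate_actions: List[str],
    dependency_score: Callable[[str, str], float],  # placeholder for the paper's estimator
    threshold: float = 0.8,                         # assumed trust cutoff, not from the paper
) -> Decision:
    # Score every candidate action against the user prompt.
    scored = {a: dependency_score(prompt, a) for a in candidate_actions}
    # Keep only actions whose estimated dependency clears the trust threshold,
    # ordered from most to least trustworthy.
    trusted = [a for a, s in sorted(scored.items(), key=lambda kv: -kv[1]) if s >= threshold]

    if len(trusted) == 1:
        # One clearly trustworthy action: the agent can execute it directly.
        return Decision(action=trusted[0], needs_user_preference=False, candidates=trusted)
    if len(trusted) > 1:
        # More than one action has a high estimated point-wise dependency,
        # so the agent defers to the user's preference, as described in the abstract.
        return Decision(action=None, needs_user_preference=True, candidates=trusted)
    # No trustworthy action: the caller decides how to recover (e.g., re-prompt the LLM).
    return Decision(action=None, needs_user_preference=False, candidates=[])

For example, with the prompt "take a bath" and candidates ["turn on the bathroom light", "turn on the oven"], a well-calibrated estimator would score only the first above the threshold, so the agent would act without asking the user.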

Authors (3)
  1. Yao-Hung Hubert Tsai (41 papers)
  2. Walter Talbott (18 papers)
  3. Jian Zhang (543 papers)
Citations (2)