
Calibrating the Confidence of Large Language Models by Eliciting Fidelity (2404.02655v2)

Published 3 Apr 2024 in cs.CL

Abstract: LLMs optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, after alignment these models often exhibit overconfidence: the confidence they express does not accurately reflect their correctness rate. In this paper, we decompose LLM confidence into Uncertainty about the question and Fidelity to the answer the LLM generates. We then propose a plug-and-play method to estimate the confidence of LLMs. Experiments with six RLHF-LMs on four MCQA datasets show that our method achieves good calibration performance. Moreover, we propose two novel metrics, IPR and CE, for evaluating model calibration, and we provide a detailed discussion of Truly Well-Calibrated Confidence. Our method can serve as a strong baseline, and we hope this work offers some insights into model confidence calibration.
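The Uncertainty/Fidelity decomposition lends itself to a simple illustration. Below is a minimal Python sketch of how a sampling-based Uncertainty proxy (normalized entropy of the empirical answer distribution over MCQA options) and a Fidelity proxy (agreement of samples with the chosen answer) might be combined into a confidence score. The combination rule, the function names, and the entropy proxy are illustrative assumptions, not the paper's exact formulation; a standard expected calibration error (ECE) check is included for context and is not the paper's IPR or CE metric.

```python
import math
from collections import Counter

def normalized_entropy(counts):
    """Entropy of the empirical answer distribution, scaled to [0, 1]."""
    total = sum(counts)
    n_options = len(counts)
    if total == 0 or n_options <= 1:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n_options)

def elicit_confidence(sampled_answers, options):
    """Combine question-level Uncertainty with answer-level Fidelity.

    sampled_answers: answers from K independent samples of the model.
    options: MCQA option labels, e.g. ["A", "B", "C", "D"].
    """
    tally = Counter(sampled_answers)
    counts = [tally.get(o, 0) for o in options]
    uncertainty = normalized_entropy(counts)      # high when samples scatter
    final_answer, agreeing = tally.most_common(1)[0]
    fidelity = agreeing / len(sampled_answers)    # samples that stick with the answer
    # Illustrative combination only: discount fidelity by question uncertainty.
    confidence = fidelity * (1.0 - uncertainty)
    return final_answer, confidence

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE (Guo et al., 2017), shown for context; not the paper's IPR/CE."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece

# Example: five samples agree 4-to-1 on option "A".
answer, conf = elicit_confidence(["A", "A", "A", "B", "A"], ["A", "B", "C", "D"])
print(answer, round(conf, 3))  # "A", with confidence below the raw 0.8 agreement
```

The multiplicative discount is just one way to encode "be confident only when the question is low-uncertainty and the samples are faithful to the answer"; the paper's actual elicitation and combination should be taken from its method section.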

Authors (8)
  1. Mozhi Zhang (19 papers)
  2. Mianqiu Huang (5 papers)
  3. Rundong Shi (4 papers)
  4. Linsen Guo (1 paper)
  5. Chong Peng (41 papers)
  6. Peng Yan (88 papers)
  7. Yaqian Zhou (17 papers)
  8. Xipeng Qiu (257 papers)
Citations (7)