Self-Consistency of Large Language Models under Ambiguity (2310.13439v1)
Abstract: LLMs that do not give consistent answers across contexts are problematic when used for tasks that expect consistency, such as question answering and explanation. Our work presents an evaluation benchmark for self-consistency in cases of under-specification where two or more answers can be correct. We conduct a series of behavioral experiments on the OpenAI model suite using an ambiguous integer sequence completion task. We find that average consistency ranges from 67% to 82%, far higher than would be predicted if a model's consistency were random, and that consistency increases as model capability improves. Furthermore, we show that models tend to maintain self-consistency across a series of robustness checks, including changes to the prompting speaker and to sequence length. These results suggest that self-consistency arises as an emergent capability without being specifically trained for. Despite this, we find that models are uncalibrated when judging their own consistency, displaying both over- and under-confidence. We also propose a nonparametric test for determining, from a model's token output distribution, whether it assigns non-trivial probability to alternative answers. Using this test, we find that despite increases in self-consistency, models usually place significant weight on alternative, inconsistent answers. This distribution of probability mass provides evidence that even highly self-consistent models internally compute multiple possible responses.
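To make the evaluation setup concrete, here is a minimal sketch (not the authors' code) of how self-consistency on an ambiguous integer-sequence task could be measured, and how one might check whether alternative answers receive non-trivial probability. The sequence `2, 4, 8`, the candidate answers, `query_model`, `places_weight_on_alternative`, and the 0.05 threshold are all illustrative assumptions; the paper's actual prompts and its nonparametric test may differ.

```python
# Minimal, illustrative sketch of a self-consistency evaluation under ambiguity.
# `query_model` is a hypothetical stand-in for any LLM API call that returns
# the model's answer string for a given prompt.
from collections import Counter

AMBIGUOUS_SEQUENCE = [2, 4, 8]      # next term could be 16 (geometric) or 14 (second differences)
CANDIDATE_ANSWERS = {"16", "14"}    # two defensible continuations


def query_model(prompt: str) -> str:
    """Placeholder for an LLM call; replace with a real API client."""
    raise NotImplementedError


def self_consistency(prompts: list[str]) -> float:
    """Fraction of prompt variants on which the model gives its modal answer."""
    answers = [query_model(p).strip() for p in prompts]
    _, modal_count = Counter(answers).most_common(1)[0]
    return modal_count / len(answers)


def places_weight_on_alternative(answer_probs: dict[str, float],
                                 threshold: float = 0.05) -> bool:
    """Illustrative check (not the paper's nonparametric test): does any
    non-modal candidate answer receive more than `threshold` probability?"""
    modal = max(answer_probs, key=answer_probs.get)
    return any(p > threshold for a, p in answer_probs.items() if a != modal)


# Prompt variants that reframe the same ambiguous sequence (e.g., a speaker
# change) without changing which continuations are correct.
seq = ", ".join(map(str, AMBIGUOUS_SEQUENCE))
variants = [
    f"Continue the sequence: {seq}, ...",
    f"A student writes {seq} on the board. What number comes next?",
]
# print(f"self-consistency: {self_consistency(variants):.2f}")
```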
- Henning Bartsch
- Ole Jorgensen
- Domenic Rosati
- Jason Hoelscher-Obermaier
- Jacob Pfau