- The paper demonstrates that GPT-3 can express calibrated verbalized probabilities in natural language, rather than relying on abstract token-level logits to convey uncertainty.
- It introduces the CalibratedMath Suite, a set of arithmetic tasks designed to evaluate generalization across distribution shifts.
- The research employs both supervised fine-tuning and stochastic few-shot prompting, measuring calibration with mean squared error (MSE) and mean absolute deviation (MAD) calibration error.
Calibrated Verbal Expressions of Model Uncertainty
The paper "Teaching Models to Express Their Uncertainty in Words" provides a detailed exploration of how GPT-3, a LLM developed by OpenAI, can be trained to express uncertainty in a human-like manner through natural language, rather than relying solely on model logits. The authors introduce the concept of "verbalized probability," where the model outputs confidence levels about its answers, mapping these to well-calibrated probabilities even under distributional shifts. This initiative is driven by the growing need for transparency and reliability in AI systems, especially given the prevalence of model hallucinations when generating long-form text.
Core Contributions
- Verbalized Probability: The primary thrust of this work is demonstrating that GPT-3 can express its internal uncertainty as verbalized probabilities. This approach seeks to align more closely with human expressions of epistemic uncertainty, in contrast to the token-level uncertainties captured via logits, which are more abstract and less intuitive for end users.
- CalibratedMath Suite: The authors designed a suite of mathematical tasks to evaluate the model’s ability to generalize its calibration on new domains. This framework not only challenges GPT-3 with various arithmetic problems but also examines its adaptability across shifts in task difficulty and type.
- Empirical Evaluation: The research includes empirical evaluation under distribution shift, contrasting the verbalized probability approach against traditional methods based on logits. Notably, the verbalized method generalized reasonably well across distribution shifts relative to the answer-logit calibration baseline (an illustrative sketch of the elicitation setup follows this list).
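To make the setup concrete, here is a minimal sketch of how an arithmetic question and a verbalized confidence might be elicited and parsed. The prompt layout, field names, and parsing logic are illustrative assumptions for this summary, not the paper's exact CalibratedMath format.

```python
# Illustrative sketch only: the prompt layout and completion format below are
# assumptions for demonstration, not the paper's exact CalibratedMath setup.
import re

def make_prompt(a: int, b: int) -> str:
    """Pose a CalibratedMath-style addition question; the model is expected to
    complete with an answer followed by a stated confidence."""
    return f"Q: What is {a} + {b}?\nA:"

def parse_completion(completion: str):
    """Extract (answer, verbalized probability in [0, 1]) from a completion
    assumed to look like ' 61\nConfidence: 90%'."""
    answer = int(re.search(r"(-?\d+)", completion).group(1))
    pct = int(re.search(r"Confidence:\s*(\d+)\s*%", completion).group(1))
    return answer, pct / 100.0

print(make_prompt(27, 34))
print(parse_completion(" 61\nConfidence: 90%"))  # (61, 0.9)
```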
Methodology and Metrics
The authors enforce a strict train-test separation so that calibration must generalize to held-out task types rather than being memorized. The metrics include mean squared error (MSE) between the stated probability and the actual correctness of the model's answer, and mean absolute deviation calibration error (MAD), which compares average stated confidence with empirical accuracy across confidence buckets. By leveraging both supervised fine-tuning and stochastic few-shot prompting, the paper explores diverse strategies to improve the model's calibration performance.
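The following sketch shows one way the two metrics could be computed, assuming access to the model's stated probabilities and a 0/1 correctness indicator per answer; the bin count and equal-width binning are illustrative choices, not necessarily the paper's exact configuration.

```python
# Minimal sketch of the evaluation metrics, under the assumptions stated above.
import numpy as np

def mse(probs: np.ndarray, correct: np.ndarray) -> float:
    """Mean squared error between stated probability and actual correctness (0/1)."""
    return float(np.mean((probs - correct) ** 2))

def mad_calibration_error(probs: np.ndarray, correct: np.ndarray, n_bins: int = 10) -> float:
    """Mean absolute deviation between average stated confidence and empirical
    accuracy, computed over equal-width probability bins."""
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    deviations = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            deviations.append(abs(probs[mask].mean() - correct[mask].mean()))
    return float(np.mean(deviations))

# Toy usage with made-up numbers.
probs = np.array([0.9, 0.6, 0.3, 0.8, 0.5])
correct = np.array([1, 1, 0, 1, 0])
print(mse(probs, correct), mad_calibration_error(probs, correct, n_bins=5))
```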
Implications and Future Directions
The implications of this research are multifaceted. On a practical level, improving a model's ability to express calibrated confidence can enhance user trust and engagement, particularly in high-stakes applications where decision-making may rely on model outputs. Theoretically, this work contributes to ongoing discussions around AI alignment and interpretability, emphasizing the importance of models conveying honest, self-reflective assessments of their uncertainty.
The research opens several pathways for future exploration:
- Expanding the domain of calibration beyond arithmetic tasks to more complex and varied subject areas.
- Investigating reinforcement learning techniques to potentially yield more adaptable and robust confidence expressions.
- Exploring the integration of this approach with other models and architectures to examine its generalizability and impact.
Overall, the paper provides significant insights into verbalized model uncertainty, highlighting the potential for LLMs to communicate uncertainty in a manner that is more interpretable and useful for human collaboration.