Origin of LLM probability judgment incoherence in autoregression

Establish whether the mechanism by which GPT-4, GPT-3.5-turbo, LLaMA-2-70b, and LLaMA-2-7b form probability judgments originates in the autoregressive training objective used to train these models.

Background

The paper shows that these LLMs (GPT-4, GPT-3.5-turbo, LLaMA-2-70b, LLaMA-2-7b) produce probability judgments that are often incoherent when evaluated via probabilistic identities: combinations of judgments that should equal zero under probability theory instead deviate from zero in human-like ways. It also finds an inverted-U-shaped relationship between the mean and variance of repeated judgments, paralleling human data.
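As a concrete illustration of this kind of coherence check (the specific identities evaluated in the paper may differ), consider the identity P(A) + P(B) - P(A and B) - P(A or B), which equals zero for any coherent probability assignment. A minimal sketch in Python, with hypothetical judgment values:

# Coherence check on elicited probability judgments: the identity
# P(A) + P(B) - P(A and B) - P(A or B) = 0 holds for any coherent
# probability assignment, so its value measures deviation from coherence.
# The judgment values below are hypothetical, not taken from the paper.
judgments = {
    "A": 0.60,
    "B": 0.30,
    "A_and_B": 0.25,
    "A_or_B": 0.70,
}
deviation = (judgments["A"] + judgments["B"]
             - judgments["A_and_B"] - judgments["A_or_B"])
print(f"identity value: {deviation:+.2f}")  # 0.00 for coherent judgments; -0.05 here

Collecting such identity values across many queries, together with the mean and variance of repeated judgments of the same query, yields the deviation and mean-variance patterns described above.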

To explain these patterns, the authors compare two models of human probability judgment, Probability Theory plus Noise (PT+N) and the Bayesian Sampler, and argue that the LLM responses align more closely with the Bayesian Sampler. Using de Finetti's theorem, they also outline a theoretical link between the autoregressive objective and implicit Bayesian inference, suggesting that the mechanism for forming probability judgments could be driven by autoregression.
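A simplified sketch of the two accounts, using the parameterization standard in the human-judgment literature (N mental samples, read-error rate d for PT+N, a symmetric Beta(beta, beta) prior for the Bayesian Sampler); the paper's exact parameterization and fitted values may differ:

import random

def ptn_judgment(p, N=10, d=0.1):
    """Probability Theory plus Noise: each of N mental samples is read
    with error probability d; the judgment is the observed proportion."""
    q = p * (1 - 2 * d) + d              # per-sample chance of being read as 'true'
    k = sum(random.random() < q for _ in range(N))
    return k / N                         # expected judgment: p(1 - 2d) + d

def bayesian_sampler_judgment(p, N=10, beta=1.0):
    """Bayesian Sampler: N noise-free samples are combined with a symmetric
    Beta(beta, beta) prior; the judgment is the posterior mean."""
    k = sum(random.random() < p for _ in range(N))
    return (k + beta) / (N + 2 * beta)   # expected judgment: (N*p + beta) / (N + 2*beta)

# Both accounts regress judgments toward 0.5 and predict an inverted-U
# relationship between the mean and variance of repeated judgments; they
# differ in whether the regression comes from read noise (PT+N) or from
# the prior used to correct for a small sample (Bayesian Sampler).

The de Finetti link can be stated compactly: for an exchangeable sequence, p(x_1, ..., x_n) = ∫ ∏_i p(x_i | θ) dπ(θ), so the autoregressive predictive p(x_{n+1} | x_1, ..., x_n) coincides with the Bayesian posterior predictive ∫ p(x_{n+1} | θ) π(θ | x_1, ..., x_n) dθ; a model trained to predict the next token on exchangeable data therefore implicitly performs Bayesian inference.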

The conjecture focuses on the causal origin of the observed incoherence patterns and posits that they arise from the autoregressive training objective. Resolving this would clarify whether the training objective itself induces the statistical structures observed in LLM probability judgments.

References

These structures offer insights into the underlying mechanisms employed by LLMs in the formation of probability judgments. We conjecture that this process originates from the implementation of autoregression for the four LLMs.

Incoherent Probability Judgments in Large Language Models (2401.16646 - Zhu et al., 30 Jan 2024) in Section 6 (Discussion)