Precise measurement of human perplexity in next-token prediction
Develop a precise methodology for measuring human perplexity (the exponentiated cross-entropy of human next-token predictive distributions) on web-text next-token prediction tasks, such as OpenWebText tokenized with a 50,000-token vocabulary, so that human perplexity can be estimated accurately rather than indirectly via pairwise probability ratios and assumptions about calibration.
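The target quantity can be made concrete with a small sketch. Assuming one could elicit from a human a probability for the actual next token at each position (the hard part this problem asks to solve), perplexity is the exponential of the mean negative log-probability. The function and example values below are illustrative, not part of any proposed methodology:

```python
import math

def perplexity(true_token_probs):
    """Perplexity from probabilities assigned to the actual next tokens.

    true_token_probs: hypothetical list of p(actual next token) values,
    one per prediction position, elicited from a human or a model.
    perplexity = exp(mean negative log-probability) = exp(cross-entropy).
    """
    nll = [-math.log(p) for p in true_token_probs]
    return math.exp(sum(nll) / len(nll))

# Uniform guessing over a 50,000-token vocabulary gives
# perplexity ≈ 50,000, the worst case for this tokenization.
print(perplexity([1 / 50_000] * 10))
```

Note that averaging negative log-probabilities before exponentiating is what makes pairwise probability ratios insufficient: ratios constrain relative likelihoods between two candidates but not the absolute probability mass assigned to the true token.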
Sponsor
References
Thus, while we don’t have a good way to precisely measure human perplexity, these results give reasonable evidence that it is high.
— Language models are better than humans at next-token prediction
(arXiv:2212.11281, Shlegeris et al., 2022), Section 4.2 (Results), "Measuring human perplexity"