Slaves to the Law of Large Numbers: An Asymptotic Equipartition Property for Perplexity in Generative Language Models (2405.13798v3)

Published 22 May 2024 in cs.CL, cs.AI, cs.IT, and math.IT

Abstract: We prove a new asymptotic equipartition property for the perplexity of long texts generated by an LLM and present supporting experimental evidence from open-source models. Specifically, we show that the logarithmic perplexity of any large text generated by an LLM must asymptotically converge to the average entropy of its token distributions. This defines a "typical set" that all long synthetic texts generated by an LLM must belong to. We show that this typical set is a vanishingly small subset of all possible grammatically correct outputs. These results suggest possible applications to important practical problems such as (a) detecting synthetic AI-generated text, and (b) testing whether a text was used to train an LLM. We make no simplifying assumptions (such as stationarity) about the statistics of LLM outputs, and therefore our results are directly applicable to practical real-world models without any approximations.
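Concretely, the property compares two per-token averages over a generated sequence x_1, ..., x_n: the logarithmic perplexity (1/n) Σ_i -log p(x_i | x_<i) and the average predictive entropy (1/n) Σ_i H(p(· | x_<i)); the claim is that their difference vanishes as n grows. The sketch below checks this empirically on an open-source model. Everything in it is an illustrative assumption rather than the paper's own setup: GPT-2 via Hugging Face transformers as the model, pure untruncated sampling, and the above reading of "logarithmic perplexity" as per-token average surprisal.

```python
# Hedged sketch: empirically comparing log-perplexity with average predictive
# entropy on text the model itself samples. Model choice (gpt2), prompt, and
# sequence length are all arbitrary assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumed example; any open-source causal LM would do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = tok("The law of large numbers", return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **prompt,
        do_sample=True,               # sample from the model's own distribution
        top_k=0, top_p=1.0,           # no truncation: keep the full distribution
        max_new_tokens=512,
        output_scores=True,           # keep per-step logits
        return_dict_in_generate=True,
        pad_token_id=tok.eos_token_id,
    )

nll_sum = 0.0   # running sum of -log p(x_i | x_<i)  (realized surprisal)
ent_sum = 0.0   # running sum of H(p(. | x_<i))      (predictive entropy)
prompt_len = prompt["input_ids"].shape[1]
for step, logits in enumerate(out.scores):
    logp = torch.log_softmax(logits[0], dim=-1)
    token = out.sequences[0, prompt_len + step]      # token actually sampled
    nll_sum += -logp[token].item()
    ent_sum += -(logp.exp() * logp).sum().item()
    n = step + 1
    if n % 128 == 0:
        # The AEP predicts these two averages approach each other as n grows.
        print(f"n={n}: log-perplexity={nll_sum/n:.3f}, avg entropy={ent_sum/n:.3f}")
```

Under pure sampling the two printed averages should drift together as n grows. Truncation tricks such as top-k or nucleus sampling change the sampling distribution, so a faithful check under those decoding schemes would need the entropy computed over the truncated distribution instead.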
