Limits of Detecting Text Generated by Large-Scale Language Models (2002.03438v1)

Published 9 Feb 2020 in cs.CL, cs.CY, and cs.LG

Abstract: Some consider large-scale language models that can generate long and coherent pieces of text dangerous, since they may be used in misinformation campaigns. Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated. We show that error exponents for particular language models are bounded in terms of their perplexity, a standard measure of language generation performance. Under the assumption that human language is stationary and ergodic, the formulation is extended from considering specific language models to considering maximum likelihood language models among the class of k-order Markov approximations; error probabilities are characterized. Some discussion of incorporating semantic side information is also given.
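The hypothesis-testing formulation the abstract describes can be sketched as a log-likelihood ratio test between a "human" distribution and a "model" distribution, with perplexity as the link to generation quality. The toy unigram distributions `P` and `Q` below are hypothetical and chosen only for illustration; the paper itself works with k-order Markov approximations and information-theoretic error exponents, not this specific code.

```python
import math

# Hypothetical token distributions (assumptions, not from the paper):
# P plays the role of the "genuine" (human) source H0,
# Q the "generated" (language model) source H1.
P = {"the": 0.5, "cat": 0.3, "sat": 0.2}
Q = {"the": 0.7, "cat": 0.2, "sat": 0.1}

def log_likelihood_ratio(tokens):
    """Sum of per-token log(P/Q); positive values favor H0 (genuine)."""
    return sum(math.log(P[t] / Q[t]) for t in tokens)

def classify(tokens, threshold=0.0):
    """Binary hypothesis test: threshold the log-likelihood ratio."""
    return "genuine" if log_likelihood_ratio(tokens) > threshold else "generated"

def perplexity(dist, tokens):
    """Perplexity under `dist`: 2 to the power of the per-token cross-entropy."""
    cross_entropy = -sum(math.log2(dist[t]) for t in tokens) / len(tokens)
    return 2 ** cross_entropy
```

In this sketch a lower perplexity for `Q` on model-generated text corresponds, loosely, to the paper's point that detection difficulty is bounded in terms of the generator's perplexity: the closer `Q` tracks `P`, the smaller the likelihood ratios and the harder the test.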

Citations (16)
