
Topic-Based Watermarks for LLM-Generated Text (2404.02138v3)

Published 2 Apr 2024 in cs.CR, cs.CL, and cs.LG

Abstract: The indistinguishability of LLM-generated text from human-written text poses significant challenges. Watermarking algorithms offer a potential solution by embedding detectable signatures within LLM-generated outputs. However, current watermarking schemes lack robustness against a range of attacks, such as text substitution or manipulation, undermining their reliability. This paper proposes a novel topic-based watermarking algorithm designed to enhance the robustness of LLM watermarking. Our approach leverages topics extracted from input prompts or from the outputs of non-watermarked LLMs during the generation of watermarked text. We dynamically select token lists based on the identified topics and adjust token sampling weights accordingly. Through these topic-specific token biases, we embed a topic-sensitive watermark into the generated text. We outline the theoretical framework of our topic-based watermarking algorithm and discuss its potential advantages in various scenarios. Additionally, we explore a comprehensive range of attacks against watermarking algorithms, including discrete alterations, paraphrasing, and re-tokenization. We demonstrate that our proposed watermarking scheme classifies various watermarked text topics with 99.99% confidence and outperforms existing algorithms in terms of z-score robustness and the feasibility of modeling text degradation by potential attackers, while considering the trade-offs between the benefits and costs of watermarking LLM-generated text.
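The mechanism the abstract describes (topic-conditioned token lists that bias sampling, detected via a z-score) can be illustrated with a minimal sketch. This is not the paper's exact algorithm; the hash-seeded partition, the `gamma`/`delta` parameters, and the function names are assumptions borrowed from standard green-list watermarking, with the topic string standing in for the paper's topic-extraction step:

```python
import hashlib
import math
import random

def green_list(topic: str, vocab_size: int, gamma: float = 0.5) -> set:
    """Derive a topic-specific 'green' token subset.

    A hash of the extracted topic seeds a PRNG, so the same topic always
    yields the same partition of the vocabulary (hypothetical scheme).
    """
    seed = int.from_bytes(hashlib.sha256(topic.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def bias_logits(logits: list, green: set, delta: float = 2.0) -> list:
    """Soft watermark: add delta to the logits of green-list tokens
    before sampling, nudging generation toward the topic-specific list."""
    return [l + delta if i in green else l for i, l in enumerate(logits)]

def z_score(token_ids: list, green: set, gamma: float = 0.5) -> float:
    """Detection: how many standard deviations the observed fraction of
    green tokens exceeds the gamma expected under no watermark."""
    n = len(token_ids)
    hits = sum(1 for t in token_ids if t in green)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

A detector that knows the topic can recompute the same green list and flag text whose z-score is improbably high; unwatermarked text should score near zero, which is the robustness metric the abstract compares against.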

Authors (3)
  1. Alexander Nemecek
  2. Yuzhou Jiang
  3. Erman Ayday