Universally Optimal Watermarking Schemes for LLMs: from Theory to Practice (2410.02890v2)

Published 3 Oct 2024 in cs.CR, cs.IT, cs.LG, and math.IT

Abstract: LLMs boost human efficiency but also pose misuse risks, with watermarking serving as a reliable method to differentiate AI-generated content from human-created text. In this work, we propose a novel theoretical framework for watermarking LLMs. In particular, we jointly optimize both the watermarking scheme and detector to maximize detection performance, while controlling the worst-case Type-I error and distortion in the watermarked text. Within our framework, we characterize the universally minimum Type-II error, showing a fundamental trade-off between detection performance and distortion. More importantly, we identify the optimal type of detectors and watermarking schemes. Building upon our theoretical analysis, we introduce a practical, model-agnostic and computationally efficient token-level watermarking algorithm that invokes a surrogate model and the Gumbel-max trick. Empirical results on Llama-13B and Mistral-8$\times$7B demonstrate the effectiveness of our method. Furthermore, we explore how robustness can be integrated into our theoretical framework, which provides a foundation for designing future watermarking systems with improved resilience to adversarial attacks.

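The abstract refers to a token-level scheme built on the Gumbel-max trick. As a rough illustration only (not the paper's exact construction), the sketch below shows how key-seeded Gumbel noise lets a sampler choose tokens without distorting the model's per-step distribution, while a detector holding the same key recomputes the noise and scores the text. The function names `gumbel_max_watermark_sample` and `detection_score`, the key derivation, the one-previous-token context, and the scoring rule are all illustrative assumptions.

```python
# Hypothetical sketch of Gumbel-max-trick watermarking; key derivation,
# context window (one previous token), and scoring rule are assumptions,
# not the paper's exact algorithm.
import hashlib
import numpy as np


def _keyed_uniforms(secret_key: str, prev_token: int, vocab_size: int) -> np.ndarray:
    """Derive pseudorandom uniforms shared between the sampler and the detector."""
    seed = hashlib.sha256(f"{secret_key}:{prev_token}".encode()).digest()
    rng = np.random.default_rng(int.from_bytes(seed[:8], "big"))
    return rng.uniform(low=1e-12, high=1.0 - 1e-12, size=vocab_size)


def gumbel_max_watermark_sample(logits: np.ndarray, prev_token: int, secret_key: str) -> int:
    """Pick the next token via the Gumbel-max trick with key-seeded noise.

    argmax(logits + Gumbel noise) is an exact sample from softmax(logits),
    so the marginal token distribution is unchanged at each step.
    """
    u = _keyed_uniforms(secret_key, prev_token, len(logits))
    gumbel = -np.log(-np.log(u))
    return int(np.argmax(logits + gumbel))


def detection_score(tokens: list[int], secret_key: str, vocab_size: int) -> float:
    """Detector statistic: recompute u with the key and sum -log(1 - u[token]).

    Watermarked text tends to land on tokens with large u, inflating the score.
    """
    score = 0.0
    for prev, tok in zip(tokens[:-1], tokens[1:]):
        u = _keyed_uniforms(secret_key, prev, vocab_size)
        score += -np.log(1.0 - u[tok])
    return score
```

In this style of scheme, the detector compares the accumulated score against a threshold calibrated so that text generated independently of the key exceeds it with probability at most the allowed Type-I error.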
Authors (5)
  1. Haiyun He (8 papers)
  2. Yepeng Liu (21 papers)
  3. Ziqiao Wang (40 papers)
  4. Yongyi Mao (45 papers)
  5. Yuheng Bu (42 papers)