Topic-Based Watermarks for Large Language Models
Abstract: The indistinguishability of LLM output from human-authored content poses significant challenges, raising concerns about misuse of AI-generated text and about its influence on the training of future AI models. Watermarking algorithms offer a viable solution by embedding detectable signatures into generated text. However, existing watermarking methods often trade off attack robustness against generation quality, or impose additional overhead such as specialized frameworks and complex integrations. We propose a lightweight, topic-guided watermarking scheme for LLMs that partitions the vocabulary into topic-aligned token subsets. Given an input prompt, the scheme selects a relevant topic-specific token list, effectively "green-listing" semantically aligned tokens to embed robust marks while preserving the text's fluency and coherence. Experimental results across multiple LLMs and state-of-the-art benchmarks demonstrate that our method matches the perplexity of industry-leading systems, including Google's SynthID-Text, while improving watermark robustness against paraphrasing and lexical-perturbation attacks and introducing minimal performance overhead. Because the approach requires no mechanisms beyond the standard text-generation pipeline, it is straightforward to adopt and suggests a practical path toward globally consistent watermarking of AI-generated content.
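To make the core mechanism concrete, below is a minimal, self-contained Python sketch of topic-guided green-listing as the abstract describes it: the prompt selects a topic-aligned token subset, tokens in that subset receive a logit bonus during sampling, and detection uses the standard green-list z-test from the watermarking literature. The toy vocabulary, the keyword-based `pick_topic` heuristic, and the bias value `delta = 2.0` are illustrative assumptions for exposition, not the paper's actual partitioning, classifier, or parameters.

```python
import math
import random

# Toy vocabulary partitioned offline into topic-aligned subsets. (Assumption:
# a real system would derive these partitions from a topic model over the
# full LLM vocabulary; these tiny hand-picked lists only illustrate the idea.)
TOPIC_PARTITION = {
    "science": {"atom", "cell", "energy", "theory", "quantum"},
    "sports":  {"goal", "match", "score", "team", "league"},
}

def pick_topic(prompt: str) -> str:
    """Pick the topic whose token list overlaps the prompt most (toy heuristic)."""
    words = set(prompt.lower().split())
    return max(TOPIC_PARTITION, key=lambda t: len(TOPIC_PARTITION[t] & words))

def watermarked_sample(logits: dict[str, float], green: set[str],
                       delta: float = 2.0, seed: int = 0) -> str:
    """Add a bias `delta` to green-listed tokens, then sample from the softmax."""
    rng = random.Random(seed)
    biased = {tok: v + (delta if tok in green else 0.0) for tok, v in logits.items()}
    z = sum(math.exp(v) for v in biased.values())
    r, acc = rng.random(), 0.0
    for tok, v in biased.items():
        acc += math.exp(v) / z
        if r < acc:
            return tok
    return tok  # guard against floating-point underflow at the tail

def green_zscore(tokens: list[str], green: set[str], gamma: float) -> float:
    """Green-list z-test: how far the observed green-token count exceeds
    what unwatermarked text (green fraction gamma) would produce."""
    hits = sum(t in green for t in tokens)
    n = len(tokens)
    return (hits - gamma * n) / math.sqrt(gamma * (1.0 - gamma) * n)

# Usage: the prompt selects the "science" list, so semantically aligned
# tokens receive a sampling bonus without being forced.
prompt = "Explain the energy levels of an atom"
green = TOPIC_PARTITION[pick_topic(prompt)]
toy_logits = {"atom": 1.0, "goal": 1.0, "cell": 0.5, "match": 0.5, "the": 2.0}

sample = [watermarked_sample(toy_logits, green, seed=s) for s in range(200)]
gamma = len(green & toy_logits.keys()) / len(toy_logits)  # green fraction under the null
print(round(green_zscore(sample, green, gamma), 2))  # large z => watermark detected
```

In a real pipeline the same bias would be applied to the model's full logit vector at each decoding step, so the watermark rides on ordinary sampling and requires no extra generation framework, which is the lightweight property the abstract emphasizes.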