
Topic-Based Watermarks for LLM-Generated Text (2404.02138v3)

Published 2 Apr 2024 in cs.CR, cs.CL, and cs.LG

Abstract: The indistinguishability of LLM-generated text from human-written text poses significant challenges. Watermarking algorithms offer a potential solution by embedding detectable signatures within LLM-generated outputs. However, current watermarking schemes lack robustness against a range of attacks, such as text substitution or manipulation, undermining their reliability. This paper proposes a novel topic-based watermarking algorithm designed to enhance the robustness of LLM watermarking. Our approach leverages topics extracted from input prompts or from the outputs of non-watermarked LLMs during the generation of watermarked text. We dynamically select token lists based on the identified topics and adjust token sampling weights accordingly. Through these topic-specific token biases, we embed a topic-sensitive watermark into the generated text. We outline the theoretical framework of our topic-based watermarking algorithm and discuss its potential advantages in various scenarios. Additionally, we explore a comprehensive range of attacks against watermarking algorithms, including discrete alterations, paraphrasing, and re-tokenization. We demonstrate that our proposed watermarking scheme classifies various watermarked text topics with 99.99% confidence and outperforms existing algorithms in terms of z-score robustness and the feasibility of modeling text degradation by potential attackers, while considering the trade-offs between the benefits and costs of watermarking LLM-generated text.
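The mechanism the abstract describes (topic-conditioned token lists that bias sampling, detected via a z-score) can be illustrated with a minimal sketch. This is not the paper's exact algorithm; the hash-seeded partition, the `gamma`/`delta` parameters, and the function names are assumptions borrowed from standard green-list watermarking, with the topic string standing in for the paper's topic-extraction step:

```python
import hashlib
import math
import random

def green_list(topic: str, vocab_size: int, gamma: float = 0.5) -> set:
    """Derive a topic-specific 'green' token subset.

    A hash of the extracted topic seeds a PRNG, so the same topic always
    yields the same partition of the vocabulary (hypothetical scheme).
    """
    seed = int.from_bytes(hashlib.sha256(topic.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def bias_logits(logits: list, green: set, delta: float = 2.0) -> list:
    """Soft watermark: add delta to the logits of green-list tokens
    before sampling, nudging generation toward the topic-specific list."""
    return [l + delta if i in green else l for i, l in enumerate(logits)]

def z_score(token_ids: list, green: set, gamma: float = 0.5) -> float:
    """Detection: how many standard deviations the observed fraction of
    green tokens exceeds the gamma expected under no watermark."""
    n = len(token_ids)
    hits = sum(1 for t in token_ids if t in green)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

A detector that knows the topic can recompute the same green list and flag text whose z-score is improbably high; unwatermarked text should score near zero, which is the robustness metric the abstract compares against.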

Authors (3)
  1. Alexander Nemecek
  2. Yuzhou Jiang
  3. Erman Ayday