Topic-Based Watermarks for Large Language Models

Published 2 Apr 2024 in cs.CR, cs.CL, and cs.LG | arXiv:2404.02138v4

Abstract: The indistinguishability of LLM output from human-authored content poses significant challenges, raising concerns about potential misuse of AI-generated text and its influence on future AI model training. Watermarking algorithms offer a viable solution by embedding detectable signatures into generated text. However, existing watermarking methods often entail trade-offs among attack robustness, generation quality, and additional overhead such as specialized frameworks or complex integrations. We propose a lightweight, topic-guided watermarking scheme for LLMs that partitions the vocabulary into topic-aligned token subsets. Given an input prompt, the scheme selects a relevant topic-specific token list, effectively "green-listing" semantically aligned tokens to embed robust marks while preserving the text's fluency and coherence. Experimental results across multiple LLMs and state-of-the-art benchmarks demonstrate that our method achieves perplexity comparable to industry-leading systems, including Google's SynthID-Text, yet enhances watermark robustness against paraphrasing and lexical perturbation attacks while introducing minimal performance overhead. Our approach avoids reliance on additional mechanisms beyond standard text generation pipelines, facilitating straightforward adoption and suggesting a practical path toward globally consistent watermarking of AI-generated content.
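The abstract's core mechanism, partitioning the vocabulary into topic-aligned subsets and boosting ("green-listing") the subset matched to the prompt during decoding, can be made concrete with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the toy topic lexicons, the `assign_topic` heuristic, and the `GREEN_BIAS` value are all hypothetical, and detection uses the standard green-fraction z-test familiar from green-list watermarking schemes.

```python
# Hedged sketch of topic-guided green-list watermarking.
# The topic lexicons, heuristics, and constants below are illustrative
# assumptions, not the paper's actual partitioning or parameters.
import math
import random

# Toy vocabulary partitioned into topic-aligned subsets; in practice the
# partition would be derived over the LLM tokenizer's real vocabulary.
TOPIC_TOKENS = {
    "medicine": {"patient", "dose", "clinical", "therapy", "symptom"},
    "finance":  {"market", "asset", "yield", "equity", "hedge"},
}
GREEN_BIAS = 2.0  # hypothetical logit boost for topic-aligned ("green") tokens


def assign_topic(prompt: str) -> str:
    """Pick the topic whose lexicon overlaps the prompt most (toy heuristic)."""
    words = set(prompt.lower().split())
    return max(TOPIC_TOKENS, key=lambda t: len(words & TOPIC_TOKENS[t]))


def watermarked_sample(logits: dict, topic: str, rng: random.Random) -> str:
    """Softmax-sample one token after boosting green-list logits."""
    green = TOPIC_TOKENS[topic]
    boosted = {tok: lv + (GREEN_BIAS if tok in green else 0.0)
               for tok, lv in logits.items()}
    m = max(boosted.values())  # subtract max for numerical stability
    weights = {tok: math.exp(lv - m) for tok, lv in boosted.items()}
    r, acc = rng.random() * sum(weights.values()), 0.0
    for tok, w in weights.items():
        acc += w
        if acc >= r:
            return tok
    return tok  # numerical fallback: return the last token


def green_fraction_z(tokens: list, topic: str, gamma: float) -> float:
    """Detection: z-score of the observed green-token fraction against the
    expected fraction gamma under unwatermarked text."""
    n = len(tokens)
    hits = sum(tok in TOPIC_TOKENS[topic] for tok in tokens)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


if __name__ == "__main__":
    rng = random.Random(0)
    topic = assign_topic("What dose should the patient receive?")
    # Stand-in for model logits: one random score per toy-vocabulary token.
    fake_logits = {tok: rng.gauss(0.0, 1.0)
                   for subset in TOPIC_TOKENS.values() for tok in subset}
    out = [watermarked_sample(fake_logits, topic, rng) for _ in range(40)]
    # gamma = fraction of the toy vocabulary that is green for this topic.
    gamma = len(TOPIC_TOKENS[topic]) / sum(map(len, TOPIC_TOKENS.values()))
    print(topic, round(green_fraction_z(out, topic, gamma), 2))
```

In a real pipeline the same logit boost would more naturally be applied inside the model's decoding loop, for example via a custom `LogitsProcessor` in Hugging Face `transformers`, with the topic partition computed once over the tokenizer's actual vocabulary; the abstract's claim of "minimal performance overhead" is consistent with the fact that this adds only a set lookup and an addition per candidate token.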
