Multi-Bit Distortion-Free Watermarking for Large Language Models (2402.16578v1)

Published 26 Feb 2024 in cs.CL and cs.LG

Abstract: Methods for watermarking LLMs have been proposed that distinguish AI-generated text from human-generated text by slightly altering the model's output distribution; however, they also degrade the quality of the text, exposing the watermark to adversarial detection. More recently, distortion-free watermarking methods were proposed that require a secret key to detect the watermark. These prior methods generally embed zero-bit watermarks, which provide no information beyond tagging a text as AI-generated. We extend an existing zero-bit distortion-free watermarking method by embedding multiple bits of meta-information as part of the watermark. We also develop a computationally efficient decoder that extracts the embedded information from the watermark with a low bit error rate.
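The abstract does not spell out the construction, so the sketch below is illustrative only. It implements a standard zero-bit distortion-free sampler (the Aaronson-style Gumbel trick, where token v is chosen as argmax_v r_v^(1/p_v) for keyed pseudorandom values r_v, leaving the marginal output distribution unchanged) and extends it to carry message bits by deriving one sub-key per bit position in round-robin order. The helper names (_prf01, embed_key, decode_bits), the position-keyed PRF, and the per-bit decoder are assumptions made for illustration, not the paper's actual method.

```python
# Illustrative sketch only -- not the paper's construction. Assumes a
# Gumbel-trick distortion-free sampler plus round-robin per-bit sub-keys.
import hashlib
import math


def _derive(key: bytes, *parts) -> bytes:
    """Derive a deterministic sub-key from a key and hashable parts."""
    return hashlib.sha256(key + repr(parts).encode()).digest()


def _prf01(key: bytes, context: tuple, token: int) -> float:
    """Keyed pseudorandom value strictly inside (0, 1) for (context, token)."""
    h = _derive(key, context, token)
    x = int.from_bytes(h[:8], "big") >> 12  # 52 random bits (fits a double)
    return (x + 0.5) / (1 << 52)


def sample_token(probs, key: bytes, context: tuple) -> int:
    """Distortion-free sampling: pick argmax_v r_v**(1/p_v) (Gumbel trick).
    Marginalized over a uniformly random key, token v is emitted with
    probability probs[v], so the output distribution is unchanged."""
    best_v, best_s = None, -math.inf
    for v, p in enumerate(probs):
        if p > 0:
            s = _prf01(key, context, v) ** (1.0 / p)
            if s > best_s:
                best_v, best_s = v, s
    return best_v


def embed_key(master: bytes, pos: int, bits: list) -> bytes:
    """Sub-key for position pos, carrying one message bit (round-robin)."""
    i = pos % len(bits)
    return _derive(master, i, bits[i])


def decode_bits(tokens, contexts, master: bytes, nbits: int) -> list:
    """Per-bit decoder: for each bit slot, compare detection scores under
    the bit=0 and bit=1 sub-keys. Under the true sub-key the sampled tokens
    satisfy E[-log(1 - r)] > 1; under the wrong sub-key the mean is 1."""
    decoded = []
    for i in range(nbits):
        scores = [0.0, 0.0]
        for pos, (t, ctx) in enumerate(zip(tokens, contexts)):
            if pos % nbits == i:
                for b in (0, 1):
                    r = _prf01(_derive(master, i, b), ctx, t)
                    scores[b] -= math.log(1.0 - r)
        decoded.append(0 if scores[0] >= scores[1] else 1)
    return decoded


if __name__ == "__main__":
    master, bits = b"secret-key", [1, 0, 1, 1]
    probs = [0.1] * 10                      # toy LM: uniform over 10 tokens
    tokens, contexts = [], []
    for pos in range(400):
        ctx = (pos,)                        # demo keys the PRF by position;
        key = embed_key(master, pos, bits)  # real schemes often hash a window
        t = sample_token(probs, key, ctx)   # of preceding tokens instead
        tokens.append(t)
        contexts.append(ctx)
    print(decode_bits(tokens, contexts, master, len(bits)))  # -> [1, 0, 1, 1]
```

One property worth noting in this sketch: the per-bit decoder tests only two sub-key hypotheses per bit, so decoding b bits costs O(b) detection passes rather than the 2^b needed to test every message jointly. That is one plausible reading of the "computationally efficient decoder" claimed in the abstract, though the paper's exact mechanism may differ.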
