Advancing Beyond Identification: Multi-bit Watermark for Large Language Models (2308.00221v3)
Abstract: We show that misuse of large language models (LLMs) can be countered beyond merely identifying machine-generated text. Existing zero-bit watermarking methods focus on detection only, but countering some malicious misuses requires tracing the adversarial user. To address this, we propose Multi-bit Watermark via Position Allocation, which embeds traceable multi-bit information during LLM generation. By allocating tokens to different parts of the message, we embed longer messages in high-corruption settings without added latency. Because sub-units of the message are embedded independently, the proposed method outperforms existing work in robustness and latency. Building on the benefits of zero-bit watermarking, our method enables robust extraction of the watermark without any model access, supports embedding and extraction of long messages ($\geq$ 32 bits) without finetuning, and maintains text quality, all while still allowing zero-bit detection. Code is released at https://github.com/bangawayoo/mb-lm-watermarking
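The idea of allocating tokens to message positions can be illustrated with a toy sketch. This is not the authors' implementation (see the released repository for that). It assumes a vocabulary of integer token ids, a hypothetical shared key `KEY`, one bit per message position, and a "hard" watermark that always picks a token from the favored half of the vocabulary in place of sampling from a real LLM with biased logits. The names `position`, `color`, `embed`, and `extract` are illustrative only.

```python
import hashlib
import random

KEY = "demo-key"    # hypothetical shared secret (assumption)
VOCAB_SIZE = 1000   # toy vocabulary of integer token ids (assumption)


def _h(*parts) -> int:
    """Deterministic keyed hash of the given parts, returned as an integer."""
    data = ":".join(str(p) for p in (KEY, *parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def position(prev_token: int, n_bits: int) -> int:
    """Message position that the next token is allocated to."""
    return _h("pos", prev_token) % n_bits


def color(prev_token: int, token: int) -> int:
    """Which of the two vocabulary halves `token` falls in, given the context."""
    return _h("col", prev_token, token) % 2


def embed(message, n_tokens, seed=0):
    """Toy 'generation': each token encodes the bit at its allocated position."""
    rng = random.Random(seed)
    tokens = [0]  # fixed start token
    for _ in range(n_tokens):
        prev = tokens[-1]
        bit = message[position(prev, len(message))]
        # Hard watermark: resample until the candidate lies in the favored half.
        while True:
            cand = rng.randrange(VOCAB_SIZE)
            if color(prev, cand) == bit:
                break
        tokens.append(cand)
    return tokens


def extract(tokens, n_bits):
    """Recover the message by majority vote per position; no model is needed."""
    votes = [[0, 0] for _ in range(n_bits)]
    for prev, tok in zip(tokens, tokens[1:]):
        votes[position(prev, n_bits)][color(prev, tok)] += 1
    return [int(v[1] > v[0]) for v in votes]


if __name__ == "__main__":
    msg = [1, 0, 1, 1, 0, 0, 1, 0]
    text = embed(msg, n_tokens=200)
    print(extract(text, len(msg)) == msg)
```

Because each position collects its own votes, corrupting some tokens only weakens the vote at the positions those tokens were allocated to, and extraction needs only the key and the token sequence.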