Lost in Overlap: Exploring Logit-based Watermark Collision in LLMs (2403.10020v3)
Abstract: The proliferation of LLMs in content generation raises concerns about text copyright. Watermarking methods, particularly logit-based approaches, embed imperceptible identifiers into generated text to address these challenges. However, the widespread use of watermarking across diverse LLMs has led to an inevitable issue known as watermark collision during common tasks such as paraphrasing or translation. In this paper, we introduce watermark collision as a novel and general philosophy for watermark attacks, aimed at enhancing attack performance on top of other attack methods. We also provide a comprehensive demonstration that watermark collision poses a threat to all logit-based watermark algorithms, impacting not only specific attack scenarios but also downstream applications.
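For context, the sketch below illustrates one common logit-based ("green list") watermark in the style of Kirchenbauer et al. (2023); the constants (GAMMA, DELTA), the hashing scheme, and the function names are illustrative assumptions rather than the exact scheme evaluated in this paper. Collision arises when text carrying one such bias is rewritten (e.g., paraphrased or translated) by a second watermarked model, so two independent green-list biases end up superimposed on the same passage.

```python
# Minimal sketch of a green-list logit watermark (assumed setup, not the
# paper's exact implementation): a pseudo-random subset of the vocabulary,
# seeded by the previous token, receives a small logit bonus before sampling.
import hashlib
import numpy as np

VOCAB_SIZE = 50_000   # assumed vocabulary size
GAMMA = 0.5           # fraction of the vocabulary placed on the green list
DELTA = 2.0           # logit bias added to green-list tokens

def green_list(prev_token_id: int, key: int = 42) -> np.ndarray:
    """Pseudo-randomly partition the vocabulary, seeded by the previous token."""
    seed = int(hashlib.sha256(f"{key}:{prev_token_id}".encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    ids = rng.permutation(VOCAB_SIZE)
    return ids[: int(GAMMA * VOCAB_SIZE)]

def watermark_logits(logits: np.ndarray, prev_token_id: int) -> np.ndarray:
    """Bias the next-token logits toward the green list before sampling."""
    biased = logits.copy()
    biased[green_list(prev_token_id)] += DELTA
    return biased
```

Roughly speaking, detection then counts how many tokens of a candidate text fall on their respective green lists and applies a statistical test; when a second watermarked model rewrites the text, its own bias perturbs exactly these counts, which is the overlap effect studied here.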