Cross-Attention Watermarking of Large Language Models (2401.06829v1)
Abstract: A new approach to linguistic watermarking of large language models (LLMs) is presented in which information is imperceptibly inserted into the output text while preserving its readability and original meaning. A cross-attention mechanism is used to embed watermarks in the text during inference. Two cross-attention-based methods are presented that minimize the effect of watermarking on the performance of a pretrained model. Exploration of different training strategies for optimizing the watermarking, together with an analysis of the challenges and implications of applying this approach in real-world scenarios, clarifies the tradeoff between watermark robustness and text quality. Watermark selection substantially affects the generated output for high-entropy sentences. This proactive watermarking approach has potential applications in future model development.
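The abstract describes cross-attention as the mechanism that injects the watermark into the model's generation process at inference time. The following is a minimal sketch, not the paper's implementation, of one way such a layer could work: queries come from the frozen LM's hidden states, while keys and values come from a learned embedding of the watermark message, and the result is added residually before the LM head. All names and dimensions (`WatermarkCrossAttention`, `msg_bits`, `d_model`) are illustrative assumptions.

```python
# Minimal sketch (assumed design, not the authors' code): a cross-attention layer
# that injects a watermark message into a frozen LM's hidden states.
import torch
import torch.nn as nn


class WatermarkCrossAttention(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 8, msg_bits: int = 4):
        super().__init__()
        # Learned embedding for each possible watermark message (2**msg_bits codes).
        self.msg_embed = nn.Embedding(2 ** msg_bits, d_model)
        # Queries come from the LM hidden states; keys/values from the message.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, hidden: torch.Tensor, msg_id: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model) hidden states from the frozen pretrained LM
        # msg_id: (batch,) integer watermark message per sequence
        msg = self.msg_embed(msg_id).unsqueeze(1)              # (batch, 1, d_model)
        delta, _ = self.attn(query=hidden, key=msg, value=msg)
        # Residual addition keeps the original hidden states dominant,
        # limiting the effect of watermarking on generation quality.
        return self.norm(hidden + delta)


if __name__ == "__main__":
    layer = WatermarkCrossAttention()
    hidden = torch.randn(2, 10, 768)   # stand-in for LM hidden states
    msg_id = torch.tensor([3, 12])     # two example watermark messages
    out = layer(hidden, msg_id)
    print(out.shape)                   # torch.Size([2, 10, 768])
```

In a setup like this, only the watermarking parameters would be trained (for example via LoRA-style adapters on the base model), which is consistent with the abstract's goal of minimizing the impact on the pretrained model's performance.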