
A Semantic Invariant Robust Watermark for Large Language Models (2310.06356v3)

Published 10 Oct 2023 in cs.CR and cs.CL

Abstract: Watermark algorithms for LLMs have achieved extremely high accuracy in detecting text generated by LLMs. Such algorithms typically involve adding extra watermark logits to the LLM's logits at each generation step. However, prior algorithms face a trade-off between attack robustness and security robustness: the watermark logits for a token are determined by a certain number of preceding tokens, and a small number leads to low security robustness, while a large number results in insufficient attack robustness. In this work, we propose a semantic invariant watermarking method for LLMs that provides both attack robustness and security robustness. The watermark logits in our work are determined by the semantics of all preceding tokens. Specifically, we utilize another embedding LLM to generate semantic embeddings for all preceding tokens, and these semantic embeddings are transformed into the watermark logits through our trained watermark model. Subsequent analyses and experiments demonstrate the attack robustness of our method in semantically invariant settings: synonym substitution and text paraphrasing. Finally, we also show that our watermark possesses adequate security robustness. Our code and data are available at https://github.com/THU-BPM/Robust_Watermark. Additionally, our algorithm can also be accessed through MarkLLM (Pan et al., 2024): https://github.com/THU-BPM/MarkLLM.

A Semantic Invariant Robust Watermark for LLMs

The development of LLMs brings substantial improvements in natural language processing tasks, but it also introduces challenges related to the misuse of machine-generated content. This paper presents a novel approach to watermarking LLMs, aiming to enhance both the attack robustness and the security robustness of watermarking methods. By harnessing semantic information to generate watermark logits, the proposed method addresses a limitation of existing algorithms, which typically trade off robustness to text-modification attacks against security from adversaries inferring the watermarking rules.

Methodology Overview

The paper introduces a watermarking algorithm wherein the watermark logits for LLMs are informed by the semantic content of preceding tokens rather than merely their identity. This semantic-based approach employs an auxiliary embedding LLM to generate embeddings of preceding tokens, which are subsequently transformed into watermark logits via a trained watermark model. The innovation lies in utilizing semantically invariant features to both ensure the robustness of the watermark against text modifications such as synonym substitutions and paraphrasing, and to increase the security robustness against attacks attempting to deduce watermarking rules.
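The generation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the trained watermark model is stood in for by a single linear layer with tanh squashing, and the embeddings and logits are random placeholders.

```python
import numpy as np

def semantic_watermark_logits(prefix_embedding, weight, bias):
    """Map a semantic embedding of all preceding tokens to watermark logits.

    In the paper this mapping is a trained watermark model; here it is a
    single linear layer with tanh squashing, purely for illustration.
    """
    raw = np.tanh(weight @ prefix_embedding + bias)
    # Center the logits so they do not systematically bias token probabilities.
    return raw - raw.mean()

rng = np.random.default_rng(0)
vocab_size, embed_dim = 8, 4
W = rng.normal(size=(vocab_size, embed_dim))   # placeholder watermark-model weights
b = rng.normal(size=vocab_size)

emb = rng.normal(size=embed_dim)               # embedding of the preceding text
llm_logits = rng.normal(size=vocab_size)       # logits from the base LLM
wm_logits = semantic_watermark_logits(emb, W, b)
biased_logits = llm_logits + wm_logits         # sampled from at generation time
```

Because the watermark logits depend only on the semantics of the prefix, a paraphrase that preserves meaning yields a nearly identical embedding, and hence nearly identical watermark logits.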

The watermark model is trained with two objectives. A similarity loss makes the similarity between pairs of watermark logits track the similarity of the corresponding input text embeddings, while a normalization loss keeps the generated logits zero-centered and balanced between positive and negative scores. Together, these objectives keep the watermark statistically unbiased (supporting security) while preserving the semantic consistency that detection relies on (supporting attack robustness).
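The two objectives can be sketched as simple loss functions. These are hedged stand-ins for the paper's actual training losses, whose exact forms differ; the point is only the shape of each constraint.

```python
import numpy as np

def similarity_loss(logits_a, logits_b, emb_sim):
    """Penalize mismatch between logit similarity and embedding similarity."""
    cos = np.dot(logits_a, logits_b) / (
        np.linalg.norm(logits_a) * np.linalg.norm(logits_b)
    )
    return (cos - emb_sim) ** 2

def normalization_loss(logits):
    """Push logits toward a zero mean and a balanced positive/negative split."""
    mean_term = logits.mean() ** 2
    # Encourage roughly half the vocabulary to receive positive logits.
    balance_term = (np.mean(logits > 0) - 0.5) ** 2
    return mean_term + balance_term

# A perfectly balanced, zero-mean logit vector incurs no normalization loss.
demo = np.array([1.0, -1.0, 2.0, -2.0])
demo_loss = normalization_loss(demo)
```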

Experimental Results

The experimental results presented in the paper indicate that the proposed watermarking method offers strong resistance to semantically invariant text changes. The watermark maintains high detection accuracy across multiple attack scenarios, including text paraphrasing and synonym replacement. The results also confirm that the method achieves a desirable balance between attack robustness and security robustness, as evidenced by its resistance to attempts to reverse-engineer the watermarking rules via token-frequency analysis.
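Detection in this family of watermarks typically reduces to a statistical test: count how often generated tokens fall on watermark-favored (positive-logit) choices and ask whether that rate exceeds chance. The sketch below shows a standard one-proportion z-test; the threshold value is an illustrative assumption, not one taken from the paper.

```python
import math

def z_score(hits, n, p=0.5):
    """One-proportion z-test: did watermark-favored tokens occur above chance?

    hits: number of tokens landing on positive-watermark-logit choices
    n:    total tokens examined
    p:    expected hit rate for unwatermarked text (0.5 for balanced logits)
    """
    return (hits - p * n) / math.sqrt(n * p * (1 - p))

# Example: 70 of 100 generated tokens landed on watermark-favored choices.
z = z_score(70, 100)
detected = z > 3.0  # a conservative threshold keeps false positives rare
```

Because the normalization loss pushes the expected hit rate for human text to 0.5, a large z-score is strong evidence of watermarked generation.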

Furthermore, the paper evaluates the computational efficiency and text quality impacts of the watermarking process. Although the watermarking introduces some latency during text generation, primarily in the embedding phase, parallelization effectively mitigates these delays. Importantly, the text quality, as measured by perplexity, is only slightly affected, suggesting the method's feasibility for real-world applications.
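Perplexity, the quality metric referenced above, is simply the exponential of the average negative log-likelihood that a scoring model assigns to the generated tokens. A minimal sketch, assuming per-token log-probabilities are already available from such a model:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over generated tokens."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# If every token is assigned probability 0.5, perplexity is exactly 2.
ppl = perplexity([math.log(0.5)] * 4)
```

Comparing this value for watermarked versus unwatermarked generations quantifies how much the added watermark logits perturb the model's output distribution.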

Implications and Future Directions

The semantic invariant robust watermarking approach outlined in this paper has significant implications for the future of LLM usage. Its ability to maintain watermark robustness across a variety of text modifications highlights its potential for ensuring content authenticity and traceability, critical aspects in fields concerned with copyright and misinformation. By focusing on semantic embeddings, the approach also opens pathways to explore multilingual and context-aware watermarks, which can be particularly valuable as LLMs become integrated into diverse and global contexts.

As advancements in AI continue, future developments could involve integrating more sophisticated embedding models or exploring dynamic watermarking techniques that adapt in real-time to evolving language patterns. Enhancements to the watermark model architecture could also leverage advances in neural network training techniques, potentially improving the balance between security and robustness further. Overall, the proposed semantic invariant robust watermarking method lays foundational groundwork for mitigating the risks associated with LLM-generated text, fostering a broader and more responsible deployment of AI technologies.

Authors (5)
  1. Aiwei Liu (42 papers)
  2. Leyi Pan (7 papers)
  3. Xuming Hu (120 papers)
  4. Shiao Meng (5 papers)
  5. Lijie Wen (58 papers)
Citations (40)