
An Unforgeable Publicly Verifiable Watermark for Large Language Models (2307.16230v7)

Published 30 Jul 2023 in cs.CL

Abstract: Recently, text watermarking algorithms for LLMs have been proposed to mitigate the potential harms of text generated by LLMs, including fake news and copyright issues. However, current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. To address this limitation, we propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages. Meanwhile, the token embedding parameters are shared between the generation and detection networks, which makes the detection network achieve a high accuracy very efficiently. Experiments demonstrate that our algorithm attains high detection accuracy and computational efficiency through neural networks. Subsequent analysis confirms the high complexity involved in forging the watermark from the detection network. Our code is available at https://github.com/THU-BPM/unforgeable_watermark. Additionally, our algorithm can also be accessed through MarkLLM (pan2024markLLM; https://github.com/THU-BPM/MarkLLM).

Overview of "An Unforgeable Publicly Verifiable Watermark for LLMs"

The paper "An Unforgeable Publicly Verifiable Watermark for LLMs" addresses the critical issue of verifying the authenticity of text generated by LLMs without compromising security. Given the proliferation of LLMs such as GPT-4 and their increasing use across multiple domains, the paper highlights potential risks like generating false information and copyright infringement. The work introduces a new watermarking approach called UPV, designed to embed an unforgeable publicly verifiable watermark into text generated by LLMs.

Methodology

The main innovation of this paper is the separation of watermark generation and detection into two distinct neural networks. Unlike existing schemes, which require a shared secret key for both operations, UPV embeds small watermark signals directly into the LLM's logits during text generation. This design increases security and prevents counterfeiting, particularly in third-party (public) detection scenarios.
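The logit-biasing step can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function name, the binary "green" mask, and the bias strength `delta` are assumptions for demonstration.

```python
import torch

def watermark_logits(logits: torch.Tensor, green_mask: torch.Tensor,
                     delta: float = 2.0) -> torch.Tensor:
    """Add a small bias delta to the logits of watermark ("green") tokens,
    nudging sampling toward them while leaving other logits untouched."""
    return logits + delta * green_mask.float()

# Toy example over an 8-token vocabulary: tokens 0, 2, and 5 carry the signal.
logits = torch.zeros(8)
mask = torch.tensor([1, 0, 1, 0, 0, 1, 0, 0])
biased = watermark_logits(logits, mask)
```

In UPV, the mask itself is produced by a neural network rather than a keyed hash, which is what makes public detection possible without revealing a secret key.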

The watermark generation network uses a window of w tokens to predict the watermark signal for the last token in the sequence. Token embedding parameters are shared between the watermark generator and detector, providing the detection mechanism with prior information about the generation rules. This setup enables highly accurate detection with limited computational resources.
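A minimal sketch of such a generation network is shown below; the class name, layer sizes, and window length are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class WatermarkGenerator(nn.Module):
    """Predicts a watermark ("green") probability for the next token from
    the embeddings of the preceding w tokens (illustrative sketch)."""

    def __init__(self, vocab_size: int, embed_dim: int = 64, window: int = 5):
        super().__init__()
        self.window = window
        # Token embedding table -- in UPV these parameters are shared
        # with the detection network.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(window * embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, window_ids: torch.Tensor) -> torch.Tensor:
        # window_ids: (batch, window) ids of the last w tokens.
        e = self.embed(window_ids).flatten(start_dim=1)
        return torch.sigmoid(self.mlp(e)).squeeze(-1)

gen = WatermarkGenerator(vocab_size=100)
probs = gen(torch.randint(0, 100, (4, 5)))  # one probability per batch item
```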

For detection, UPV employs a neural network that evaluates an entire text sequence to determine whether a watermark is present. The detector functions as a binary classifier, leveraging token embeddings initialized from the generation network's shared parameters.
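Such a detector can be sketched as follows, with the embedding table passed in so its parameters can be shared with the generator. The recurrent architecture and layer sizes here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class WatermarkDetector(nn.Module):
    """Binary classifier over a full token sequence. The embedding table
    is passed in so it can be initialized from (or tied to) the
    generator's shared token embeddings (illustrative sketch)."""

    def __init__(self, shared_embed: nn.Embedding, hidden: int = 128):
        super().__init__()
        self.embed = shared_embed
        self.rnn = nn.LSTM(shared_embed.embedding_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len); output: P(sequence is watermarked).
        _, (h_n, _) = self.rnn(self.embed(token_ids))
        return torch.sigmoid(self.head(h_n[-1])).squeeze(-1)

shared = nn.Embedding(100, 64)  # e.g. the generator's embedding table
det = WatermarkDetector(shared)
score = det(torch.randint(0, 100, (2, 30)))  # one score per sequence
```

Sharing the embedding table is what gives the detector its "prior information": it starts from the same token representations the generator used, so it needs far less data and compute to reach high accuracy.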

Experimental Results

In evaluation, UPV achieves detection performance close to that of traditional key-based watermark detection, with an F1 score approaching 99%. The computational overhead introduced by the watermarking networks is minimal relative to the inference cost of the LLM itself, making the approach highly efficient in practice.

The paper presents an impressive suite of results, including robustness to various attack vectors. The watermark detection network was notably resistant to efforts aimed at extracting watermark generation rules, a feature attributed to its computational asymmetry. Forging attacks that sought to reverse-engineer the watermarking process using the detector as a guide proved essentially ineffective.

Implications and Future Directions

The ability to embed an unforgeable watermark in LLM outputs has numerous implications for both the theoretical understanding and practical deployment of these models. Practically, this mechanism advances the effort to curb misuse of AI-generated text, enhancing trust in systems deploying LLMs. Theoretically, the paper paves the way for further exploration of watermarking algorithms that balance security, detection efficiency, and robustness against potential attacks.

Future research could explore integrating these watermarking systems with broader AI safety frameworks and refining techniques to ensure watermark robustness across different text modification schemes, such as adversarial text transformations or sophisticated rewriting strategies. Additionally, extending this work to cover multimodal models that generate text interwoven with other media forms can elucidate broader applications of watermarking techniques in AI-generated content.

This paper contributes significantly to the discourse on securing AI-generated content by introducing mechanisms that facilitate verifying the authenticity of such content without exposing sensitive watermarking details. The UPV approach serves as a strong foundation for future innovations in digital watermarking within the AI domain.

Authors (7)
  1. Aiwei Liu
  2. Leyi Pan
  3. Xuming Hu
  4. Shu'ang Li
  5. Lijie Wen
  6. Irwin King
  7. Philip S. Yu
Citations (22)