WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models (2403.19548v1)
Abstract: Watermarking generative-AI systems such as LLMs has gained considerable interest, driven by their enhanced capabilities across a wide range of tasks. Although current approaches have demonstrated that small, context-dependent shifts in word distributions can be used to apply and detect watermarks, there has been little work analyzing the impact these perturbations have on the quality of generated texts. Balancing high detectability with minimal performance degradation is crucial for selecting an appropriate watermarking setting; therefore, this paper proposes a simple analysis framework in which comparative assessment, a flexible NLG evaluation framework, is used to assess the quality degradation caused by a particular watermark setting. We demonstrate that our framework provides an easy visualization of the quality-detection trade-off across watermark settings, enabling a simple way to find an LLM watermark operating point with well-balanced performance. The approach is applied to two different summarization systems and a translation system, enabling cross-model analysis within a task as well as cross-task analysis.
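The selection procedure the abstract describes can be illustrated with a minimal sketch: each watermark setting gets a quality score (here, the comparative-assessment win rate of watermarked vs. unwatermarked outputs, where 0.5 means quality parity) and a detection rate, and a balanced operating point is the setting that maximizes detection while penalizing quality degradation. The setting names, the example numbers, and the scoring rule are illustrative assumptions, not the paper's exact method.

```python
def win_rate(pairwise_outcomes):
    """Fraction of pairwise comparisons the watermarked output wins.

    `pairwise_outcomes` is a list of booleans
    (True = watermarked output judged better).
    """
    return sum(pairwise_outcomes) / len(pairwise_outcomes)


def balanced_operating_point(settings):
    """Pick the watermark setting with the best quality-detection balance.

    `settings` maps a setting name to (quality_win_rate, detection_rate).
    A win rate near 0.5 indicates no perceptible quality degradation, so we
    penalize the deviation from 0.5 and reward the detection rate.
    The weighting factor 2.0 is an illustrative choice.
    """
    def score(item):
        quality, detection = item[1]
        return detection - 2.0 * abs(quality - 0.5)

    return max(settings.items(), key=score)[0]


# Hypothetical settings: stronger watermarks (larger logit bias) detect
# better but degrade quality more.
settings = {
    "delta=1": (0.49, 0.70),
    "delta=2": (0.45, 0.90),
    "delta=5": (0.30, 0.99),
}
print(balanced_operating_point(settings))  # "delta=2" balances both axes
```

Plotting quality against detection for each setting gives the trade-off curve the paper visualizes; the scalarized score above is just one way to pick a point on it.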
Authors: Piotr Molenda, Adian Liusie, Mark J. F. Gales