
Improving the Generation Quality of Watermarked Large Language Models via Word Importance Scoring (2311.09668v1)

Published 16 Nov 2023 in cs.CL, cs.CR, and cs.LG

Abstract: The strong general capabilities of LLMs bring potential ethical risks if they are unrestrictedly accessible to malicious users. Token-level watermarking inserts watermarks into generated texts by altering the token probability distributions with a private random number generator seeded by the prefix tokens. However, this watermarking algorithm alters the logits during generation, which can degrade text quality if it promotes tokens that are less relevant given the input. In this work, we propose to improve the quality of texts generated by a watermarked LLM via Watermarking with Importance Scoring (WIS). At each generation step, we estimate the importance of the token to generate, and prevent it from being impacted by watermarking if it is important for the semantic correctness of the output. We further propose three methods to predict importance scores, including a perturbation-based method and two model-based methods. Empirical experiments show that our method can generate texts with better quality at a comparable detection rate.
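The mechanism the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a Kirchenbauer-style green-list watermark (a prefix-seeded RNG partitions the vocabulary and boosts "green" logits by a bias `delta`), and the `importance` score and `threshold` are hypothetical inputs standing in for the paper's importance-scoring methods:

```python
import hashlib
import random

def greenlist(prev_token: int, vocab_size: int, gamma: float = 0.5) -> set:
    # Seed a private RNG with the prefix token, as in token-level
    # watermarking, and select a fraction gamma of the vocabulary
    # as the "green" list.
    seed = hashlib.sha256(str(prev_token).encode()).hexdigest()
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def watermark_logits(logits, prev_token, importance,
                     threshold=0.8, delta=2.0, gamma=0.5):
    """Add delta to green-list logits, unless this position is deemed
    important (importance >= threshold), in which case the distribution
    is left untouched -- the gating idea behind WIS."""
    if importance >= threshold:
        # Important token: skip watermarking to preserve semantics.
        return list(logits)
    green = greenlist(prev_token, len(logits), gamma)
    return [x + delta if i in green else x for i, x in enumerate(logits)]
```

At detection time, a higher-than-chance fraction of green tokens in a text indicates the watermark; skipping a minority of important positions lowers that fraction only slightly, which is why the detection rate can stay comparable.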

Authors (4)
  1. Yuhang Li (102 papers)
  2. Yihan Wang (65 papers)
  3. Zhouxing Shi (19 papers)
  4. Cho-Jui Hsieh (211 papers)
Citations (6)
