Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models (2402.18059v3)

Published 28 Feb 2024 in cs.LG, cs.CL, and cs.CR

Abstract: LLMs generate high-quality responses that can nonetheless carry misinformation, underscoring the need for regulation by distinguishing AI-generated from human-written text. Watermarking is pivotal in this context: it embeds hidden markers, imperceptible to humans, in text during the LLM inference phase. Achieving both high detectability of the inserted watermarks and high semantic quality of the generated text is challenging. While current watermarking algorithms have made promising progress in this direction, there remains significant room for improvement. To address these challenges, we introduce a novel multi-objective optimization (MOO) approach for watermarking that uses lightweight networks to generate token-specific watermarking logits and splitting ratios. By leveraging MOO to optimize the detection and semantic objective functions jointly, our method achieves detectability and semantic integrity simultaneously. Experimental results show that our method outperforms current watermarking techniques in enhancing the detectability of texts generated by LLMs while maintaining their semantic coherence. Our code is available at https://github.com/mignonjia/TS_watermark.
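
To make the idea concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' implementation; see the linked repository for that). It assumes a green/red-list style watermark and illustrates two pieces the abstract describes: lightweight heads that map a token's hidden state to a token-specific logit boost `delta` and splitting ratio `gamma`, and a two-task gradient-balancing step using the closed-form min-norm (MGDA) weight from multi-task learning as multi-objective optimization (Sener & Koltun, 2018). The class names, network sizes, and the toy placeholder losses are all illustrative assumptions.

```python
# Hypothetical sketch, not the paper's code: token-specific watermarking heads
# plus MGDA-style balancing of a detection loss and a semantic loss.
import torch
import torch.nn as nn


class TokenSpecificHeads(nn.Module):
    """Lightweight networks mapping a token's hidden state to (delta, gamma):
    delta = logit boost added to green-list tokens, gamma = green-list fraction."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.delta_net = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.LeakyReLU(), nn.Linear(64, 1), nn.Softplus()
        )
        self.gamma_net = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.LeakyReLU(), nn.Linear(64, 1), nn.Sigmoid()
        )

    def forward(self, h: torch.Tensor):
        return self.delta_net(h).squeeze(-1), self.gamma_net(h).squeeze(-1)


def min_norm_weight(g1: torch.Tensor, g2: torch.Tensor) -> float:
    """Closed-form two-task MGDA weight (Sener & Koltun, 2018):
    alpha = clip(<g2 - g1, g2> / ||g1 - g2||^2, 0, 1); combined = alpha*g1 + (1-alpha)*g2."""
    diff = g1 - g2
    alpha = torch.dot(g2 - g1, g2) / (diff.norm().pow(2) + 1e-12)
    return float(alpha.clamp(0.0, 1.0))


if __name__ == "__main__":
    torch.manual_seed(0)
    heads = TokenSpecificHeads(hidden_dim=32)
    h = torch.randn(8, 32)              # stand-in for per-token LLM hidden states
    delta, gamma = heads(h)

    # Toy placeholder objectives (NOT the paper's losses): reward a strong watermark
    # signal for detection, penalize the expected distortion for semantics.
    detect_loss = -(torch.sigmoid(delta) - gamma).mean()
    semantic_loss = (gamma * delta.pow(2)).mean()

    params = [p for p in heads.parameters() if p.requires_grad]

    def flat(grads):
        return torch.cat([g.reshape(-1) for g in grads])

    g_det = flat(torch.autograd.grad(detect_loss, params, retain_graph=True))
    g_sem = flat(torch.autograd.grad(semantic_loss, params, retain_graph=True))

    alpha = min_norm_weight(g_det, g_sem)
    (alpha * detect_loss + (1 - alpha) * semantic_loss).backward()
    print(f"MGDA weight on detection loss: {alpha:.3f}")
```

Per the abstract, the actual method optimizes a detection objective and a semantic objective jointly via MOO; the placeholder losses above merely stand in for those objectives to show how token-specific delta/gamma heads and gradient-level balancing fit together.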

Authors (6)
  1. Mingjia Huo (3 papers)
  2. Sai Ashish Somayajula (8 papers)
  3. Youwei Liang (16 papers)
  4. Ruisi Zhang (18 papers)
  5. Farinaz Koushanfar (85 papers)
  6. Pengtao Xie (86 papers)
Citations (8)
