  Excuse me, sir? Your language model is leaking (information) (2401.10360v1)
    Published 18 Jan 2024 in cs.CR and cs.LG
  
Abstract: We introduce a cryptographic method to hide an arbitrary secret payload in the response of an LLM. A secret key is required to extract the payload from the model's response, and without the key it is provably impossible to distinguish between the responses of the original LLM and the LLM that hides a payload. In particular, the quality of generated text is not affected by the payload. Our approach extends a recent result of Christ, Gunn, and Zamir (2023), who introduced an undetectable watermarking scheme for LLMs.
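The construction this builds on (Christ, Gunn, and Zamir's undetectable watermark) replaces the true randomness used during sampling with the output of a pseudorandom function keyed by the secret key: PRF outputs are computationally indistinguishable from uniform random bits, so the text distribution is unchanged for anyone without the key, while a key holder can recompute the pseudorandom draws and compare them against the emitted tokens. The sketch below illustrates how a payload could ride on such derandomized sampling over a binarized token stream. It is a minimal toy, not the paper's actual scheme: HMAC-SHA256 stands in for the PRF, `probs` abstracts the model as per-step probabilities, and all helper names are assumptions for illustration (the receiver is assumed to share both the key and the model).

```python
# Toy sketch of payload hiding via keyed pseudorandom sampling.
# Assumptions (not from the paper): HMAC-SHA256 stands in for a PRF,
# tokens are binarized, and probs[t] = model's P(bit_t = 1 | prefix),
# which the receiver can recompute because it shares the model.
import hashlib
import hmac

def prf_uniform(key: bytes, t: int) -> float:
    """Keyed PRF output mapped into [0, 1)."""
    digest = hmac.new(key, t.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def embed(key: bytes, payload: list[int], probs: list[float]) -> list[int]:
    """Sample one bit per step; hide one (cyclically repeated) payload bit
    per step by optionally reflecting the pseudorandom draw. Since u and
    1 - u are both uniform, the output distribution is identical either way."""
    bits = []
    for t, p in enumerate(probs):
        u = prf_uniform(key, t)
        v = u if payload[t % len(payload)] == 0 else 1.0 - u
        bits.append(1 if v < p else 0)
    return bits

def extract(key: bytes, bits: list[int], probs: list[float]) -> list:
    """Recover payload bits; '?' marks erasures, i.e. low-entropy steps
    where u and 1 - u fall on the same side of p and both hypotheses fit."""
    out = []
    for t, (x, p) in enumerate(zip(bits, probs)):
        u = prf_uniform(key, t)
        fits0 = (u < p) == (x == 1)          # consistent with payload bit 0
        fits1 = ((1.0 - u) < p) == (x == 1)  # consistent with payload bit 1
        out.append(0 if fits0 and not fits1 else 1 if fits1 and not fits0 else "?")
    return out

# Example: hide [1, 0] across six fairly high-entropy steps.
key = b"shared-secret-key"
probs = [0.55, 0.48, 0.52, 0.45, 0.60, 0.50]
stream = embed(key, [1, 0], probs)
print(extract(key, stream, probs))  # e.g. [1, 0, '?', 0, 1, '?'] -- bits are never flipped
```

Steps where both hypotheses fit carry no information, so the achievable payload rate is governed by the entropy of the generated text, and a full scheme would layer an erasure-correcting code over this channel; that is presumably why the bibliography below is dense with coding-theory references (Hamming, Gilbert, Varshamov, Justesen, expander codes, erasure channels).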
- Abdelnabi, S. and Fritz, M. Adversarial watermarking transformer: Towards tracing text provenance with data hiding. In 2021 IEEE Symposium on Security and Privacy (SP), pp. 121–140. IEEE, 2021.
- Berlekamp, E. R. Block coding with noiseless feedback. PhD thesis, Massachusetts Institute of Technology, 1964.
- Chakraborty, S., Bedi, A. S., Zhu, S., An, B., Manocha, D., and Huang, F. On the possibilities of AI-generated text detection. arXiv preprint arXiv:2304.04736, 2023.
- Entropy in different text types. Digital Scholarship in the Humanities, 32(3):528–542, 2017.
- Christ, M., Gunn, S., and Zamir, O. Undetectable watermarks for language models. arXiv preprint arXiv:2306.09194, 2023.
- Cover, T. M. The role of feedback in communication. In Performance Limits in Communication Theory and Practice, pp. 225–235. Springer, 1988.
- Schroeder de Witt, C., Sokota, S., Kolter, J. Z., Foerster, J., and Strohmeier, M. Perfectly secure steganography using minimum entropy coupling. arXiv preprint arXiv:2210.14889, 2022.
- Dedić, N., Itkis, G., Reyzin, L., and Russell, S. Upper and lower bounds on black-box steganography. Journal of Cryptology, 22:365–394, 2009.
- Efremenko, K., Gelles, R., and Haeupler, B. Maximal noise in interactive communication over erasure channels and channels with feedback. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pp. 11–20, 2015.
- Fernandez, P., Chaffin, A., Tit, K., Chappelier, V., and Furon, T. Three bricks to consolidate watermarks for large language models. arXiv preprint arXiv:2308.00113, 2023.
- Genzel, D. and Charniak, E. Entropy rate constancy in text. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 199–206, 2002.
- Gilbert, E. N. A comparison of signalling alphabets. The Bell System Technical Journal, 31(3):504–522, 1952.
- Goldreich, O., Goldwasser, S., and Micali, S. How to construct random functions. Journal of the ACM (JACM), 33(4):792–807, 1986.
- Hamming, R. W. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147–160, 1950.
- Håstad, J., Impagliazzo, R., Levin, L. A., and Luby, M. A pseudorandom generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396, 1999.
- Hopper, N. J., von Ahn, L., and Langford, J. Provably secure steganography. IEEE Transactions on Computers, 58(5):662–676, 2008.
- Jawahar, G., Abdul-Mageed, M., and Lakshmanan, L. V. S. Automatic detection of machine generated text: A critical survey. arXiv preprint arXiv:2011.01314, 2020.
- Justesen, J. Class of constructive asymptotically good algebraic codes. IEEE Transactions on Information Theory, 18(5):652–656, 1972.
- Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., and Goldstein, T. A watermark for large language models. arXiv preprint arXiv:2301.10226, 2023.
- Krishna, K., Song, Y., Karpinska, M., Wieting, J., and Iyyer, M. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. arXiv preprint arXiv:2303.13408, 2023.
- Munyer, T. and Zhong, X. DeepTextMark: Deep learning based text watermarking for detection of large language model generated text. arXiv preprint arXiv:2305.05773, 2023.
- Natural language watermarking via paraphraser-based lexical substitution. Artificial Intelligence, 103859, 2023.
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
- Sadasivan, V. S., Kumar, A., Balasubramanian, S., Wang, W., and Feizi, S. Can AI-generated text be reliably detected? arXiv preprint arXiv:2303.11156, 2023.
- Schulman, L. J. Deterministic coding for interactive communication. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pp. 747–756, 1993.
- Schulman, L. J. Coding for interactive communication. IEEE Transactions on Information Theory, 42(6):1745–1756, 1996.
- Lexical richness and text length: An entropy-based perspective. Journal of Quantitative Linguistics, 29(1):62–79, 2022.
- Sipser, M. and Spielman, D. A. Expander codes. IEEE Transactions on Information Theory, 42(6):1710–1722, 1996.
- Touvron, H., et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
- Varshamov, R. R. Estimate of the number of signals in error correcting codes. Doklady Akademii Nauk SSSR, 117:739–741, 1957.
- Towards codable text watermarking for large language models. arXiv preprint arXiv:2307.15992, 2023.
- Yoo, K., Ahn, W., Jang, J., and Kwak, N. Robust natural language watermarking through invariant features. arXiv preprint arXiv:2305.01904, 2023a.
- Yoo, K., Ahn, W., and Kwak, N. Advancing beyond identification: Multi-bit watermark for language models. arXiv preprint arXiv:2308.00221, 2023b.
- Zhang, H., Edelman, B. L., Francati, D., Venturi, D., Ateniese, G., and Barak, B. Watermarks in the sand: Impossibility of strong watermarking for generative models. arXiv preprint arXiv:2311.04378, 2023.