Excuse me, sir? Your language model is leaking (information) (2401.10360v1)

Published 18 Jan 2024 in cs.CR and cs.LG

Abstract: We introduce a cryptographic method to hide an arbitrary secret payload in the response of an LLM. A secret key is required to extract the payload from the model's response, and without the key it is provably impossible to distinguish between the responses of the original LLM and the LLM that hides a payload. In particular, the quality of generated text is not affected by the payload. Our approach extends a recent result of Christ, Gunn and Zamir (2023) who introduced an undetectable watermarking scheme for LLMs.

References (35)
  1. Adversarial watermarking transformer: Towards tracing text provenance with data hiding. In 2021 IEEE Symposium on Security and Privacy (SP), pp. 121–140. IEEE, 2021.
  2. Berlekamp, E. R. Block coding with noiseless feedback. PhD thesis, Massachusetts Institute of Technology, 1964.
  3. On the possibilities of AI-generated text detection. arXiv preprint arXiv:2304.04736, 2023.
  4. Entropy in different text types. Digital Scholarship in the Humanities, 32(3):528–542, 2017.
  5. Undetectable watermarks for language models. arXiv preprint arXiv:2306.09194, 2023.
  6. Cover, T. M. The role of feedback in communication. In Performance Limits in Communication Theory and Practice, pp. 225–235. Springer, 1988.
  7. Perfectly secure steganography using minimum entropy coupling. arXiv preprint arXiv:2210.14889, 2022.
  8. Upper and lower bounds on black-box steganography. Journal of Cryptology, 22:365–394, 2009.
  9. Maximal noise in interactive communication over erasure channels and channels with feedback. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pp. 11–20, 2015.
  10. Three bricks to consolidate watermarks for large language models. arXiv preprint arXiv:2308.00113, 2023.
  11. Entropy rate constancy in text. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 199–206, 2002.
  12. Gilbert, E. N. A comparison of signalling alphabets. The Bell System Technical Journal, 31(3):504–522, 1952.
  13. How to construct random functions. Journal of the ACM (JACM), 33(4):792–807, 1986.
  14. Hamming, R. W. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147–160, 1950.
  15. A pseudorandom generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396, 1999.
  16. Provably secure steganography. IEEE Transactions on Computers, 58(5):662–676, 2008.
  17. Automatic detection of machine generated text: A critical survey. arXiv preprint arXiv:2011.01314, 2020.
  18. Justesen, J. Class of constructive asymptotically good algebraic codes. IEEE Transactions on Information Theory, 18(5):652–656, 1972.
  19. A watermark for large language models. CoRR, abs/2301.10226, 2023a. doi: 10.48550/arXiv.2301.10226. URL https://doi.org/10.48550/arXiv.2301.10226.
  20. A watermark for large language models. arXiv preprint arXiv:2301.10226, 2023b.
  21. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. arXiv preprint arXiv:2303.13408, 2023.
  22. DeepTextMark: Deep learning based text watermarking for detection of large language model generated text. arXiv preprint arXiv:2305.05773, 2023.
  23. Natural language watermarking via paraphraser-based lexical substitution. Artificial Intelligence, pp. 103859, 2023.
  24. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
  25. Can AI-generated text be reliably detected? arXiv preprint arXiv:2303.11156, 2023.
  26. Schulman, L. J. Deterministic coding for interactive communication. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pp. 747–756, 1993.
  27. Schulman, L. J. Coding for interactive communication. IEEE Transactions on Information Theory, 42(6):1745–1756, 1996.
  28. Lexical richness and text length: An entropy-based perspective. Journal of Quantitative Linguistics, 29(1):62–79, 2022.
  29. Expander codes. IEEE Transactions on Information Theory, 42(6):1710–1722, 1996.
  30. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
  31. Varshamov, R. R. Estimate of the number of signals in error correcting codes. Doklady Akad. Nauk SSSR, 117:739–741, 1957.
  32. Towards codable text watermarking for large language models. arXiv preprint arXiv:2307.15992, 2023.
  33. Robust natural language watermarking through invariant features. arXiv preprint arXiv:2305.01904, 2023a.
  34. Advancing beyond identification: Multi-bit watermark for language models. arXiv preprint arXiv:2308.00221, 2023b.
  35. Watermarks in the sand: Impossibility of strong watermarking for generative models. arXiv preprint arXiv:2311.04378, 2023.

Summary

  • The paper introduces a cryptographic method for embedding secret payloads in LLM outputs without altering their observable distribution.
  • It details a process that uses a secret key and dynamic error-correcting codes to reliably encode data, with payload size growing linearly with response length.
  • Empirical tests on models like GPT-2 and Llama 2 demonstrate the scheme's viability and highlight future directions for enhancing robustness against text edits.

Overview of Cryptographic Payload Embedding in LLMs

The paper "Excuse me, sir? Your LLM is leaking (information)" introduces a cryptographic methodology for embedding secret payloads in responses generated by LLMs, maintaining indistinguishability without compromising text quality. This approach extends previous work on undetectable watermarking, presenting a novel scheme to encode hidden payloads.

Key Contributions

The authors present a cryptographic procedure for embedding arbitrary data in LLM outputs that leaves the output distribution intact. A secret key is required to retrieve the hidden information; without the key, it is provably impossible to distinguish payload-bearing responses from ordinary ones. The paper builds on the work of Christ, Gunn, and Zamir (2023), expanding its scope from embedding a single watermark signal to carrying arbitrary payloads.
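
To make the mechanism concrete, here is a minimal sketch of the underlying principle rather than the paper's actual construction: encrypting the payload with a key-derived pseudorandom pad yields a bit stream that is computationally indistinguishable from true randomness, and those ciphertext bits can then stand in for the sampler's randomness. The toy below collapses the model to maximum-entropy binary "tokens" (each a fair coin flip), so one ciphertext bit determines one token; the helper names (keystream, embed, extract) are illustrative, and the real scheme handles arbitrary token distributions, relying on error correction for low-entropy stretches.

```python
import hmac
import hashlib

def keystream(key: bytes, n: int) -> list[int]:
    """Pseudorandom bits derived from the secret key, using HMAC-SHA256 as a PRF."""
    bits: list[int] = []
    counter = 0
    while len(bits) < n:
        block = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        bits.extend((byte >> i) & 1 for byte in block for i in range(8))
        counter += 1
    return bits[:n]

def embed(key: bytes, payload_bits: list[int]) -> list[int]:
    # XOR the payload with the keystream; in this toy setting the ciphertext
    # bits ARE the emitted tokens. Since the ciphertext is pseudorandom, the
    # output cannot be distinguished from fair-coin sampling, i.e. from the
    # (assumed uniform) model distribution.
    pad = keystream(key, len(payload_bits))
    return [b ^ k for b, k in zip(payload_bits, pad)]

def extract(key: bytes, tokens: list[int]) -> list[int]:
    # With the key, decryption inverts the embedding; without it, the tokens
    # look like random bits.
    pad = keystream(key, len(tokens))
    return [t ^ k for t, k in zip(tokens, pad)]

key = b"shared secret key"
payload = [1, 0, 1, 1, 0, 0, 1, 0]
tokens = embed(key, payload)
assert extract(key, tokens) == payload
```

In this simplified setting, undetectability reduces directly to the PRF security of the keystream: an observer without the key sees bits it cannot tell apart from fair coin flips.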

Methodology and Results

The steganographic scheme involves several components:

  1. Setup and Encoding:
    • A secret key is generated during setup.
    • The encoding function Steg_k uses this key to embed a payload in the LLM's response.
    • The payload rides on the model's own sampling entropy, so the distribution of responses is unchanged (the sketch above illustrates this principle).
  2. Error-Correcting Codes with Feedback (ECC):
    • To ensure reliable decoding, the paper introduces a dynamic ECC with feedback, adapted to encode binary payloads into ternary symbols. Because the encoder generates the response itself, it can simulate the decoder and thus effectively receives feedback about which symbols arrive intact; a toy model of this appears after the list.
    • The scheme guarantees retrieval of a substantial portion of the payload, with the number of hidden bits growing linearly in the response length.
  3. Empirical Validation:
    • Evaluations on models such as GPT-2 and Llama 2 demonstrate that payloads can be embedded at practical scale. The results show a consistent linear relationship between response length and the number of encoded bits, matching the theoretical predictions.
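
The "feedback" is available for free in this setting: since the encoder generates the response itself, it can simulate the decoder step by step and therefore knows which positions degenerate into erasures (for instance, low-entropy steps that reveal no payload symbol). The toy below, a heavy simplification of the paper's ternary-symbol code, models this as an erasure channel whose erasure pattern the encoder knows, so it simply re-sends a bit until it gets through; the erasure pattern and function names are illustrative.

```python
ERASED = None  # marker for a step that carries no recoverable symbol

def transmit_with_feedback(payload: list[int], erased_steps: set[int]) -> list:
    # Send one bit per step. The encoder "knows" the erasure pattern (in the
    # paper's setting, by simulating the decoder while generating text), so
    # after an erased step it re-sends the same bit: the simplest use of
    # feedback for erasure correction.
    sent: list = []
    i = 0     # index of the next payload bit to deliver
    step = 0
    while i < len(payload):
        if step in erased_steps:
            sent.append(ERASED)       # this step delivers nothing
        else:
            sent.append(payload[i])   # delivered; move on to the next bit
            i += 1
        step += 1
    return sent

def decode(received: list) -> list[int]:
    # The decoder recognizes erasures too (a low-entropy step reveals
    # nothing), so it skips them; bit order is preserved.
    return [b for b in received if b is not ERASED]

payload = [1, 0, 1, 1, 0]
received = transmit_with_feedback(payload, erased_steps={1, 2, 6})
assert decode(received) == payload
```

Over a pure erasure channel, feedback-driven retransmission already achieves the channel capacity, which is consistent with the reported linear growth of hidden bits in response length as long as the text retains enough entropy per step.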

Implications and Future Directions

The implications of this research are multifaceted:

  • Security and Privacy: Covertly embedding metadata or session details makes it feasible to track LLM usage or attribute responses without altering the observable outputs, providing a monitoring tool that leaves no visible trace (a hypothetical usage example follows this list).
  • Robustness Challenges: The paper identifies limited robustness to text edits as an open problem. Making such schemes resistant to various kinds of modification would broaden their applicability in dynamic textual environments.
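
As a hypothetical illustration of the monitoring use case, the snippet below carries a short session identifier as the payload. It reuses the toy embed and extract helpers from the sketch in Key Contributions (so it assumes those definitions are in scope); the identifier and helper names are made up for the example.

```python
def str_to_bits(s: str) -> list[int]:
    # Expand a UTF-8 string into a bit list, least significant bit first.
    return [(byte >> i) & 1 for byte in s.encode() for i in range(8)]

def bits_to_str(bits: list[int]) -> str:
    # Inverse of str_to_bits: regroup bits into bytes and decode.
    return bytes(
        sum(bit << i for i, bit in enumerate(bits[j:j + 8]))
        for j in range(0, len(bits), 8)
    ).decode()

key = b"shared secret key"
session_id = "sess-42"
tokens = embed(key, str_to_bits(session_id))  # looks like ordinary sampling
assert bits_to_str(extract(key, tokens)) == session_id
```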

Conclusion

This work marks a significant advance in the study of LLMs by integrating cryptographic payload embedding. By preserving the output distribution and employing error correction with feedback, it lays the groundwork for future research in AI security and information encoding. Future work could improve robustness to edits, optimize encoding rates, and map out practical vulnerabilities, paving the way for more secure and versatile AI deployments.
