Generative Text Steganography with Large Language Model (2404.10229v2)
Abstract: Recent advances in large language models (LLMs) have blurred the boundary between human-written and machine-generated text, which is favorable for generative text steganography. However, current advanced steganographic mappings are not suitable for LLMs, since most users can access them only through a black-box API or user interface and therefore lack access to the training vocabulary and its sampling probabilities. In this paper, we explore a black-box generative text steganographic method based on the user interfaces of LLMs, called LLM-Stega. Its main goal is to enable secure covert communication between Alice (sender) and Bob (receiver) through the user interfaces of LLMs. Specifically, we first construct a keyword set and design a new encrypted steganographic mapping to embed secret messages. Furthermore, to guarantee accurate extraction of secret messages and rich semantics of the generated stego texts, we propose an optimization mechanism based on rejection sampling. Comprehensive experiments demonstrate that the proposed LLM-Stega outperforms current state-of-the-art methods.
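To make the pipeline in the abstract concrete, the following is a toy sketch of the general idea only: secret bits select keywords through a key-dependent mapping, a text generator is asked to write a sentence containing those keywords, and candidates are rejected until the receiver's extraction would recover the bits exactly. It is not the paper's construction. The keyword lists, the shuffle-based "encryption", the `mock_llm` stand-in for an LLM user interface, and every function name here are assumptions made for illustration.

```python
import random

# Hypothetical keyword set: 2 slots x 4 keywords, so each slot carries 2 bits.
KEYWORD_SLOTS = [
    ["market", "weather", "election", "football"],
    ["rises", "falls", "stalls", "surprises"],
]

def keyed_tables(key):
    """Derive one key-dependent ordering per slot (toy mapping, not secure)."""
    rng = random.Random(key)
    tables = []
    for words in KEYWORD_SLOTS:
        shuffled = words[:]
        rng.shuffle(shuffled)
        tables.append(shuffled)
    return tables

def bits_to_keywords(bits, key):
    """Map each 2-bit chunk to the keyword at that index in the keyed table."""
    tables = keyed_tables(key)
    assert len(bits) == 2 * len(tables)
    return [t[int(bits[2 * i:2 * i + 2], 2)] for i, t in enumerate(tables)]

def extract_bits(text, key):
    """Return the bit string, or None if any slot is missing or ambiguous."""
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    out = []
    for table in keyed_tables(key):
        hits = [i for i, word in enumerate(table) if word in tokens]
        if len(hits) != 1:
            return None
        out.append(format(hits[0], "02b"))
    return "".join(out)

def mock_llm(keywords, rng):
    """Stand-in for an LLM user interface; it does not always obey the prompt."""
    subject, action = keywords
    templates = [
        f"Today the {subject} {action} again, analysts say.",
        f"Nobody expected it, but the {subject} {action}.",
        f"The {subject} is in the news.",                    # drops a keyword
        f"The {subject} {action} while the market stalls.",  # adds extra keywords
    ]
    return rng.choice(templates)

def embed(bits, key, max_tries=50, seed=0):
    """Rejection sampling: keep generating until extraction recovers the bits."""
    rng = random.Random(seed)
    keywords = bits_to_keywords(bits, key)
    for _ in range(max_tries):
        candidate = mock_llm(keywords, rng)
        if extract_bits(candidate, key) == bits:
            return candidate
    raise RuntimeError("no acceptable candidate found")
```

In this sketch Alice would call `embed("0110", "shared-secret")` and send the returned sentence, and Bob would call `extract_bits(sentence, "shared-secret")` to recover the bits. The acceptance test here checks only extraction accuracy; the abstract also mentions rich semantics as a goal of the optimization, which this toy version does not model.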
Authors: Jiaxuan Wu, Zhengxian Wu, Yiming Xue, Juan Wen, Wanli Peng