Towards Next-Generation Steganalysis: LLMs Unleash the Power of Detecting Steganography (2405.09090v1)

Published 15 May 2024 in cs.CR

Abstract: Linguistic steganography provides a convenient way to hide messages, particularly with the emergence of AI generation technology. The potential abuse of this technology raises security concerns within societies, calling for powerful linguistic steganalysis to detect carriers containing steganographic messages. Existing methods are limited to finding distribution differences between steganographic texts and normal texts from the aspect of symbolic statistics. However, the distribution differences of the two kinds of texts are hard to model precisely, which severely hurts the detection ability of existing methods in realistic scenarios. To seek a feasible way to construct practical steganalysis in the real world, this paper proposes to employ the human-like text-processing abilities of LLMs to capture the difference from the aspect of human perception, in addition to the traditional statistical aspect. Specifically, we systematically investigate the performance of LLMs on this task by modeling it as a generative paradigm, instead of the traditional classification paradigm. Extensive experimental results reveal that generative LLMs exhibit significant advantages in linguistic steganalysis and demonstrate performance trends distinct from those of traditional approaches. Results also reveal that LLMs outperform existing baselines by a wide margin, and the domain-agnostic ability of LLMs makes it possible to train a generic steganalysis model (both code and trained models are openly available at https://github.com/ba0z1/Linguistic-Steganalysis-with-LLMs).
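The generative paradigm the abstract describes can be illustrated with a minimal sketch: instead of attaching a binary classifier head, the detector prompts an LLM and parses a verdict token ("stego"/"cover") out of the generated text. The prompt wording, the labels, and the `generate` stub below are illustrative assumptions, not the paper's exact setup; a real system would replace `generate` with a call to a (fine-tuned) LLM such as the models released in the linked repository.

```python
# Sketch of steganalysis as a generative task (hedged; details are assumed).
# The LLM is asked to *generate* a verdict word rather than emit class logits.

PROMPT_TEMPLATE = (
    "Decide whether the following text hides a secret message.\n"
    "Text: {text}\n"
    "Answer with one word, 'stego' or 'cover': "
)

def generate(prompt: str) -> str:
    """Placeholder for an LLM completion call (illustrative stub only).

    A real detector would query a language model here; this stub flags
    texts whose word lengths are implausibly uniform, purely so the
    sketch runs end to end.
    """
    text = prompt.split("Text: ", 1)[1].rsplit("\n", 1)[0]
    words = text.split()
    distinct_lengths = {len(w) for w in words}
    return "stego" if len(distinct_lengths) <= 2 and len(words) > 3 else "cover"

def detect(text: str) -> str:
    """Generative-paradigm detection: parse the label from generated text."""
    completion = generate(PROMPT_TEMPLATE.format(text=text)).strip().lower()
    return "stego" if completion.startswith("stego") else "cover"

print(detect("the cat sat on a warm mat today"))  # prints "cover"
```

The design point is that the model's output space is free-form text, so the same prompted model can be applied across domains, which is what makes the paper's domain-agnostic "generic steganalysis model" plausible.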

Authors (5)
  1. Minhao Bai
  2. Jinshuai Yang (1 paper)
  3. Kaiyi Pang (8 papers)
  4. Huili Wang (14 papers)
  5. Yongfeng Huang (110 papers)
Citations (1)