Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
110 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Malla: Demystifying Real-world Large Language Model Integrated Malicious Services (2401.03315v3)

Published 6 Jan 2024 in cs.CR and cs.AI

Abstract: The underground exploitation of LLMs for malicious services (i.e., Malla) is witnessing an uptick, amplifying the cyber threat landscape and posing questions about the trustworthiness of LLM technologies. However, there has been little effort to understand this new cybercrime, in terms of its magnitude, impact, and techniques. In this paper, we conduct the first systematic study on 212 real-world Mallas, uncovering their proliferation in underground marketplaces and exposing their operational modalities. Our study discloses the Malla ecosystem, revealing its significant growth and impact on today's public LLM services. Through examining 212 Mallas, we uncovered eight backend LLMs used by Mallas, along with 182 prompts that circumvent the protective measures of public LLM APIs. We further demystify the tactics employed by Mallas, including the abuse of uncensored LLMs and the exploitation of public LLM APIs through jailbreak prompts. Our findings enable a better understanding of the real-world exploitation of LLMs by cybercriminals, offering insights into strategies to counteract this cybercrime.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (82)
  1. ast — abstract syntax trees — python documentation. https://docs.python.org/3/library/ast.html.
  2. Blackhatgpt. https://blackhatgpt.netlify.app/.
  3. Blockchain explorer - bitcoin tracker & more | blockchain.com. https://www.blockchain.com/explorer.
  4. Btcpay server. https://btcpayserver.org/.
  5. Charybdis worm | many infections | spread widely | discord | telegram | lan |and more. https://hackforums.net/showthread.php?tid=6229200.
  6. Clang c language family frontend for llvm. https://clang.llvm.org/.
  7. codeop — compile python code — python documentation. https://docs.python.org/3/library/codeop.html.
  8. Digital selling with ease | sellix. https://sellix.io/.
  9. Eleutherai/gpt-j-6b · hugging face. https://huggingface.co/EleutherAI/gpt-j-6b.
  10. Ethereum (eth) blockchain explorer. https://etherscan.io/.
  11. Evil confidant | jailbreakchat. https://www.jailbreakchat.com/prompt/588ab0ed-2829-4be8-a3f3-f28e29c06621.
  12. Fraudgpt. https://mainnet.demo.btcpayserver.org/apps/2tUy9NHwGq2zWSrg4asqs8be19q1/pos.
  13. Freedomgpt. https://www.freedomgpt.com/.
  14. Freedomgpt/renderer/localmodels/offlinemodels.ts at 6005afc075f82f227c177c95c177ede3c44dbff0 · ohmplatform/freedomgpt. https://github.com/ohmplatform/FreedomGPT/blob/6005afc075f82f227c177c95c177ede3c44dbff0/renderer/localModels/offlineModels.ts.
  15. Gpt-3 - wikipedia. https://en.wikipedia.org/wiki/GPT-3.
  16. Gunning fog index - wikipedia. https://en.wikipedia.org/wiki/Gunning_fog_index.
  17. Is fraud-gpt any good. https://hackforums.net/showthread.php?tid=6253036.
  18. jaro-winkler | levenshtein 0.23.0 documentation. https://maxbachmann.github.io/Levenshtein/levenshtein.html#jaro-winkler.
  19. michellejieli/nsfw_text_classifier · hugging face. https://huggingface.co/michellejieli/NSFW_text_classifier.
  20. Netlify: Develop and deploy websites and apps in record time. https://netlify.app/.
  21. Oopspam anti-spam api: A powerful spam filter for any content exchange. https://www.oopspam.com/.
  22. openai/tiktoken: tiktoken is a fast bpe tokeniser for use with openai’s models. https://github.com/openai/tiktoken.
  23. Owasp webgoat | owasp foundation. https://owasp.org/www-project-webgoat/.
  24. Pygmalion 13b - wnr.ai. https://wnr.ai/models/pygmalion-13b.
  25. Pygmalionai. https://pygmalion.chat/.
  26. Pygmalionai/pygmalion-2-13b · hugging face. https://huggingface.co/PygmalionAI/pygmalion-2-13b.
  27. Selenium. https://www.selenium.dev.
  28. Semantic textual similarity | sentence-transformers documentation. https://www.sbert.net/docs/usage/semantic_textual_similarity.html.
  29. Tap-m/luna-ai-llama2-uncensored · hugging face. https://huggingface.co/Tap-M/Luna-AI-Llama2-Uncensored.
  30. Tap mobile. https://tap.pm/.
  31. Thebloke/luna-ai-llama2-uncensored-gguf · hugging face. https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GGUF.
  32. Thebloke/pygmalion-2-13b-gptq · hugging face. https://huggingface.co/TheBloke/Pygmalion-2-13B-GPTQ.
  33. Tor project | anonymity online. https://www.torproject.org.
  34. Vicidial.com. https://www.vicidial.com/.
  35. Virustotal - home. https://www.virustotal.com/.
  36. The w3c markup validation service. https://validator.w3.org/.
  37. Wormgpt and fraudgpt – the rise of malicious llms. https://www.trustwave.com/en-us/resources/blogs/spiderlabs-blog/wormgpt-and-fraudgpt-the-rise-of-malicious-llms/.
  38. Wormgpt service has been shut down by its developer. https://hackforums.net/showthread.php?tid=6249700.
  39. Wormgpt service has been shut down by its developer. https://hackforums.net/showthread.php?tid=624970/.
  40. Wormgpt v3 is here! https://mainnet.demo.btcpayserver.org/apps/CQqsBk7TGXveh9R9K3Z5KTDqVvE/pos.
  41. Wormgpt – the generative ai tool cybercriminals are using to launch business email compromise attacks. https://slashnext.com/blog/wormgpt-the-generative-ai-tool-cybercriminals-are-using-to-launch-business-email-compromise-attacks/.
  42. code2vec: Learning distributed representations of code. Proceedings of the ACM on Programming Languages, 3(POPL):1–29, 2019.
  43. Under the shadow of sunshine: Understanding and detecting bulletproof hosting on legitimate service provider networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 805–823. IEEE, 2017.
  44. The menlo report. IEEE Security & Privacy, 10(2):71–75, 2012.
  45. Authorship attribution of source code: A language-agnostic approach and applicability in software engineering. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 932–944, 2021.
  46. Evaluating the susceptibility of pre-trained language models via handcrafted adversarial examples. arXiv preprint arXiv:2209.02128, 2022.
  47. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  48. Truth, lies, and automation: How language models could change disinformation. Center for Security and Emerging Technology, 2021.
  49. Andarwin: Scalable detection of semantically similar android applications. In Computer Security–ESORICS 2013: 18th European Symposium on Research in Computer Security, Egham, UK, September 9-13, 2013. Proceedings 18, pages 182–199. Springer, 2013.
  50. Tinystories: How small can language models be and still speak coherent english? arXiv preprint arXiv:2305.07759, 2023.
  51. The pile: An 800gb dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020.
  52. More than you’ve asked for: A comprehensive analysis of novel prompt injection threats to application-integrated large language models. arXiv preprint arXiv:2302.12173, 2023.
  53. From chatgpt to threatgpt: Impact of generative ai in cybersecurity and privacy. IEEE Access, 2023.
  54. A survey of binary code similarity. ACM Computing Surveys (CSUR), 54(3):1–38, 2021.
  55. Julian Hazell. Large language models can be used to effectively scale spear phishing campaigns. arXiv preprint arXiv:2305.06972, 2023.
  56. Tracking ransomware end-to-end. In 2018 IEEE Symposium on Security and Privacy (SP), pages 618–631. IEEE, 2018.
  57. Ethical frameworks and computer security trolley problems: Foundations for conversations. In 32nd USENIX Security Symposium (USENIX Security 23), pages 5145–5162, Anaheim, CA, 2023. USENIX Association.
  58. All the news that’s fit to fabricate: Ai-generated text as a tool of media misinformation. Journal of experimental political science, 9(1):104–117, 2022.
  59. Click trajectories: End-to-end analysis of the spam value chain. In 2011 ieee symposium on security and privacy, pages 431–446. IEEE, 2011.
  60. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9):1–35, 2023.
  61. Jailbreaking chatgpt via prompt engineering: An empirical study. arXiv preprint arXiv:2305.13860, 2023.
  62. {{\{{PharmaLeaks}}\}}: Understanding the business of online pharmaceutical affiliate programs. In 21st USENIX Security Symposium (USENIX Security 12), pages 1–16, 2012.
  63. Meta. Llama use policy. https://ai.meta.com/llama/use-policy/.
  64. {{\{{DeepPhish}}\}}: Understanding user trust towards artificially generated profiles in online social networks. In 31st USENIX Security Symposium (USENIX Security 22), pages 1669–1686, 2022.
  65. Sting operations. US Department of Justice, Office of Community Oriented Policing Services, 2007.
  66. OpenAI. Api. https://platform.openai.com/docs/api-reference/introduction.
  67. openai. Gpt-4 technical report. https://cdn.openai.com/papers/gpt-4.pdf.
  68. OpenAI. Moderation. https://platform.openai.com/docs/guides/moderation.
  69. OpenAI. Usage policies. https://openai.com/policies/usage-policies.
  70. openchatkit. Moderation. https://openchatkit.net.
  71. Will Oremus. The clever trick that turns chatgpt into its evil twin. Washington Post, 2023.
  72. Training language models to follow instructions with human feedback, 2022. URL https://arxiv. org/abs/2203.02155, 13, 2022.
  73. Ignore previous prompt: Attack techniques for language models. arXiv preprint arXiv:2211.09527, 2022.
  74. Poe. Poe usage guidelines. https://poe.com/usage_guidelines.
  75. Latent jailbreak: A benchmark for evaluating text safety and output robustness of large language models. arXiv preprint arXiv:2307.08487, 2023.
  76. Unsafe diffusion: On the generation of unsafe images and hateful memes from text-to-image models. arXiv preprint arXiv:2305.13873, 2023.
  77. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, 2019.
  78. “do anything now”: Characterizing and evaluating in-the-wild jailbreak prompts on large language models. arXiv preprint arXiv:2308.03825, 2023.
  79. Authorship attribution for neural text generation. In Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pages 8384–8395, 2020.
  80. Into the deep web: Understanding e-commercefraud from autonomous chat with cybercriminals. In Proceedings of the ISOC Network and Distributed System Security Symposium (NDSS), 2020, 2020.
  81. WordStream. Free keyword tool | wordstream. https://www.wordstream.com/keywords?camplink=mainnavbar&campname=KWT&cid=Web_Any_MegaMenu_Keywords_KWTool_KWTool.
  82. Synthetic lies: Understanding ai-generated misinformation and evaluating algorithmic and human solutions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pages 1–20, 2023.
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Zilong Lin (3 papers)
  2. Jian Cui (62 papers)
  3. Xiaojing Liao (9 papers)
  4. Xiaofeng Wang (310 papers)
Citations (15)

Summary

  • The paper highlights how cybercriminals repurpose LLMs into 'Malla' services, analyzing 14 services and 198 projects in underground markets.
  • It details methods like uncensored LLMs and engineered jailbreak prompts that bypass API safety measures to generate high-quality malware.
  • The study reveals significant financial incentives, with some cases earning over $28K in a short time, and calls for robust cybersecurity countermeasures.

Analysis of LLM Exploitation in Malicious Cyber Activities

The research paper entitled "Malla: Demystifying Real-world LLM Integrated Malicious Services" offers an insightful examination of the misuse of LLMs in cybercriminal activities, introducing the concept of "Malla" — malicious LLM applications. By analyzing 14 Malla services and 198 Malla projects discovered across various platforms, the authors present a structured overview of how these LLMs are repurposed for generating malicious content, thus expanding the cyber threat landscape.

The researchers embarked on the first systematic paper of real-world Mallas, exploring their prevalence and impact on underground marketplaces and the broader implications for public LLM services. This paper unveils the operational modalities of Mallas, which are fueled by an increasing demand and facilitated by the misuse of state-of-the-art models like OpenAI's GPT and others. The researchers identified various artifacts from the 212 Malla samples, including eight backend LLMs and 182 prompts designed to bypass public LLM APIs' protective measures. The paper highlighted several backend LLMs and captive platforms such as Poe and FlowGPT that host Malla projects, thus providing unmatched insights into the ecosystem of malicious LLM exploitation.

The numerical results presented reveal the effective deployment of Mallas in underground settings, which consist of malicious services ranging from malware generation to phishing email and website creation. Concretely, the paper shows that Malla services like DarkGPT and EscapeGPT are notably proficient in producing high-quality, compilable malware capable of evading VirusTotal detection. The analysis also underscores the significant financial allure of Mallas: for instance, the authors' case paper reveals a notable revenue figure exceeding $28,000 within just three months for a specific Malla service.

Diving deeper into the tactics employed, the paper outlines two primary methods: the exploitation of uncensored LLMs and the use of jailbreak prompts against public LLM APIs to circumvent their security measures. Uncensored LLMs are particularly dangerous as they generate potentially harmful content without filtration. Jailbreak prompts, on the other hand, act as encoded instructions that bypass the protective layers of LLM API services. The identified prompts serve as a wake-up call for the need for enhanced and adaptive security strategies.

Addressing wider implications, the paper suggests that the continued proliferation of Mallas can lead to further risks. Such misuse of LLMs not only magnifies existing cybersecurity challenges but raises substantial concerns regarding the trustworthiness of LLM technologies. By putting forth a clear presentation of how these threats manifest and their operational characteristics, the paper effectively calls for the development of robust counter-strategies against these emerging threats.

In conclusion, this paper provides a comprehensive lens through which we can view the weaponization of LLMs in the cybercrime domain. It not only broadens our understanding of current cyber threats but also acts as a guide for the development of countermeasures that can adapt to the rapidly evolving threat landscape. Future developments in AI and cybersecurity must consider these findings to build systems resilient against the creative strategies of adversaries exploiting the very technology designed to propel us forward.

Youtube Logo Streamline Icon: https://streamlinehq.com