Malla: Demystifying Real-world Large Language Model Integrated Malicious Services (2401.03315v3)
Abstract: The underground exploitation of LLMs for malicious services (i.e., Malla) is on the rise, expanding the cyber threat landscape and raising questions about the trustworthiness of LLM technologies. However, little effort has been made to understand this new cybercrime in terms of its magnitude, impact, and techniques. In this paper, we conduct the first systematic study of 212 real-world Mallas, uncovering their proliferation in underground marketplaces and exposing their operational modalities. Our study discloses the Malla ecosystem, revealing its significant growth and its impact on today's public LLM services. In examining these 212 Mallas, we uncovered eight backend LLMs used by Mallas, along with 182 prompts that circumvent the protective measures of public LLM APIs. We further demystify the tactics Mallas employ, including the abuse of uncensored LLMs and the exploitation of public LLM APIs through jailbreak prompts. Our findings enable a better understanding of the real-world exploitation of LLMs by cybercriminals and offer insights into strategies for counteracting this cybercrime.
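The "protective measures of public LLM APIs" mentioned above include input/output moderation filters, which jailbreak prompts are crafted to evade. As a concrete, defensive illustration (a minimal sketch, not the paper's measurement pipeline), the snippet below checks a prompt against OpenAI's moderation endpoint via the official `openai` Python SDK; it assumes the SDK is installed and an `OPENAI_API_KEY` environment variable is set.

```python
# Minimal sketch of the kind of guardrail jailbreak prompts try to bypass.
# Assumptions: `pip install openai` and OPENAI_API_KEY in the environment.
# This is illustrative only, not the authors' methodology.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_flagged(text: str) -> bool:
    """Return True if OpenAI's moderation endpoint flags the text."""
    response = client.moderations.create(input=text)
    return response.results[0].flagged


if __name__ == "__main__":
    # A benign prompt should pass; policy-violating text would be flagged.
    print(is_flagged("Draft a polite meeting reminder for my team."))
```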
Authors: Zilong Lin, Jian Cui, Xiaojing Liao, Xiaofeng Wang