LLM-Assisted Attacks

Updated 30 December 2025

LLM-assisted attacks are advanced cyber threats that leverage large language models to orchestrate multi-step exploitation, evasion, and manipulation operations.
They employ autonomous agents, dual-agent iterative reasoning, and adversarial prompt injection to bypass traditional security controls in diverse systems.
Empirical studies report high attack success rates and significant vulnerabilities, highlighting the need for robust, multi-layered defense strategies.

LLM-assisted attacks encompass a diverse and rapidly evolving landscape of techniques in which adversaries exploit the structure, reasoning, and integration of LLMs to advance offensive cyber operations well beyond the capabilities of traditional automation. These attacks leverage LLMs both as autonomous agents and as generative tools to manipulate, subvert, or evade security controls across software systems, ML workflows, protocol stacks, web agents, and information ecosystems. This entry synthesizes contemporary research on the subject, providing a panorama from autonomous exploitation orchestration and feature-level evasion, to poisoning, prompt injection, and hybrid human/AI workflows.

1. Foundational Classes of LLM-assisted Attacks

LLM-assisted attacks manifest in several canonical modalities:

Autonomous and Agentic Exploitation: LLMs act as multi-stage agents—conducting reconnaissance, vulnerability scanning, exploitation, post-exploitation lateral movement, and exfiltration. Modular frameworks integrate summarization, planning, experience retrieval, and command dispatch (e.g., AutoAttacker, (Xu et al., 2024)), achieving deterministic, high-throughput, hands-on-keyboard attack chains across realistic enterprise networks.
Feature-level Adversarial Attacks: Manipulating LLMs as black-box or collaborative agents to generate stealthy binary perturbations in static feature models (e.g., Drebin-style Android malware detection). Dual-agent designs can bypass high-accuracy detectors, leveraging retrieval-augmented generation (RAG) and iterative reasoning to achieve false-negative misclassification with high Attack Success Rate (ASR up to 97%)—see LAMLAD (Lan et al., 24 Dec 2025).
Tool-Calling and Pipeline Manipulation: Attacks subvert LLM-integrated tool-calling platforms, using adversarial tool descriptions to hijack retrieval and scheduling, exfiltrate user queries, trigger denial-of-service, and bias tool invocation (ToolCommander, (Wang et al., 2024)). Embedding optimized suffixes in JSON schemas achieves retrieval and manipulation conditions across multiple models.
Prompt Injection and Supply-chain Subversion: Malicious modifications to prompts (MaPP attacks, (Heibel et al., 2024)), externally retrieved code (HACKODE, (Zeng et al., 22 Apr 2025)), or hidden triggers in HTML accessibility trees (Johnson et al., 20 Jul 2025) induce vulnerabilities, incorrect behaviors, or credential exfiltration, even in sophisticated programming assistants and autonomous web agents.
Backdoor Attacks on Code Completion: LLM-guided payload transformation and obfuscation enable easy-to-trigger backdoor injection in code completion models, targeting both static analysis tools and LLM-based detectors (CodeBreaker, (Yan et al., 2024)), yielding high TPR for disguised vulnerabilities.
Automated Protocol Attack Discovery: Protocol-level vulnerabilities—such as DNSSEC cache-flushing DDoS—are generated via LLM chain-of-thought prompting, ReACT agent automation, and configuration. LAPRAD (Aygun et al., 22 Oct 2025) demonstrates the capacity to discover, construct, and validate new attacks overlooked by prior art.

2. Threat Models, Pipelines, and Attack Strategies

The technical underpinnings of LLM-assisted attacks transcend conventional scripting and tool automation. Key threat models and methodologies include:

Black-box and Gray-box Assumptions: Adversaries operate with incomplete system, retriever, or LLM knowledge, yet leverage white-box retrieval or partial tool registry access to inject optimized triggers, adversarial contexts, or payloads (Wang et al., 2024, Zeng et al., 22 Apr 2025).
Dual-Agent/Iterative Reasoning: Attack frameworks coordinate multiple LLM roles—Manipulator and Analyzer (Lan et al., 24 Dec 2025)—to iteratively add features, interpret feedback, and converge on evasion-examples efficiently, often via RAG for contextual factuality.
Jailbreaking and Pretext Engineering: Structured prompt composition (RSA: Role-assignment, Scenario-pretexting, Action-solicitation) manipulates public LLMs to bypass safety filters and generate exploit code directly from CVEs (Diouf et al., 28 Dec 2025). Prompt framing and "idea" descriptors maximize cooperation probability $(\alpha_{\text{coop}} \rightarrow 1)$ .
Adversarial Pipeline Construction: Malicious actors exploit input chains (e.g., external code retrieval, prompt composition, tool-calling) to subvert output via token-optimized comment strings, payloads, or instruction biases (Zeng et al., 22 Apr 2025, Heibel et al., 2024).
Gradient-based Trigger Optimization: Algorithms such as Greedy Coordinate Gradient (GCG, (Johnson et al., 20 Jul 2025)) employ forward-difference log-prob gradients over embedding spaces to identify universal adversarial triggers against LLM agents parsing accessibility tree data.

Attack Class	Example Pipeline/Agent	Main Technical Strategy
Feature Evasion	LAMLAD (Manipulator-Analyzer) (Lan et al., 24 Dec 2025)	Iterative, RAG-grounded feature addition
Tool-pipeline	ToolCommander (Wang et al., 2024)	MCG suffix optimization, scheduler poisoning
Prompt Injection	MaPP (Heibel et al., 2024), HACKODE (Zeng et al., 22 Apr 2025)	Natural-language payloads in prompt/code
Backdoor	CodeBreaker (Yan et al., 2024)	LLM-guided obfuscation/AST mutation
Agentic Exploit	AutoAttacker (Xu et al., 2024)	Summarizer, Planner, Experience, Navigation

3. Quantitative Metrics, Experimental Results, and Case Studies

Empirical evaluations consistently employ Attack Success Rate (ASR), True/False Positive Rates (TPR/FPR), and domain-specific metrics (lines of code, commands per interaction, exploitation yield):

AutoAttacker: 100% SR (success rate) across 14 real-world post-breach scenarios, including privilege escalation, ransomware, and lateral movement, with mean rounds per task $\overline{IN}\sim5-17$ (Xu et al., 2024).
LAMLAD: Gemini–Gemini agent pair yields ASR $\sim97\%$ for all ML malware detectors, averaging 3 manipulation attempts; adversarial training reduces ASR by $>30\%$ (Lan et al., 24 Dec 2025).
ToolCommander: Stage 1 privacy extraction yields ASR_PT up to $91.7\%$ (contriever retriever); DoS and unscheduled tool-calling achieve $100\%$ ASR in certain cases (Wang et al., 2024).
MaPP Attack: All major LLMs (Claude 3 Opus, GPT-4 Omni) achieve $>95\%$ adversarial insertion rates for general vulnerabilities under short payloads ( $<500$ bytes), with minimal functional degradation (Heibel et al., 2024).
CodeBreaker: Up to $90\%$ pass rate versus GPT-4/Llama-3 detectors for transformed payloads; user study found $9/10$ participants accepted at least one malicious payload (Yan et al., 2024).
HACKODE: Overall mean ASR $84.29\%$ across four open-source code LLMs for buffer overflows, infinite loops, validation errors; real-world deployment yields ASR $75.92\%$ (Zeng et al., 22 Apr 2025).

Framework	ASR (Best)	Context	Notes
AutoAttacker	100%	Post-breach net	T=0, 14 scenarios
LAMLAD	97%	Android malware	Gemini–Gemini, Drebin feats
MaPP	$\geq$ 95%	Code Assistants	7 LLMs, HumanEval/CWE
ToolCommander	100%	Tool pipelines	GPT/ToolBench, DoS/UTC

4. Representative Application Domains

LLM-assisted attacks extend to the following sectors and workflows:

Software Exploitation and Penetration Testing: End-to-end exploit generation from CVEs, privilege escalation, web application compromise; nullifying expertise boundaries (Diouf et al., 28 Dec 2025, Xu et al., 2024).
Malware Evasion in ML Security Workflows: Feature-level manipulation of detection models in mobile malware analysis (Lan et al., 24 Dec 2025), acoustic side-channel attacks leveraging LLM error correction (Ayati et al., 15 Apr 2025).
Information Ecosystem Manipulation: Jailbreaks for misinformation in health domains, exploiting model role-play, alternate realities, and expert simulation techniques (Hussain et al., 6 Aug 2025).
Autonomous Web Agents and RPA: Indirect prompt injection via accessibility tree serialization in browser automation—credential theft and unauthorized actions (Johnson et al., 20 Jul 2025).
Scientific Peer Review Manipulation: Adversarial prompt injection via invisible text in PDF submissions to bias LLM-based reviews (Collu et al., 28 Aug 2025).

5. Security Implications, Vulnerabilities, and Countermeasures

The core implications are:

Skill-Barrier Collapse: Pretext engineering and robust prompt manipulation allow non-experts to weaponize vulnerabilities (Diouf et al., 28 Dec 2025, Heibel et al., 2024).
Safety Fine-tuning Failures: Scaling models and RLHF do not prevent instruction-following for malicious prompts (Heibel et al., 2024, Collu et al., 28 Aug 2025).
Supply-chain & Input Poisons: External information provenance (forums, APIs, documents) becomes attack vectors for subverting LLM outcomes (Zeng et al., 22 Apr 2025, Johnson et al., 20 Jul 2025).
Evasion of Automated Defenses: Adversarial sequence design enables robust bypass of static analysis, activation-clustering, and LLM-based scanning (Yan et al., 2024, Zeng et al., 22 Apr 2025).

Defensive strategies include:

Registry & Input Validation: Strict schema checks, prompt sanitization, integrity auditing, instruction hierarchy enforcement (Wang et al., 2024, Heibel et al., 2024).
Scheduler Hardening: Combined similarity-task alignment, anomaly detection on tool description embeddings (Wang et al., 2024).
Adversarial Training and Data Augmentation: Injecting adversarial samples or triggers into training sets improves robustness (ASR reduction $>$ 30%; (Lan et al., 24 Dec 2025, Yan et al., 2024)).
Red-team Simulation and Monitoring: Preemptive enterprise defense via in-house LLM red-teaming, monitoring of dialog patterns and “fix my exploit” loops (Diouf et al., 28 Dec 2025).
Architectural Redesigns: Output auditing with static analyzers, agent pipeline formalization, OCR-based ingestion of documents, provenance enforcement (Collu et al., 28 Aug 2025, Yan et al., 2024).

6. Open Problems and Research Directions

Persistent open challenges are:

Generalizing Defense Metrics: Robust detection methods that generalize across unknown triggers, code semantics, and prompt structures are missing (Yan et al., 2024).
Balancing Usability and Robustness: Input sanitization and unpredictably filtered HTML or code can degrade legitimate model outputs; architectural trade-offs remain unsolved (Johnson et al., 20 Jul 2025).
Benchmarking and Standardization Approaches: There are no standard benchmarks for backdoor robustness in code LLMs, nor for prompt-injection risk in multi-agent pipelines (Yan et al., 2024, Heibel et al., 2024).
Transferring Defenses to Information Ecosystems: Health misinformation jailbreaks, peer review manipulation, and acoustic signal recovery highlight broader societal impacts requiring interdisciplinary mitigation (Hussain et al., 6 Aug 2025, Collu et al., 28 Aug 2025, Ayati et al., 15 Apr 2025).

7. Summary Table of Key LLM-assisted Attack Frameworks

Framework	Attack Type	Domain	ASR/TPR (best)	Defense Methods
AutoAttacker	Modular agentic exploit	Post-breach networks	100% (T=0)	C2 monitoring, adversarial training
LAMLAD	Feature-level evasion	Android malware detection	up to 97%	Adversarial training (ASR –30%)
MaPP	Prompt injection/code vuln	Code assistants	$\geq$ 95%	Prompt sanitization, output audit
ToolCommander	Tool registry perturbation	LLM-powered automation	up to 100%	Registry validation, scheduler hardening
CodeBreaker	LLM-assisted backdoors	Code completion	up to 90%	Influence filtering, adversarial fine-tuning

LLM-assisted attacks represent a paradigm shift in adversarial methodology, challenging foundational assumptions around expertise, automation barriers, and defense-in-depth. Mitigation requires layered, context-aware technical interventions and a new generation of model and pipeline-centric security paradigms adapted to LLM-driven environments.

Markdown Upgrade to Chat

References (12)

AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks (2024)

LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors (2025)

From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection (2024)

MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants (2024)

Inducing Vulnerable Code Generation in LLM Coding Assistants (2025)

Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree (2025)

An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection (2024)

LAPRAD: LLM-Assisted PRotocol Attack Discovery (2025)

From Rookie to Expert: Manipulating LLMs for Automated Vulnerability Exploitation in Enterprise Software (2025)

10.

Making Acoustic Side-Channel Attacks on Noisy Keyboards Viable with LLM-Assisted Spectrograms' "Typo" Correction (2025)

11.

An Audit and Analysis of LLM-Assisted Health Misinformation Jailbreaks Against LLMs (2025)

12.

Publish to Perish: Prompt Injection Attacks on LLM-Assisted Peer Review (2025)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to LLM-assisted Attacks.

LLM-Assisted Attacks

1. Foundational Classes of LLM-assisted Attacks

2. Threat Models, Pipelines, and Attack Strategies

3. Quantitative Metrics, Experimental Results, and Case Studies

4. Representative Application Domains

5. Security Implications, Vulnerabilities, and Countermeasures

6. Open Problems and Research Directions

7. Summary Table of Key LLM-assisted Attack Frameworks

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research

LLM-Assisted Attacks

1. Foundational Classes of LLM-assisted Attacks

2. Threat Models, Pipelines, and Attack Strategies

3. Quantitative Metrics, Experimental Results, and Case Studies

4. Representative Application Domains

5. Security Implications, Vulnerabilities, and Countermeasures

6. Open Problems and Research Directions

7. Summary Table of Key LLM-assisted Attack Frameworks

Topic to Video (Beta)

Whiteboard

Follow Topic

Continue Learning

Related Topics

Don't miss out on important new AI/ML research

Sign up for free to explore the frontiers of research