Papers
Topics
Authors
Recent
2000 character limit reached

LLM-Assisted Attacks

Updated 30 December 2025
  • LLM-assisted attacks are advanced cyber threats that leverage large language models to orchestrate multi-step exploitation, evasion, and manipulation operations.
  • They employ autonomous agents, dual-agent iterative reasoning, and adversarial prompt injection to bypass traditional security controls in diverse systems.
  • Empirical studies report high attack success rates and significant vulnerabilities, highlighting the need for robust, multi-layered defense strategies.

LLM-assisted attacks encompass a diverse and rapidly evolving landscape of techniques in which adversaries exploit the structure, reasoning, and integration of LLMs to advance offensive cyber operations well beyond the capabilities of traditional automation. These attacks leverage LLMs both as autonomous agents and as generative tools to manipulate, subvert, or evade security controls across software systems, ML workflows, protocol stacks, web agents, and information ecosystems. This entry synthesizes contemporary research on the subject, providing a panorama from autonomous exploitation orchestration and feature-level evasion, to poisoning, prompt injection, and hybrid human/AI workflows.

1. Foundational Classes of LLM-assisted Attacks

LLM-assisted attacks manifest in several canonical modalities:

  • Autonomous and Agentic Exploitation: LLMs act as multi-stage agents—conducting reconnaissance, vulnerability scanning, exploitation, post-exploitation lateral movement, and exfiltration. Modular frameworks integrate summarization, planning, experience retrieval, and command dispatch (e.g., AutoAttacker, (Xu et al., 2 Mar 2024)), achieving deterministic, high-throughput, hands-on-keyboard attack chains across realistic enterprise networks.
  • Feature-level Adversarial Attacks: Manipulating LLMs as black-box or collaborative agents to generate stealthy binary perturbations in static feature models (e.g., Drebin-style Android malware detection). Dual-agent designs can bypass high-accuracy detectors, leveraging retrieval-augmented generation (RAG) and iterative reasoning to achieve false-negative misclassification with high Attack Success Rate (ASR up to 97%)—see LAMLAD (Lan et al., 24 Dec 2025).
  • Tool-Calling and Pipeline Manipulation: Attacks subvert LLM-integrated tool-calling platforms, using adversarial tool descriptions to hijack retrieval and scheduling, exfiltrate user queries, trigger denial-of-service, and bias tool invocation (ToolCommander, (Wang et al., 13 Dec 2024)). Embedding optimized suffixes in JSON schemas achieves retrieval and manipulation conditions across multiple models.
  • Prompt Injection and Supply-chain Subversion: Malicious modifications to prompts (MaPP attacks, (Heibel et al., 12 Jul 2024)), externally retrieved code (HACKODE, (Zeng et al., 22 Apr 2025)), or hidden triggers in HTML accessibility trees (Johnson et al., 20 Jul 2025) induce vulnerabilities, incorrect behaviors, or credential exfiltration, even in sophisticated programming assistants and autonomous web agents.
  • Backdoor Attacks on Code Completion: LLM-guided payload transformation and obfuscation enable easy-to-trigger backdoor injection in code completion models, targeting both static analysis tools and LLM-based detectors (CodeBreaker, (Yan et al., 10 Jun 2024)), yielding high TPR for disguised vulnerabilities.
  • Automated Protocol Attack Discovery: Protocol-level vulnerabilities—such as DNSSEC cache-flushing DDoS—are generated via LLM chain-of-thought prompting, ReACT agent automation, and configuration. LAPRAD (Aygun et al., 22 Oct 2025) demonstrates the capacity to discover, construct, and validate new attacks overlooked by prior art.

2. Threat Models, Pipelines, and Attack Strategies

The technical underpinnings of LLM-assisted attacks transcend conventional scripting and tool automation. Key threat models and methodologies include:

  • Black-box and Gray-box Assumptions: Adversaries operate with incomplete system, retriever, or LLM knowledge, yet leverage white-box retrieval or partial tool registry access to inject optimized triggers, adversarial contexts, or payloads (Wang et al., 13 Dec 2024, Zeng et al., 22 Apr 2025).
  • Dual-Agent/Iterative Reasoning: Attack frameworks coordinate multiple LLM roles—Manipulator and Analyzer (Lan et al., 24 Dec 2025)—to iteratively add features, interpret feedback, and converge on evasion-examples efficiently, often via RAG for contextual factuality.
  • Jailbreaking and Pretext Engineering: Structured prompt composition (RSA: Role-assignment, Scenario-pretexting, Action-solicitation) manipulates public LLMs to bypass safety filters and generate exploit code directly from CVEs (Diouf et al., 28 Dec 2025). Prompt framing and "idea" descriptors maximize cooperation probability (αcoop1)(\alpha_{\text{coop}} \rightarrow 1).
  • Adversarial Pipeline Construction: Malicious actors exploit input chains (e.g., external code retrieval, prompt composition, tool-calling) to subvert output via token-optimized comment strings, payloads, or instruction biases (Zeng et al., 22 Apr 2025, Heibel et al., 12 Jul 2024).
  • Gradient-based Trigger Optimization: Algorithms such as Greedy Coordinate Gradient (GCG, (Johnson et al., 20 Jul 2025)) employ forward-difference log-prob gradients over embedding spaces to identify universal adversarial triggers against LLM agents parsing accessibility tree data.
Attack Class Example Pipeline/Agent Main Technical Strategy
Feature Evasion LAMLAD (Manipulator-Analyzer) (Lan et al., 24 Dec 2025) Iterative, RAG-grounded feature addition
Tool-pipeline ToolCommander (Wang et al., 13 Dec 2024) MCG suffix optimization, scheduler poisoning
Prompt Injection MaPP (Heibel et al., 12 Jul 2024), HACKODE (Zeng et al., 22 Apr 2025) Natural-language payloads in prompt/code
Backdoor CodeBreaker (Yan et al., 10 Jun 2024) LLM-guided obfuscation/AST mutation
Agentic Exploit AutoAttacker (Xu et al., 2 Mar 2024) Summarizer, Planner, Experience, Navigation

3. Quantitative Metrics, Experimental Results, and Case Studies

Empirical evaluations consistently employ Attack Success Rate (ASR), True/False Positive Rates (TPR/FPR), and domain-specific metrics (lines of code, commands per interaction, exploitation yield):

  • AutoAttacker: 100% SR (success rate) across 14 real-world post-breach scenarios, including privilege escalation, ransomware, and lateral movement, with mean rounds per task IN517\overline{IN}\sim5-17 (Xu et al., 2 Mar 2024).
  • LAMLAD: Gemini–Gemini agent pair yields ASR 97%\sim97\% for all ML malware detectors, averaging 3 manipulation attempts; adversarial training reduces ASR by >30%>30\% (Lan et al., 24 Dec 2025).
  • ToolCommander: Stage 1 privacy extraction yields ASR_PT up to 91.7%91.7\% (contriever retriever); DoS and unscheduled tool-calling achieve 100%100\% ASR in certain cases (Wang et al., 13 Dec 2024).
  • MaPP Attack: All major LLMs (Claude 3 Opus, GPT-4 Omni) achieve >95%>95\% adversarial insertion rates for general vulnerabilities under short payloads (<500<500 bytes), with minimal functional degradation (Heibel et al., 12 Jul 2024).
  • CodeBreaker: Up to 90%90\% pass rate versus GPT-4/Llama-3 detectors for transformed payloads; user study found $9/10$ participants accepted at least one malicious payload (Yan et al., 10 Jun 2024).
  • HACKODE: Overall mean ASR 84.29%84.29\% across four open-source code LLMs for buffer overflows, infinite loops, validation errors; real-world deployment yields ASR 75.92%75.92\% (Zeng et al., 22 Apr 2025).
Framework ASR (Best) Context Notes
AutoAttacker 100% Post-breach net T=0, 14 scenarios
LAMLAD 97% Android malware Gemini–Gemini, Drebin feats
MaPP \geq95% Code Assistants 7 LLMs, HumanEval/CWE
ToolCommander 100% Tool pipelines GPT/ToolBench, DoS/UTC

4. Representative Application Domains

LLM-assisted attacks extend to the following sectors and workflows:

5. Security Implications, Vulnerabilities, and Countermeasures

The core implications are:

Defensive strategies include:

6. Open Problems and Research Directions

Persistent open challenges are:

  • Generalizing Defense Metrics: Robust detection methods that generalize across unknown triggers, code semantics, and prompt structures are missing (Yan et al., 10 Jun 2024).
  • Balancing Usability and Robustness: Input sanitization and unpredictably filtered HTML or code can degrade legitimate model outputs; architectural trade-offs remain unsolved (Johnson et al., 20 Jul 2025).
  • Benchmarking and Standardization Approaches: There are no standard benchmarks for backdoor robustness in code LLMs, nor for prompt-injection risk in multi-agent pipelines (Yan et al., 10 Jun 2024, Heibel et al., 12 Jul 2024).
  • Transferring Defenses to Information Ecosystems: Health misinformation jailbreaks, peer review manipulation, and acoustic signal recovery highlight broader societal impacts requiring interdisciplinary mitigation (Hussain et al., 6 Aug 2025, Collu et al., 28 Aug 2025, Ayati et al., 15 Apr 2025).

7. Summary Table of Key LLM-assisted Attack Frameworks

Framework Attack Type Domain ASR/TPR (best) Defense Methods
AutoAttacker Modular agentic exploit Post-breach networks 100% (T=0) C2 monitoring, adversarial training
LAMLAD Feature-level evasion Android malware detection up to 97% Adversarial training (ASR –30%)
MaPP Prompt injection/code vuln Code assistants \geq95% Prompt sanitization, output audit
ToolCommander Tool registry perturbation LLM-powered automation up to 100% Registry validation, scheduler hardening
CodeBreaker LLM-assisted backdoors Code completion up to 90% Influence filtering, adversarial fine-tuning

LLM-assisted attacks represent a paradigm shift in adversarial methodology, challenging foundational assumptions around expertise, automation barriers, and defense-in-depth. Mitigation requires layered, context-aware technical interventions and a new generation of model and pipeline-centric security paradigms adapted to LLM-driven environments.

Whiteboard

Topic to Video (Beta)

Follow Topic

Get notified by email when new papers are published related to LLM-assisted Attacks.