LoopTrap: Termination Poisoning Attacks on LLM Agents

Published 7 May 2026 in cs.CR and cs.AI | (2605.05846v1)

Abstract: Modern LLM agents solve complex tasks by operating in iterative execution loops, where they repeatedly reason, act, and self-evaluate progress to determine when a task is complete. In this work, we show that while this self-directed loop facilitates autonomy, it also introduces a critical risk: by injecting malicious prompts into the agent's context, an adversary can distort the agent's termination judgment, making it believe the task remains incomplete and leading to unbounded computation.To understand this threat, we define and systematically characterize it as Termination Poisoning and design 10 representative attack strategies. Through a empirical study spanning 8 LLM agents and 60 tasks, we demonstrate that different LLM agents exhibit distinct behavioral signatures that determine which strategies succeed. These transferable patterns can serve as principled guidance for crafting effective attacks against previously unseen agents and tasks, enabling scalable red-teaming beyond manually designed templates. Building on these insights, we introduce LoopTrap, an automated red-teaming framework that synthesizes target-specific malicious prompts by exploiting agent behavioral tendencies. LoopTrap first constructs a behavioral profile of the target agent along four vulnerability dimensions via lightweight probing. It then performs adaptive trap synthesis, routing to the most effective strategy and selecting optimal injections via a self-scoring mechanism. Finally, successful traps are abstracted into a reusable skill library, while failed attempts are refined through self-reflection, ensuring continuous improvement. Extensive evaluation shows that LoopTrap achieves an average of 3.57$\times$ step amplification across 8 mainstream agents, with a peak of 25$\times$.

Abstract PDF Upgrade to Chat

Authors (7)

Summary

The paper introduces termination poisoning as a novel control-flow attack that exploits LLM agents’ self-evaluation to cause unbounded execution.
It systematically evaluates eight LLM agents, showing mean Step Amplification Factors exceeding 2.3× and highlighting significant resource amplification.
LoopTrap automates adaptive red-teaming with behavioral profiling and reflective skill evolution, informing robust defenses in LLM deployment.

Termination Manipulation as an LLM Agent Attack Surface: Analysis and Automation with LoopTrap

Introduction and Threat Characterization

This work introduces and formalizes Termination Poisoning—an adversarial scenario in which an attacker manipulates the control flow of LLM-based autonomous agents to prevent task termination, thereby inducing unbounded computation and substantial resource amplification. Unlike canonical prompt injection, Termination Poisoning does not target agent outputs or policy actions but exploits progress assessments and self-evaluation heuristics integral to modern agentic paradigms such as ReAct, Reflexion, and AutoGPT. The adversary operates asynchronously and inserts malicious instructions into external contextual artifacts (retrieved documents, web pages, API responses, or malicious plugins), exploiting the agents' inability to reliably distinguish system instructions from untrusted data.

Figure 1: The threat model of Termination Poisoning: the attacker injects malicious content into inputs the agent retrieves during execution, corrupting its self-evaluation and inducing persistent looping.

The consequences in practical deployments are nontrivial: the paper cites production incidents in which agents caught in self-reinforcing reasoning loops induced by context poisoning have incurred massive cloud costs or infrastructure denial of service. More subtly, these attacks evade standard output- and tool-based auditing, as the agent's cognitive process is legitimately diverted by adversarial signals that mimic plausible verification or refinement requirements.

Empirical Study of Termination Poisoning Vulnerabilities

A systematic evaluation is conducted over 8 LLM agents (Gemini-3-Pro, GPT-4o, GPT-4o-mini, Claude Sonnet 4.5, Kimi-K2-Thinking, GLM-5, Grok-4, DeepSeek-R1) across 60 stratified tasks from the GAIA benchmark spanning 14 domains and diverse cognitive profiles. The study introduces ten prompt-level attack strategies, rooted in behavioral psychology and cognitive bias literature, organized into four categories: progress manipulation, cognitive bias exploitation, task structure manipulation, and reward shaping. Notably, attack strategies are formulated to appear contextually natural yet capable of distorting completion/incompleteness in the agent's assessment loop.

The principal metric is Step Amplification Factor (SAF), defined as the ratio of the number of execution steps under attack to benign execution. All models exhibit significant susceptibility: mean SAF exceeds $2.3\times$ across agents using static, carefully selected adversarial prompts.

Figure 2: Average attack effectiveness per agent as measured by Step Amplification Factor (SAF), highlighting significant resource inflation compared to benign execution.

Stepping into the agent–task interaction granularity, the paper establishes that effectiveness depends strongly on both task semantics and model architecture. Open-ended, verification-ambiguous tasks (e.g., history, general research) yield SAF peaks ( $>3\times$ ), whereas objective, single-answer tasks (e.g., mathematics) exhibit some innate defense. Models present divergent vulnerability signatures, traceable to latent cognitive tendencies in their termination heuristics—phase compliance (adopting external phased plans), authority compliance (deferring to system-like preambles), recursive susceptibility (unwitting acceptance of infinite regression), and verification tendency (persistent over-refinement).

Figure 3: Matrix of SAF by attack strategy and task category, demonstrating that open-ended domains (darker cells in history, entertainment) are much more susceptible to attack.

The LoopTrap Framework: Automated, Adaptive Termination Poisoning

To operationalize and automate exploitation and analysis of these vulnerabilities, the paper introduces LoopTrap, a full-featured red-teaming framework that synthesizes target-specific, context-aware adversarial prompts with a novel, data-driven inner loop incorporating behavioral profiling, strategy routing, and reflective, skill-based learning.

Figure 4: Schematic overview of LoopTrap: (1) Behavior Vulnerability Fingerprinting, (2) Profile-Guided Trap Synthesis, (3) Reflective Skill Evolution.

The architecture proceeds in three core phases:

Behavioral Vulnerability Fingerprinting: The agent is probed with minimal-cost diagnostic queries, each tied to one of the identified vulnerability dimensions. Analysis of divergence in step count (amplification) under trivial factual tasks with and without dimension-targeted prompt perturbations yields a compact susceptibility profile for the agent.
Profile-Guided Trap Synthesis: Conditioned on the agent profile and the task content, LoopTrap selects and recursively refines attack templates, leveraging a self-scoring LLM discriminator and a strategic exploration-exploitation bandit framework to maximize effective SAF under query budget constraints.
Reflective Skill Evolution: Successful attacks are abstracted into parameterized skill templates, indexed by context- and agent-specific triggers and coupled with high-level behavioral rationales, supporting transfer to related tasks. Failures induce structured reflection that is incorporated as negative experience—guiding further learning and template specialization.

Quantitative and Ablation Results

Comprehensive evaluation demonstrates substantially increased attack effectiveness over all static and nonadaptive baselines. LoopTrap delivers an average SAF of $3.57\times$ , achieving a token overhead (directly mapping to increased compute/cost) of $3.93\times$ , and peaks at $25\times$ in adversarial extension for some agent–task pairs. The framework's cumulative Attack Success Rate (ASR) converges far faster and to much higher success than static diversity or direct LLM prompting, especially for agents whose vulnerability is skewed toward a single dominant dimension.

Removal of behavioral profiling, reflective refinement, or the evolving skill library each results in significant quantitative drop—fingerprinting alone yields a $4.6\%$ ASR gain and $0.62\times$ SAF gain; the skill library yields nearly $8\%$ ASR, confirming the necessity of behavioral-grounded, history- and context-driven attack synthesis for effective, robust red-teaming.

Figure 5: Ablation study verifying the critical contribution of each major system component (profiling, reflection, skill memory, and exploration), with all enhancing attack success and resource amplification.

LoopTrap's performance generalizes across agent orchestration frameworks (see Figure 6), verifying the generality of the vulnerability beyond any specific agent implementation.

Implications and Future Directions

This research establishes termination manipulation as a critical, underdefended control-flow attack surface for LLM-based autonomous agents, orthogonal to canonical output- and resource-exhaustion-based threat models. The paper's automated framework sets a principled basis for further, systematic security evaluation and LLM safety benchmarking in the face of agent-driven autonomy.

For defense, the analysis suggests that improved context provenance tracking, progress signal verification, and decoupled self-evaluation mechanisms—potentially incorporating external oracles, restricted trust in non-system context, or explicit termination contracts—are required to close this attack vector. Furthermore, as LLM deployment shifts toward multi-agent and collaborative frameworks, the compositionality of such termination poisoning (including agent-to-agent propagation) becomes an urgent open area for study.

Conclusion

By formalizing Termination Poisoning, empirically quantifying the prevalence and diversity of agent vulnerabilities, and providing a robust, extensible automation framework for adversarial red-teaming, this work delivers a foundational contribution toward robustifying agentic LLM deployments. The findings signal the necessity of moving beyond static prompt hygiene and toward security principles that directly address the endogenous cognitive control loop of autonomous agent architectures.

Reference: "LoopTrap: Termination Poisoning Attacks on LLM Agents" (2605.05846)

Markdown Report Issue