Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming (2510.18314v1)

Published 21 Oct 2025 in cs.AI

Abstract: As LLM agents increasingly automate complex web tasks, they boost productivity while simultaneously introducing new security risks. However, relevant studies on web agent attacks remain limited. Existing red-teaming approaches mainly rely on manually crafted attack strategies or static models trained offline. Such methods fail to capture the underlying behavioral patterns of web agents, making it difficult to generalize across diverse environments. In web agent attacks, success requires the continuous discovery and evolution of attack strategies. To this end, we propose Genesis, a novel agentic framework composed of three modules: Attacker, Scorer, and Strategist. The Attacker generates adversarial injections by integrating the genetic algorithm with a hybrid strategy representation. The Scorer evaluates the target web agent's responses to provide feedback. The Strategist dynamically uncovers effective strategies from interaction logs and compiles them into a continuously growing strategy library, which is then re-deployed to enhance the Attacker's effectiveness. Extensive experiments across various web tasks show that our framework discovers novel strategies and consistently outperforms existing attack baselines.

Summary

The paper introduces Genesis, a framework that uses genetic algorithms to automatically evolve and refine attack strategies targeting LLM web agents.
The methodology uses a three-module design (Attacker, Scorer, and Strategist) to generate adversarial payloads, evaluate the agent's responses, and summarize reusable strategies, reaching an average attack success rate of 53.0% on SeeAct with GPT-4o.
The results highlight the importance of dynamic, transferable attack strategies that generalize across various web agents and LLM architectures for enhanced security evaluation.

Evolving Attack Strategies for LLM Web Agent Red-Teaming: An Analysis of Genesis

Motivation and Problem Setting

The increasing deployment of LLM-based web agents for complex, autonomous web tasks introduces significant security risks, particularly through indirect prompt injection attacks that exploit the agent's environmental context. Traditional red-teaming approaches for web agents rely on static, manually crafted attack strategies or offline-trained models, which lack adaptability and fail to generalize across diverse environments. The core challenge is to develop a red-teaming framework that can autonomously discover, summarize, and evolve attack strategies in a black-box setting, where attackers have no access to the agent's internals and can only manipulate the HTML environment.

Figure 1: An example of a web agent attack, where a hidden HTML instruction subverts the agent's intended action.

Genesis Framework: Closed-Loop Evolutionary Red-Teaming

Genesis introduces a closed-loop, agentic framework for web agent red-teaming, composed of three core modules: Attacker, Scorer, and Strategist. The system is designed to emulate the iterative, strategic learning process of human red-teamers, enabling the continuous evolution of attack strategies.

Figure 2: Comparison of red-teaming paradigms, highlighting Genesis's evolutionary loop over static or human-crafted approaches.

Attacker

The Attacker module retrieves relevant strategies from a continuously growing strategy library using semantic embeddings (e.g., text-embedding-3-small), then applies a genetic algorithm to evolve them: mutation for low-performing strategies and crossover for high-performing ones. It generates adversarial injection payloads, expressed either as natural language or as executable code (Python functions), and embeds them in non-rendering HTML attributes (e.g., aria-label) to ensure stealth and retargetability.
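As a rough illustration of the embedding step only, the sketch below places a payload string into an aria-label attribute of an element's opening tag. The helper name `inject_payload` and its string-based approach are assumptions made for this example and are not taken from the paper. It also assumes the input begins with a single opening tag, that no attribute value before the tag's end contains a `>` character, and that the tag does not already carry the attribute.

```python
import html


def inject_payload(element_html: str, payload: str, attribute: str = "aria-label") -> str:
    """Add `payload` as a non-rendering attribute on the leading tag.

    Hypothetical helper: assumes `element_html` starts with an opening tag
    such as '<input type="text" id="amount">'.
    """
    close = element_html.find(">")
    if not element_html.startswith("<") or close == -1:
        raise ValueError("expected a string that begins with an HTML opening tag")
    # For self-closing tags ('<input ... />') insert before the slash.
    insert_at = close - 1 if element_html[close - 1] == "/" else close
    # Escape quotes and angle brackets so the payload stays inside the attribute.
    escaped = html.escape(payload, quote=True)
    head = element_html[:insert_at].rstrip()
    return f'{head} {attribute}="{escaped}"{element_html[insert_at:]}'


if __name__ == "__main__":
    print(inject_payload('<input type="text" id="amount">', "placeholder instruction"))
```

Because the payload lives in one attribute value, the same page element can be retargeted by swapping the string, which is one plausible reading of the retargetability property mentioned above.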

Scorer

The Scorer evaluates the web agent's response to the adversarial environment. If the agent's action matches the attack objective, a maximal score is assigned. Otherwise, an LLM-based evaluator assigns a nuanced score (1–9) based on the degree of manipulation, using the agent's full response trace. This feedback is critical for guiding the evolution of strategies.
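The two-branch scoring rule described above can be sketched as follows. This is a minimal sketch under stated assumptions: the maximal score is taken to be 10 (the text only says "maximal"), a successful attack is detected by exact string match on the action, and `judge` is a hypothetical callable standing in for the LLM-based evaluator.

```python
from typing import Callable

MAX_SCORE = 10  # assumption: the source does not state the maximal value


def score_response(
    agent_action: str,
    attack_objective: str,
    response_trace: str,
    judge: Callable[[str, str], int],
) -> int:
    """Return MAX_SCORE when the action matches the objective; otherwise
    return the judge's score clamped to the range 1 to 9."""
    if agent_action.strip() == attack_objective.strip():
        return MAX_SCORE
    # `judge` stands in for the LLM evaluator and sees the full response trace.
    raw = judge(response_trace, attack_objective)
    return max(1, min(9, int(raw)))
```

Clamping the judge output keeps a noisy evaluator from ever reporting the maximal score for an attack that did not actually reach its objective.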

Strategist

The Strategist analyzes the complete interaction log (task, injection, agent behavior, score) and summarizes the underlying attack principle as a reusable strategy, represented either as a natural language description or executable code. This strategy is archived in the library, enabling continual enrichment and transferability.
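The interaction between the three modules can be sketched as a simple loop. Everything here is illustrative: the callables `attack`, `run_agent`, `score`, and `summarize` are hypothetical stand-ins for the LLM-backed modules and the target agent, the library is reduced to a list of strings, and the sketch assumes that every attempt (not only successful ones) is summarized into the library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InteractionLog:
    """Fields mirror the log contents listed above."""
    task: str
    injection: str
    agent_behavior: str
    score: int


def run_closed_loop(
    task: str,
    library: List[str],
    attack: Callable[[str, List[str]], str],      # Attacker stand-in
    run_agent: Callable[[str, str], str],         # target web agent stand-in
    score: Callable[[str], int],                  # Scorer stand-in
    summarize: Callable[[InteractionLog], str],   # Strategist stand-in
    max_attempts: int = 10,
    success_score: int = 10,  # assumed maximal score
) -> bool:
    """Attack one task for up to `max_attempts` rounds, growing `library` in place."""
    for _ in range(max_attempts):
        injection = attack(task, library)
        behavior = run_agent(task, injection)
        s = score(behavior)
        # The Strategist distills the attempt into a reusable strategy.
        library.append(summarize(InteractionLog(task, injection, behavior, s)))
        if s >= success_score:
            return True
    return False
```

Passing the modules in as callables keeps the loop independent of any particular LLM provider, in line with the modular design the summary describes.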

Figure 3: The Genesis framework, illustrating the closed-loop interaction between Attacker, Scorer, and Strategist, and the hybrid strategy library.

Experimental Evaluation

Setup

Genesis is evaluated on the Mind2Web benchmark, focusing on high-impact domains (Finance, Medical, Housing, Cooking) that together comprise 840 tasks. Attacks are targeted: the goal is to make the agent perform a specific malicious action in which only the argument of the operation is altered, while the operation and target element stay fixed. The primary metric is attack success rate (ASR, pass@10).
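The summary does not spell out how pass@10 is computed. The sketch below assumes the common reading: a task counts as a success if at least one of its first 10 attack attempts succeeds, and ASR is the fraction of such tasks. The function name and input layout are hypothetical.

```python
from typing import Dict, List


def attack_success_rate(attempts: Dict[str, List[bool]], k: int = 10) -> float:
    """Fraction of tasks with at least one success among the first k attempts.

    `attempts` maps a task id to the ordered outcomes of its attack attempts.
    """
    if not attempts:
        raise ValueError("no tasks supplied")
    hits = sum(1 for outcomes in attempts.values() if any(outcomes[:k]))
    return hits / len(attempts)


if __name__ == "__main__":
    demo = {
        "task-1": [False, True],           # succeeds on attempt 2
        "task-2": [False] * 10,            # never succeeds
        "task-3": [False] * 10 + [True],   # success falls outside the budget
        "task-4": [True],                  # succeeds immediately
    }
    print(attack_success_rate(demo))  # 0.5
```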

Genesis is compared against six baselines (GCG, I-GCG, AgentAttack, InjecAttack, EIA, and AdvAgent) across two state-of-the-art web agents (SeeAct, WebExperT) and three backend LLMs (GPT-4o, Gemini-2.5-Flash, GPT-5).

Main Results

Genesis achieves the highest ASR across all agents and LLMs, with a substantial margin over baselines. For example, on SeeAct with GPT-4o, Genesis attains an average ASR of 53.0%, compared to 43.6% for AdvAgent and 34.8% for EIA, a lead of 9.4 and 18.2 percentage points respectively. The variant without pre-initialized strategies (Genesis w/o Initialization) still outperforms all baselines, demonstrating the effectiveness of dynamic strategy evolution.

WebExperT is consistently more robust than SeeAct, and GPT-5 is the most resilient backend, indicating that agent architecture and LLM security posture are critical factors in vulnerability.

Ablation and Hyperparameter Analysis

Ablation studies confirm that each module (Attacker, Scorer, Strategist) and both genetic algorithm components (mutation, crossover) are essential for optimal performance. Removing the Strategist or Scorer leads to the most significant performance drops, validating the necessity of strategic summarization and evaluative feedback.

The hybrid strategy representation (text + code) outperforms text-only or code-only variants, indicating that combining conceptual guidance with programmatic precision yields the most effective attacks.

Genesis is robust to the choice of embedding model for strategy retrieval, but its performance is sensitive to the number of retrieved strategies ($k$), with $k=10$ providing a balance between context and noise.
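To make the role of $k$ concrete, here is a minimal top-$k$ retrieval sketch over a library of (embedding, strategy, score) tuples, the layout mentioned under Implementation Considerations. Ranking by cosine similarity is an assumption, since the summary only says retrieval uses semantic embeddings, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (embedding, strategy text, score)
LibraryEntry = Tuple[Sequence[float], str, float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(
    query: Sequence[float], library: Sequence[LibraryEntry], k: int = 10
) -> List[str]:
    """Return the k strategies whose embeddings are most similar to `query`."""
    ranked = sorted(library, key=lambda entry: cosine(query, entry[0]), reverse=True)
    return [strategy for _, strategy, _ in ranked[:k]]
```

A larger $k$ gives the Attacker more strategies to draw on but also admits less similar, potentially distracting entries, which is consistent with the trade-off between context and noise noted above.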

Figure 4: Hyperparameter analysis of the number of retrieved strategies ($k$) on ASR, showing diminishing returns and potential degradation with excessive context.

Strategy Transferability

Cross-model transfer experiments demonstrate that strategy libraries learned on one backend LLM are highly transferable to others, with only moderate ASR degradation. Notably, libraries built on more robust models (e.g., GPT-5) yield even higher ASR when transferred to more vulnerable models (e.g., GPT-4o), suggesting that robust models force the discovery of more generalizable attack principles.

Case Studies

Genesis autonomously discovers both text-based and code-based attack strategies. For example, it can craft multilingual, authoritative injections to redirect user intent or generate Python functions that obfuscate payloads to disrupt agent processing.

Figure 5: Case studies of successful attacks, illustrating both text-based and code-based strategies discovered by Genesis.

Implementation Considerations

Genesis is designed for black-box settings, requiring only the ability to modify HTML and observe agent actions. The framework is modular and can be instantiated with any LLM for the Attacker, Scorer, and Strategist roles. The genetic algorithm is implemented via LLM prompting, with mutation and crossover realized through prompt engineering and few-shot examples. The strategy library is stored as a database of (embedding, strategy, score) tuples, supporting efficient retrieval and continual enrichment.
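One way to picture a prompt-driven genetic step is to route each retrieved strategy to a mutation or crossover prompt according to its score. The sketch below is illustrative only: the score threshold, the pairing scheme for crossover, and the prompt wording are all assumptions, and the resulting prompts would be sent to whichever LLM plays the Attacker role.

```python
from typing import List, Tuple

Strategy = Tuple[str, float]  # (strategy text, score)


def build_evolution_prompts(strategies: List[Strategy], threshold: float = 5.0) -> List[str]:
    """Build one mutation prompt per low scorer and one crossover prompt
    per consecutive pair of high scorers (an unpaired last one is skipped)."""
    high = [text for text, score in strategies if score >= threshold]
    low = [text for text, score in strategies if score < threshold]
    prompts: List[str] = []
    # Mutation: ask the LLM to vary a strategy that has not worked well.
    for text in low:
        prompts.append(
            f"MUTATE\nStrategy: {text}\n"
            "Rewrite this strategy with a different framing."
        )
    # Crossover: ask the LLM to merge two strategies that both scored well.
    for first, second in zip(high[::2], high[1::2]):
        prompts.append(
            f"CROSSOVER\nStrategy A: {first}\nStrategy B: {second}\n"
            "Combine the strongest elements of both strategies."
        )
    return prompts
```

In practice the few-shot examples mentioned above would be prepended to each prompt; they are omitted here to keep the sketch short.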

Resource requirements are dominated by LLM inference for the Attacker and Scorer, and the framework is parallelizable across tasks. The main bottleneck is the number of attack attempts per task (pass@10), which can be tuned for efficiency.

Implications and Future Directions

Genesis demonstrates that autonomous, evolutionary red-teaming can systematically uncover and generalize vulnerabilities in LLM web agents, even in black-box settings. The hybrid strategy library enables both creative and precise attacks, and the closed-loop design mirrors human adversarial learning. The high transferability of strategies across models and tasks suggests that current web agent architectures share fundamental weaknesses that are not easily mitigated by backend LLM improvements alone.

Future work should explore integrating Genesis-like frameworks into the development lifecycle of web agents for continuous security assessment, extending the approach to multimodal and tool-augmented agents, and developing automated defenses that can learn from evolving attack strategies. Theoretical analysis of the space of transferable vulnerabilities and the limits of strategy evolution in adversarial environments remains an open research direction.

Conclusion

Genesis establishes a new paradigm for web agent red-teaming by introducing a closed-loop, evolutionary framework that autonomously discovers, summarizes, and evolves attack strategies. Its superior empirical performance, modular design, and demonstrated transferability highlight the necessity of dynamic, strategy-driven security evaluation for LLM-based agents. The framework provides a foundation for both advancing adversarial research and informing the design of more robust, secure autonomous systems.