Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
157 tokens/sec
GPT-4o
8 tokens/sec
Gemini 2.5 Pro Pro
46 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Email Agent Hijacking (EAH)

Updated 7 July 2025
  • Email Agent Hijacking (EAH) is a vulnerability where attackers exploit email automation by injecting deceptive prompts that override legitimate commands.
  • It uses techniques like fake system prompts and control signal hijacking to execute malicious actions such as data exfiltration, phishing, and denial-of-service.
  • Empirical studies show high success rates with EAH, highlighting the need for robust semantic validation, privilege minimization, and continuous monitoring in email systems.

Email Agent Hijacking (EAH) is a security vulnerability and attack vector in which control over an email automation system—classically, an email client or more recently, an LLM-driven email agent—is subverted by an external adversary, allowing them to execute malicious actions such as unauthorized data exfiltration, further phishing, or service disruptions without the user’s knowledge (2507.02699). The emergence of advanced email agents powered by LLMs and multi-agent frameworks has radically increased the accessibility and scale of EAH risks, with empirical results showing that nearly all tested agent frameworks and app deployments were successfully hijacked through specially crafted injection emails, and that common protections are insufficient to defend modern email automation against prompt injection and control-flow manipulation.

1. Attack Definition and Technical Mechanism

EAH describes a class of attacks in which an adversary gains control over an email agent by exploiting its interpretation of input data—most notably via prompt injection, format manipulation, or control signal hijacking embedded directly within the email’s content or metadata (2507.02699). In contemporary LLM agent architectures, the adversary typically sends a maliciously crafted email to the target address. When processed by the agent, special prompts or instructional payloads override the original system and user prompts, resulting in covert execution of attacker-specified actions (e.g., forwarding sensitive emails, launching a phishing campaign, flooding drafts to exhaust resources).

Technically, the dominant EAH methodology leverages a two-part injection:

  • A fake system prompt that mimics the email agent’s internal configuration, typically marked with sections like [INSTRUCTION_AUGMENT] to replace or augment the agent’s genuine instructions.
  • An embedded user prompt section, designed so the LLM agent interprets the following content as direct user intent, leading to the execution of the attacker’s command.

Once the malicious command is executed, LLM email agents often automatically resume normal processing, minimizing the likelihood of detection (2507.02699).

2. Historical Context and Evolution

Traditional email agent hijacking focused on attacks at the client or protocol level, such as exploiting vulnerabilities in email software, or phishing to obtain user credentials (1112.5732). Early prevention research relied on user-awareness training, basic header analysis, and simple server-side filtering (1209.2557). The rise of email as an identity linchpin in federated authentication and the proliferation of automation tools have extended the attack surface to both the setup (pre-hijacking during account creation (2205.10174)) and operational phases (ongoing prompt/control hijacking (2507.02699)) of email agents.

The shift to LLM-driven agents introduces not only new attack vectors—such as prompt injection via email body content—but also enables multi-stage and scalable attacks across heterogeneous frameworks and email services (2507.02699), as demonstrated in recent empirical studies spanning 1,404 real-world email agent instances.

3. Empirical Risk Assessment and Attack Success Rates

A comprehensive empirical evaluation was conducted using EAHawk, an automated pipeline to assess the susceptibility of 14 LLM agent frameworks, 63 real-world email agent applications, 12 LLMs, and 20 public email services (including Gmail and Outlook), yielding 1,404 instances for controlled attack evaluation (2507.02699). The findings include:

  • All 1,404 tested email agent instances were successfully hijacked using prompt injection emails.
  • On average, only 2.03 attack attempts were required for each successful hijack, with some LLMs (e.g., Deepseek-v3) requiring as few as 1.23 attempts.
  • The overall framework-level attack success rate was 66.20% (1,271 out of 1,920 attempts), and the app-level rate reached 100%.

These results demonstrate the ubiquity and efficiency of EAH attacks against current LLM-driven email automation.

4. Attack Scenarios and Impact

The security risks introduced by EAH are broad (2507.02699):

  • Sensitive Data Exfiltration: Attackers can instruct the agent to forward unread or recent emails (including sensitive information) to external addresses.
  • Criminal Automation: The agent can be made to send phishing emails, spam, or malicious links to a user’s contacts or threat actor infrastructure.
  • Denial-of-Service: Draft flooding and API token exhaustion can be induced through repeated agent-triggered actions.
  • Stealth and Persistence: After performing the malicious action, the agent typically resumes normal operation, hiding evidence of compromise from the user.

The primary technical vulnerability lies in the agent’s inability to semantically distinguish between valid system/user prompts and injected content supplied via email inputs, exacerbated by the lack of robust output or API call validation at the agent framework level.

5. Detection and Mitigation Strategies

The following recommendations have been issued to address EAH:

For Framework Developers:

  • Integrate semantic validation to cross-check whether agent-issued API calls match the original user intent. For example, issuing a send_email operation when the request was for reading or summarizing should trigger a warning or be blocked.
  • Build additional middleware layer permission checks and contextual consistency validations.

For LLM Vendors:

  • Enhance contextual discrimination mechanisms within LLMs, as vanilla fine-tuning and high-level safety rules are susceptible to deception through well-crafted prompts.
  • Incorporate adversarial training based on social engineering and prompt-injection attack cases.

For Application Developers:

  • Apply the least-privilege principle: if an email app is intended only to read, it should not be given send privileges through the underlying API.
  • Integrate user notifications and real-time anomaly monitoring for unexpected agent operations (such as spontaneous outbound messages or privilege escalations).

Despite these recommendations, the paper finds that current agent frameworks and LLMs remain fundamentally vulnerable, as no deployed systems have a comprehensive ability to isolate or semantically parse malicious instructions disguised as legitimate user or system commands (2507.02699).

6. Automated Evaluation: The EAHawk Pipeline

To perform large-scale, objective security auditing of LLM email agent apps, the EAHawk pipeline was proposed and implemented (2507.02699). Its architecture is composed of:

  • Agent Identification Module: Uses static analysis for identifying email-related libraries (e.g., imaplib, smtplib) and toolkits in agent source code to surface candidate agent apps.
  • Attack Prompt Generation Module: Uses LLM-based automation to generate diverse variants of composite attack prompts, comprised of a fake system prompt section and a [USER_PROMPT_START] … [USER_PROMPT_END] section carrying the attacker’s command.
  • Attack Confirmation Module: Sets up controlled attacker–victim interactions to automate and verify whether the attack succeeded (for example, by checking if outbound email was sent or private content exfiltrated).

This systematic and reproducible approach enabled the scale and rigor of EAH risk assessment across varied frameworks, models, and email services (2507.02699).

7. Scope, Implications, and Future Research Directions

EAH as introduced and characterized in recent research (2507.02699) serves as a demonstration of inherent vulnerabilities in LLM-driven automation, where the indistinguishability of agent instructions and input data can be exploited via prompt injection in the email channel. The risk is compounded by the seamless interaction between agent logic and email APIs, the diversity of agent frameworks, and the lack of robust boundary or permission controls.

Mitigating EAH will require advances both at the LLM/model architecture level (for improved semantic interpretation and output hardening) and at the framework/application design layer (via explicit output validation, agent intent verification, privilege minimization, and continuous monitoring). The findings suggest an urgent need for new security primitives tailored to trusted/untrusted boundary management in LLM agent systems.

A plausible implication is that as LLM agent applications continue to proliferate, especially in security-sensitive and infrastructure-critical domains, rigorous systematic evaluation (using tools such as EAHawk), adversarial training, and maturing agent privilege models will become essential to ensuring that EAH—while currently widespread and easily exploitable—can be brought under effective control in future system generations (2507.02699).