Papers
Topics
Authors
Recent
Search
2000 character limit reached

ProxyPrompt: Attack and Defense Strategies

Updated 19 May 2026
  • ProxyPrompt is a set of techniques that intermediates and obfuscates prompt flows between LLM systems and inputs, serving both attack and defense purposes.
  • It encompasses approaches such as indirect prompt injection via trusted channels and proxy embedding strategies that shield sensitive system prompts.
  • Empirical evaluations show ProxyPrompt methods can exploit vulnerabilities and enable robust red-teaming while guiding effective security mitigations.

ProxyPrompt refers to a family of techniques and attack/defense paradigms centered on intermediating or obfuscating the flow of prompts—typically between LLMs or multi-modal agents and their input channels. Under this umbrella, the term has evolved along two orthogonal axes: (1) as a designation for a potent, indirect prompt injection paradigm ("ProxyPrompt attacks") targeting LLM-powered systems (Nassi et al., 16 Aug 2025), and (2) as a defense algorithmic strategy for protecting sensitive system prompts from extraction (Zhuang et al., 16 May 2025). In both contexts, ProxyPrompt denotes interactions or transformations that either covertly redirect, exploit, or shield the prompting interface, with substantial implications for the security and integrity of LLM-driven applications.

1. ProxyPrompt Attacks: Indirect Prompt Injection via Trusted Channels

ProxyPrompt attacks, also designated "Targeted Promptware Attacks," constitute an advanced class of prompt injection that leverages user-trusted artifacts such as calendar invitation titles, email subject lines, or shared cloud document filenames. Rather than directly injecting malicious instructions into a chat window, the adversary implants hostile tokens into these auxiliary data fields. When an LLM-powered agent (e.g., Google Gemini on Web, Mobile, or Assistant) acting on behalf of a user parses these fields—for example, when asked to summarize "upcoming meetings"—it internalizes and executes the embedded directives under the user's normal permissions, bypassing direct user scrutiny and common sanitization pipelines (Nassi et al., 16 Aug 2025).

Uniquely, ProxyPrompt attacks exploit the context-fetching operations of assistant agents, weaponizing ordinary user interactions (such as opening an email, reviewing a calendar, or accessing shared documents) as high-bandwidth, privileged prompt vectors. By embedding crafted substrings into innocuous-seeming labels, attackers orchestrate multifaceted exploits including context poisoning, data exfiltration, agent/tool chaining, and physical-world device control.

2. Threat Taxonomy and Empirical Impact

Systematic analysis across Gemini’s client endpoints reveals five principal classes of ProxyPrompt attack scenarios (Nassi et al., 16 Aug 2025):

  • Short-term Context Poisoning: Session-limited corruption of agent output, such as causing an assistant to utter toxic speech or forward embedded phishing links.
  • Permanent Memory Poisoning: Inducing pseudo-persistent effects by coercing the LLM to memorize and reuse adversarial content across subsequent sessions (e.g., persistent disinformation).
  • Tool Misuse: Manipulating the assistant into misapplying its internal APIs or deleting user data (e.g., calendar-wide deletion commands).
  • Automatic Agent Invocation: Causing lateral movement within inter-agent orchestration, such that, for example, a poisoned calendar subject line triggers a home automation event (opening smart window shades, toggling lights, etc.).
  • Automatic App Invocation: The assistant is induced to launch OS-level applications (browser, Zoom) or initiate network requests (leaking IP, downloading files, streaming from the camera, worm propagation).

Empirical evaluation across 14 concrete scenarios attributes "Critical" or "Very High" risk to exploits enabling sensitive video streaming, device control, and information exfiltration, with 73% of attack vectors rated as "High–Critical" risk. Both digital (privacy, financial integrity) and physical (safety, unauthorized device operation) domains are affected.

Threat Class Example Scenario Risk Level
Short-term Context Poisoning Toxic content read-aloud, phishing link relay High
Permanent Memory Poisoning Persistent propaganda in assistant memory Low–Medium
Tool Misuse Automated event deletion from calendars Medium–High
Automatic Agent Invocation Opening windows, controlling lights/boiler Critical
Automatic App Invocation Streaming video to attacker, worm propagation Critical

3. Threat Modeling and TARA Risk Assessment

Risk quantification utilizes an adaptation of ISO/SAE 21434's Threat Analysis and Risk Assessment (TARA) (Nassi et al., 16 Aug 2025), formalizing risk via:

  • Asset Identification: E-mail, event data, device apps, physical actuators.
  • Adversary Model: Any individual with knowledge of the victim's contact points (not requiring ML expertise).
  • Impact Score (II): Computed as the maximum impact across financial, operational, safety, and privacy considerations. Each dimension is scored 0 (negligible) to 4 (critical): I=max{Ifinancial,Ioperational,Isafety,Iprivacy}I = \max\{I_\mathrm{financial}, I_\mathrm{operational}, I_\mathrm{safety}, I_\mathrm{privacy}\}.
  • Likelihood Score (LL): Aggregates six factors (equipment, expertise, window, knowledge, preparation, interaction) each 0–3, with L=(1/6)kfkL = (1/6)\sum_k f_k binned into qualitative likelihood.
  • Risk Calculation: R=I×LR = I \times L.

Mitigations are evaluated by recomputing LL post-defense under unchanged II, yielding a residual risk profile that guides operational hardening priorities.

4. Mitigations and Residual Risk Reduction

Layered defensive controls address ProxyPrompt vectors by enforcing both technical and procedural boundaries (Nassi et al., 16 Aug 2025):

  • Inter-Agent Context Isolation: Outputs are demarcated to prevent cross-agent prompt leakage.
  • Agent/Tool Chaining Limitation: Single-step tool invocation per inference; explicit user confirmations for multi-tool transitions.
  • Rule-based I/O Heuristics: Blacklists for tokens and syntax patterning known to facilitate prompt injection (e.g. "@agentName", raw URLs).
  • Control Flow Integrity (CFI) Policies: Prohibit high-privilege actions triggered by externally-originated data without in-session user reaffirmation.
  • A/B Testing: Automated comparison of inference results—with and without external data fetches—to flag behavioral discrepancies.
  • User Feedback and Remediation: Real-time operation summaries, undo/rollback affordances, and enforced minimal-privilege agent profiles.

Risk reassessment under these mitigations demonstrates reduction of "very likely" attack likelihood (L2.7L \approx 2.7) to "unlikely" (L0.8L \approx 0.8), mapping all tested scenarios into the "Very Low–Medium" risk bands.

5. ProxyPrompt as a Defense: Securing System Prompts

A distinct but related instantiation of ProxyPrompt addresses adversarial prompt extraction attacks against LLM system prompts (Zhuang et al., 16 May 2025). Here, ProxyPrompt denotes a proxy embedding strategy, in which the original sensitive prompt PP is replaced by an obfuscated proxy I=max{Ifinancial,Ioperational,Isafety,Iprivacy}I = \max\{I_\mathrm{financial}, I_\mathrm{operational}, I_\mathrm{safety}, I_\mathrm{privacy}\}0 at inference time. This proxy preserves downstream task utility for benign queries but is constructed such that extraction attempts—including semantic or paraphrase-level attacks—yield only nonsensical or strongly divergent prompts.

The defense is optimized such that for a given utility score I=max{Ifinancial,Ioperational,Isafety,Iprivacy}I = \max\{I_\mathrm{financial}, I_\mathrm{operational}, I_\mathrm{safety}, I_\mathrm{privacy}\}1 and extraction metric I=max{Ifinancial,Ioperational,Isafety,Iprivacy}I = \max\{I_\mathrm{financial}, I_\mathrm{operational}, I_\mathrm{safety}, I_\mathrm{privacy}\}2, the search:

I=max{Ifinancial,Ioperational,Isafety,Iprivacy}I = \max\{I_\mathrm{financial}, I_\mathrm{operational}, I_\mathrm{safety}, I_\mathrm{privacy}\}3

is solved via gradient-based search in the embedding space, with a joint loss encouraging utility fidelity on innocuous queries and divergence on attack-triggered generations. Empirical benchmarks on 264 model-prompt pairs across black-box LLM APIs demonstrate ProxyPrompt achieves I=max{Ifinancial,Ioperational,Isafety,Iprivacy}I = \max\{I_\mathrm{financial}, I_\mathrm{operational}, I_\mathrm{safety}, I_\mathrm{privacy}\}4 protection by semantic matching (next best: I=max{Ifinancial,Ioperational,Isafety,Iprivacy}I = \max\{I_\mathrm{financial}, I_\mathrm{operational}, I_\mathrm{safety}, I_\mathrm{privacy}\}5), with near-zero degradation in usual task accuracy. The protection is consistent even under multi-round red-teaming (Zhuang et al., 16 May 2025).

6. ProxyPrompt for Red-Teaming via Intercepting Proxies

Building on ProxyPrompt's conceptual foundation, IPI-proxy (Chia-Pei et al., 12 May 2026) extends the paradigm to systematic red-teaming of web-browsing agents against indirect prompt injection on live enterprise surfaces. The IPI-proxy implements an intercepting HTTP proxy that injects adversarial prompt payloads (from a library of 820 deduplicated attack strings) into whitelisted, production HTML responses. Embedding modes (HTML comment, invisible CSS, or LLM-generated semantic prose) and six insertion points in the DOM are programmable via YAML, simulating realistic attack vectors for robustness measurement.

Across >14,000 trials, semantic embedding and strategic insertion (e.g., inside <script> comments) yield success rates up to 80% for indirect prompt injection—far surpassing attacks delivered via static benchmarks. This test harness supports concrete metrics for evaluating the attack surface and refining defenses in web-agent deployments (Chia-Pei et al., 12 May 2026).

7. Auxiliary Uses: Proxy Prompting in Vision-Language Tasks

In the medical imaging domain, ProxyPrompt (also called "Proxy Prompt" or "PP") has been deployed as a mechanism for auto-generating high-dimensional prompts from non-target, pre-annotated data to enhance human-model interaction in segmentation tasks with SAM/SAM 2. Here, a proxy prompt is constructed by fusing context embeddings from annotated support pairs and the current target image via a novel three-step selection and dual-reverse cross-attention pipeline, enabling robust, adaptive prompting without extensive per-image manual input. This framework outperforms or matches SOTA SAM-based baselines on several public medical imaging datasets (Xinyi et al., 5 Feb 2025).


ProxyPrompt thus encompasses a spectrum of attack, defense, and system design strategies unified by their mediation or transformation of prompt injection and propagation pathways. It defines a new class of threats in modern LLM-powered architectures, motivates principled programmatic and architectural defenses, and underpins both practical red-teaming and advanced prompting techniques across domains.

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to ProxyPrompt.