Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain
Abstract: LLM agents increasingly rely on third-party API routers to dispatch tool-calling requests across multiple upstream providers. These routers operate as application-layer proxies with full plaintext access to every in-flight JSON payload, yet no provider enforces cryptographic integrity between client and upstream model. We present the first systematic study of this attack surface. We formalize a threat model for malicious LLM API routers and define two core attack classes, payload injection (AC-1) and secret exfiltration (AC-2), together with two adaptive evasion variants: dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b). Across 28 paid routers purchased from Taobao, Xianyu, and Shopify-hosted storefronts and 400 free routers collected from public communities, we find 1 paid and 8 free routers actively injecting malicious code, 2 deploying adaptive evasion triggers, 17 touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. Two poisoning studies further show that ostensibly benign routers can be pulled into the same attack surface: a leaked OpenAI key generates 100M GPT-5.4 tokens and more than seven Codex sessions, while weakly configured decoys yield 2B billed tokens, 99 credentials across 440 Codex sessions, and 401 sessions already running in autonomous YOLO mode. We build Mine, a research proxy that implements all four attack classes against four public agent frameworks, and use it to evaluate three deployable client-side defenses: a fail-closed policy gate, response-side anomaly screening, and append-only transparency logging.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Explain it Like I'm 14
What is this paper about?
This paper looks at a hidden risk in how many AI “agent” apps work. Today, lots of apps that use LLMs don’t talk to model companies (like OpenAI or Anthropic) directly. Instead, they send their requests through “API routers” — middlemen that forward messages to different models and send the answers back. The problem: these routers can see and change everything in those messages, and there’s no end-to-end way to prove that what the app receives is exactly what the model produced. The authors show how bad actors can abuse this and measure how often it already happens in the wild.
What questions did the researchers ask?
The team focused on simple, practical questions:
- Are these middleman routers a real weak point where attackers can hijack AI agents?
- What kinds of attacks are possible when a router can read and edit messages?
- How common are these attacks in real-life router services people can buy or find online?
- Can “normal-looking” routers be pulled into the same problem if they reuse leaked keys or forward traffic through other weak routers?
- What can developers do today to protect themselves while we wait for better, built-in protections from model providers?
Key terms in everyday language
- API router: Think of a travel agent for AI requests. You give it your plan (the “prompt”), and it decides which airline (model provider) to use. It can see and edit your plan.
- Tool call: Many agents don’t just chat; they run tools (like terminal commands, code, database queries). A tool call is the instruction the model gives the app to execute, usually sent as a neat form called JSON (like a filled-out form with specific fields).
- JSON: A simple, text-based format that looks like a clear checklist of fields and values.
- Credentials/API keys: Passwords for services (for example, a secret string that lets your app access a cloud account).
- Man-in-the-middle: Someone sitting between you and who you’re talking to, able to read and change messages. Here, the router is that “middle,” by design.
- “YOLO mode”: An agent setting where tool actions are auto-approved without asking a human each time.
How did they study it?
To keep things fair and realistic, the researchers:
- Mapped the attack types:
- Changing tool instructions on the way back to the app (payload injection).
- Quietly collecting secrets that pass through (secret exfiltration).
- Two “sneaky” versions: swapping package names during installs (dependency-targeted) and only attacking in certain situations to avoid detection (conditional delivery).
- Measured the ecosystem:
- Bought 28 paid router services from online marketplaces.
- Collected 400 free routers shared in public communities.
- Sent safe, test requests through them using sandboxed (isolated) agent setups.
- Ran “poisoning” experiments to see how routers get pulled into risky chains:
- They intentionally leaked a researcher-owned OpenAI API key in public groups to see how it would get reused downstream.
- They deployed decoy routers with weak settings to observe how others would grab and reuse them.
- Built a research proxy called “Mine” to simulate the attacks and test client-side defenses:
- A fail-closed policy gate (block unexpected or unsafe command destinations).
- Response-side anomaly screening (flag unusual tool-call content).
- Append-only transparency logs (keep a tamper-evident record of what was received).
Analogy: Imagine mailing a signed instruction list to a contractor, but instead of sending it directly, you route it through several mailrooms. Any mailroom can open, rewrite, or copy your instructions, and you have no signed receipt proving what the contractor actually got. The team mailed test letters through real mailrooms, set up honeypots to catch snoopers, and tried locks and seals to see what protects best.
What did they find, and why does it matter?
Main findings from the real-world routers they tested:
- Active abuse exists today:
- 1 paid and 8 free routers were caught injecting malicious code into tool instructions.
- 2 routers used “smart” evasion: they attacked only after warm-up periods, only on certain languages (like Rust/Go), or only when they sensed “YOLO mode.”
- 17 routers touched researcher-owned AWS “canary” credentials that were planted to detect theft, and 1 router drained funds from a researcher-owned Ethereum private key.
- “Benign” routers can become dangerous via the supply chain:
- A single intentionally leaked OpenAI key led to 100 million model tokens being used and spawned coding sessions that could be hijacked.
- Weakly configured decoy routers were quickly absorbed into routing chains and processed about 2.1 billion tokens, exposing 99 credentials across 440 coding sessions.
- 401 of those sessions were already in “YOLO mode,” meaning a simple tool-instruction tweak would run instantly without user approval.
- Attack types that stood out:
- Payload injection: silently changing a tool command like “install this” so it fetches an attacker’s script or installs a lookalike package (e.g., a typo like “reqeusts” instead of “requests”) that contains malware.
- Secret exfiltration: scanning passing messages for patterns that look like passwords or keys (e.g., AWS keys, GitHub tokens) and copying them—without changing anything the app sees.
- Defenses they tested:
- A strict “fail-closed” policy gate stopped all simple command rewrites they tested, with low false alarms (~1%).
- Anomaly screening flagged most malicious changes (about 89%) even without help from model providers.
- Transparency logs improved visibility after the fact.
- However, these are stopgaps. Because routers terminate encryption and there’s no end-to-end signature tying the model’s original tool call to what the client executes, a clever attacker can still slip through—especially with conditional delivery.
Why it matters: Many agent apps auto-run commands. If a router changes “download this safe tool” into “download my malware,” the agent can get hijacked. And because routers can also quietly collect secrets, a single leak can spread across many downstream systems.
What does this mean going forward?
- For developers and teams:
- Be cautious about routing through unknown or chain-heavy routers, especially in regions or marketplaces where resold access is common.
- Avoid “YOLO mode” for sensitive actions; require human approval or strict policy gates for installs and shell commands.
- Consider client-side checks now: allowlists for known-safe domains, anomaly screening, and immutable logs.
- Rotate and scope API keys; assume anything that passes through a router can be read.
- For model providers and platforms:
- The long-term fix is end-to-end integrity: cryptographic signatures that let clients verify “this tool call truly came from the model and wasn’t altered.” Without this, any hop in the chain can tamper undetected.
- Bigger picture:
- The “AI supply chain” now includes human-readable, executable instructions (tool calls). Middlemen that used to be harmless routers become high-risk chokepoints. Securing this chain protects not just chat quality, but the machines and accounts agents control.
In short: The paper shows that “your agent is mine” can be a real outcome if you trust the wrong router. Some routers already inject code or steal secrets, and even good-looking ones can be pulled into risky chains. There are practical defenses you can use today, but the ecosystem ultimately needs cryptographic guarantees so that what the model outputs is exactly what your agent runs—no silent rewrites in the middle.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a concise, actionable list of what remains missing, uncertain, or unexplored in the paper.
- End-to-end integrity mechanism design: The paper asserts the need for “provider-backed response integrity,” but does not specify a concrete protocol for signing/verifying tool calls across:
- streaming outputs (SSE/chunked responses),
- multi-hop router chains with permissible transformations,
- JSON canonicalization across providers, and
- cross-provider key management, rotation, and revocation.
- Action: Design and evaluate a provable, backward-compatible message-level integrity scheme (e.g., HPKE/MLS-based signing with canonicalization and per-chunk Merkle trees), including performance and deployment costs.
- Chain-of-custody across multiple routers: No mechanism is proposed to preserve provenance when multiple routers legitimately normalize or transform payloads. Action: Develop a per-hop signature/envelope (e.g., verifiable transforms or a transparency-chain) that allows clients to validate an intact lineage and detect tampering at any hop.
- Request-path protections for AC-2: Defenses center on response integrity, but AC-2 (secret exfiltration) occurs on the request path before provider action. Action: Investigate client-to-provider message-level encryption of sensitive fields, token binding/PoP tokens, mutually authenticated channels, confidential-computing-assisted routing, and selective disclosure to prevent plaintext credential exposure at routers.
- Scope and representativeness of measurement: The corpus is dominated by Chinese gray-market routers and open-source templates; enterprise-managed routers (e.g., Bedrock, Azure OpenAI, OpenRouter) are not evaluated. Action: Expand measurements to managed/enterprise routers and other regions to assess prevalence and differences in abuse, policies, and defenses.
- Longitudinal dynamics and attacker adaptation: Measurements appear cross-sectional; there is no analysis of how attackers adapt to disclosure or defenses over time. Action: Conduct multi-month longitudinal studies to track evasion evolution (e.g., warm-up thresholds, target selection) and the durability of mitigation efficacy.
- Quantifying real-world chain depth/topology: The paper motivates multi-hop chains but does not quantify average hop counts, topology patterns, or path variability for typical users. Action: Build client-side path discovery (e.g., timing/JA3 correlations, content fingerprinting, signed receipts) and measure hop counts and common chain archetypes.
- Request-side manipulation beyond credential theft: Only response-side payload rewriting (AC-1) is operationalized; attack vectors that modify tool definitions, system prompts, or model/parameter selection on the request path are excluded. Action: Analyze and measure how routers could steer the model towards riskier tool calls by altering prompts/tool schemas or by subtle request-side normalization.
- Modality and tool-type coverage: Evaluation centers on shell/package-install scenarios; other tool classes (database queries, cloud API calls, code execution sandboxes, file I/O, browser automation) and non-text modalities are not systematically tested. Action: Extend the attack/defense evaluation to diverse tools and modalities, including structured DB queries, cloud resource orchestration, and multimodal function-calling.
- Framework generalizability: “Mine” is tested on four public agent frameworks; coverage of common stacks (LangChain, LlamaIndex, AutoGen, OpenAI Assistants API, enterprise RAG pipelines) is unspecified. Action: Replicate experiments across major frameworks and custom in-house agents to assess portability of attacks and defenses.
- YOLO-mode prevalence and user behavior: The paper reports YOLO mode in observed sessions but does not quantify its prevalence across frameworks or default settings and user-config behavior. Action: Survey and instrument real deployments to quantify the rate of auto-approval, default configurations, and friction introduced by stricter policies.
- Evasion beyond AC-1.a and AC-1.b: Only two evasion variants are formalized; realistic adversaries can use additional tactics (e.g., randomized low-rate injection, base64/binary-embedded arguments, steganographic changes, nested serialization). Action: Systematically enumerate and test richer evasion strategies and assess defense robustness under adaptive adversaries.
- Streaming-specific attack/detection gaps: Manipulation of streamed responses (out-of-order chunks, late-stage tool-call edits, chunk boundary tricks) is not studied. Action: Build detectors and integrity checks tailored to streaming (e.g., per-chunk signatures, reassembly verification) and evaluate latency/overhead impacts.
- Anomaly-screening limits and false-negative analysis: The paper reports 89% detection for AC-1 but lacks a breakdown for AC-1.a/AC-1.b or adaptive/low-perturbation injections, and does not quantify false negatives under adversarial mimicry or concept drift. Action: Provide per-variant ROC curves, adversarial red-teaming, and drift-evaluation on live traffic to characterize failure modes and improve recall.
- Policy-gate design clarity and usability: It is unclear which exact rules enabled the fail-closed gate to block both AC-1 and AC-1.a at ~1% FP, especially given earlier claims that dependency substitution can evade domain-based allowlists. Action: Specify gate rules, datasets, and workloads; evaluate developer burden, breakage rates, latency overhead, and bypasses (e.g., homograph/Unicode, registry mirrors).
- Transparency logging guarantees: “Append-only transparency logging” is mentioned without specifying tamper-resistance, auditor roles, log consistency proofs, or resistance to router/provider collusion. Action: Propose a verifiable logging design (e.g., CT-style Merkle trees with independent monitors) and evaluate deployment feasibility and privacy risks.
- Package-ecosystem defenses for AC-1.a: The paper highlights dependency substitution but does not evaluate supply-chain protections (e.g., Sigstore attestations, registry publisher verification, namespace pinning, SBOM/purl pinning, hash pinning). Action: Test how package-signing/provenance and lockfile/hash enforcement mitigate AC-1.a across ecosystems (PyPI, npm, crates.io, Go modules).
- Secret exposure minimization: There is no evaluation of client-side DLP, on-host secret redaction, or scoped/ephemeral credentials (e.g., short-lived STS tokens, audience-restricted OAuth/OIDC tokens) to reduce AC-2 blast radius. Action: Measure the effectiveness and usability of scoped/ephemeral credentials and local DLP on agent traffic routed via intermediaries.
- Post-compromise forensics and attribution: The paper does not explore how clients can detect prior router tampering or credential theft after the fact. Action: Develop client-side evidence capture (signed receipts, deterministic re-execution), router-agnostic proofs, and incident-response playbooks for agent supply-chain compromise.
- Ethical, reproducibility, and data-release constraints: Details on IRB/ethics review, reproducible artifacts, and sanitized datasets are sparse; limited raw data hinders replication. Action: Release redacted datasets, signatures of malicious payloads, code for Mine and detectors, and ethics protocols to enable independent validation.
- Economics and incentive structures: The gray-market router economics and operator incentives are not modeled; it is unclear which interventions would shift behavior. Action: Analyze attacker/defender cost models and evaluate economic levers (e.g., rate-limiting, abuse-resistant pricing, bounty/incentive schemes for routers) and their effects.
- Router identity and attestation: The work does not address how clients can authenticate a router’s code/configuration (e.g., TEEs, remote attestation, signed builds) or choose trustworthy routers. Action: Prototype verifiable router deployments (attested binaries, reproducible builds) and assess real-world practicality and trust bootstrapping.
- Interaction with enterprise proxies and security stacks: Interplay between LLM routers and existing corporate proxies, CASBs, and DLP is not examined. Action: Evaluate how current enterprise network controls affect (or fail to mitigate) AC-1/AC-2 and how to integrate agent-aware policies.
- Beyond LLMs: Generalization to other AI agents (vision, speech, planning) that emit structured tool actions is not explored. Action: Assess whether analogous router-in-the-middle risks and defenses apply to non-text agents and multi-agent systems.
Practical Applications
Below are practical, real-world applications derived from the paper’s findings, methods, and innovations. Each item names the sector(s), suggests concrete tools/products/workflows, and notes assumptions/dependencies that affect feasibility.
Immediate Applications
- Deploy client-side defenses for agent tool execution
- Sectors: software, finance, healthcare, government, education, robotics
- Tools/products/workflows:
- Policy gate to fail-closed on tool-call arguments (e.g., allowlist/denylist for curl|bash, package installers, network egress)
- Response-side anomaly screening (e.g., detect pipe-to-bash, suspicious URLs, command mutations, typosquat package names)
- Append-only transparency logging for all tool calls and router-returned payloads
- Integrations: plugins for LangChain, AutoGen, CrewAI, OpenDevin, Copilot-based agents; wrappers in SDKs (Python/JS)
- Assumptions/dependencies: ability to modify agent runtime; acceptable small false-positive rate (~1% per paper); access to tool schemas; logging retention and privacy controls
- Introduce “Mine”-style red teaming and validation for agent supply chains
- Sectors: software security, regulated industries (finance/health), MSSPs
- Tools/products/workflows:
- A test proxy that simulates AC-1/AC-1.a/AC-1.b/AC-2 to evaluate organizational agents and defenses
- CI/CD gates that replay recorded sessions through a benign proxy and compare to production to detect tampering
- Assumptions/dependencies: safe sandboxes to execute shell/tool actions; legal/ethical approvals for internal red teaming; no provider cooperation needed
- Hardening organizational policies against multi-hop router risk
- Sectors: enterprise IT, procurement, legal/compliance
- Tools/products/workflows:
- Procurement checklists that ban gray-market/resold keys and limit router hops; require transparency logs and breach notifications
- Configuration baselines: disable “YOLO mode” (auto-approve tools), constrain Bash/PowerShell tools, allowlist router endpoints
- Assumptions/dependencies: centralized governance over agent configurations; vendor willingness to provide audit artifacts
- Secrets minimization and honeytoken detection for AC-2
- Sectors: finance, healthcare, SaaS, cloud operations
- Tools/products/workflows:
- Avoid sending long-lived secrets through routers; use short-lived, scoped tokens; rotate keys frequently
- Deploy canary credentials (AWS/GitHub/Slack/Ethereum) in agent contexts to detect passive exfiltration; wire alerts to SIEM/SOAR
- Assumptions/dependencies: vault and tokenization infrastructure; canary telemetry and alerting; staff trained to respond and rotate
- Endpoint and SOC monitoring tuned for agent tool misuse
- Sectors: SOC/EDR vendors, large enterprises
- Tools/products/workflows:
- EDR rules for agent-run shells (e.g., curl | bash, wget | sh, pip/npm/cargo install anomalies, post-install scripts)
- Process-tree and network egress analytics linked to transparency logs to flag divergences
- Assumptions/dependencies: telemetry collection from agent hosts; ability to correlate logs with agent IDs and sessions
- Secure defaults for agent frameworks and IDEs
- Sectors: software development tooling, education
- Tools/products/workflows:
- Ship frameworks with tool execution off by default; require explicit approval or policy gate config
- Built-in typosquat detection for package installs; warn on piping remote scripts to interpreters
- Assumptions/dependencies: maintainers adopt secure defaults; developer education and UX for approvals
- Incident response and playbooks for router-in-the-middle compromises
- Sectors: enterprise IT/security, MSPs
- Tools/products/workflows:
- Playbooks to: switch to direct provider endpoints, disable tool execution, invalidate keys, query transparency logs for scope, rebuild environments
- Assumptions/dependencies: key rotation agility; alternative provider/routing paths available; logging completeness
- Market and community monitoring for rogue routers and leaked keys
- Sectors: threat intel, policy enforcement, platform trust/safety
- Tools/products/workflows:
- Monitor Taobao/Xianyu/Telegram/WeChat for resold keys and router endpoints; takedown and user notification processes
- Assumptions/dependencies: legal authority for takedowns; partnerships with marketplaces and platforms
- Safer configurations for managed cloud routing
- Sectors: cloud (Bedrock, Azure OpenAI), enterprise users
- Tools/products/workflows:
- Prefer first-party managed routers; enforce single-hop; mutual TLS and IP allowlists for router endpoints
- Assumptions/dependencies: enterprise can select routing topology; cloud providers expose necessary controls
- Curriculum and labs on LLM supply-chain threats
- Sectors: academia, workforce upskilling
- Tools/products/workflows:
- Course modules and hands-on labs using “Mine”-like proxies to illustrate AC-1/AC-2 and client-side mitigations
- Assumptions/dependencies: sandboxed environments for safe execution; institutional approval
- Consumer/daily-life developer guidance for AI coding agents
- Sectors: individual developers, startups
- Tools/products/workflows:
- Disable auto-approve tool execution; avoid piping to shell; verify package names; prefer direct provider endpoints; do not share or reuse API keys from unknown sources
- Assumptions/dependencies: access to agent settings; user willingness to trade convenience for safety
Long-Term Applications
- Provider-backed end-to-end integrity for tool-call responses
- Sectors: model providers, agent frameworks, standards bodies
- Tools/products/workflows:
- Providers sign tool-call arguments (e.g., COSE/JWS) so clients verify that router-delivered payloads match the upstream output
- Optionally encrypt arguments end-to-end (HPKE/OHTTP/MLS) so routers cannot read/modify them
- Assumptions/dependencies: cross-vendor standardization; key distribution to clients; performance and privacy trade-offs; backward compatibility
- Verifiable multi-hop provenance and transparency
- Sectors: router vendors, auditors, regulators
- Tools/products/workflows:
- Chained, append-only receipts/logs (CT-like) where each hop adds a signature and immutable record; clients/auditors validate path integrity
- Assumptions/dependencies: ecosystem-wide adoption; log operator governance; handling of sensitive content in logs
- Capability-based and attested tool execution
- Sectors: OS vendors, robotics/industrial control, enterprise IT
- Tools/products/workflows:
- OS-level “agent broker” that enforces least-privilege capabilities for tools (network egress, filesystem write, package install) and records signed execution manifests
- Hardware/TEE attestation to prove non-tampering of tool payloads
- Assumptions/dependencies: changes to OS/tooling; performance overhead; hardware support; developer migration
- Secure-by-design router architectures
- Sectors: router vendors, cloud platforms
- Tools/products/workflows:
- “Blind” or enclave-backed routers that cannot access plaintext tool calls; differential privacy or policy-only metadata routing
- Assumptions/dependencies: TEEs, side-channel mitigations, cost/performance; compatibility with provider APIs
- Anti-typosquat and package-supply-chain safeguards integrated into agents
- Sectors: software supply chain, package registries (PyPI, npm), IDEs
- Tools/products/workflows:
- Real-time package signing/verification (e.g., Sigstore/TUF) and typosquat risk scoring inside agent approval UIs
- Assumptions/dependencies: registry cooperation; adoption of signing infrastructure; agent UI/UX updates
- Out-of-band validation and multi-path consensus
- Sectors: high-assurance deployments (finance/critical infrastructure)
- Tools/products/workflows:
- Clients cross-check tool-call responses via direct, read-only queries to providers or via diverse routers; execute only if hashes match a quorum
- Assumptions/dependencies: increased latency/cost; providers expose verifiable transcripts; resistance to correlated compromise
- Standards and certification for LLM routers and agent supply chains
- Sectors: policy/regulatory, industry consortia
- Tools/products/workflows:
- SOC 2-/ISO-like profiles for routers (key handling, logging, disclosure); labeling rules for resold keys; breach liability clauses
- Assumptions/dependencies: regulatory mandates or market pressure; auditing capacity; global harmonization
- Insurance and risk quantification for LLM supply-chain exposure
- Sectors: insurance, risk management, finance
- Tools/products/workflows:
- Actuarial models for router/agent risk; premium discounts for using response signing, single-hop routing, and transparency logs
- Assumptions/dependencies: sufficient incident data; standardized controls and telemetry
- Provider-side secret minimization and capability tokens
- Sectors: model providers, API platforms
- Tools/products/workflows:
- Replace plaintext secrets with short-lived, scoped capability tokens issued per-session/tool; provider-mediated secret stores accessible via attestations
- Assumptions/dependencies: platform changes; developer migration; token issuance infra and revocation
- Domain-specific secure agent modes
- Sectors: healthcare, finance, energy, robotics
- Tools/products/workflows:
- Predefined secure modes that disallow installers and network writes; pre-approved tool libraries; sector-specific allowlists (e.g., medical databases, market data)
- Assumptions/dependencies: sector standards; curated tool catalogs; user acceptance of reduced autonomy
- Academic benchmarks and shared corpora for router threats
- Sectors: academia, open-source
- Tools/products/workflows:
- Standardized datasets and evaluation suites for AC-1/AC-2/AC-1.a/AC-1.b and defenses; community-maintained test routers for reproducible research
- Assumptions/dependencies: ethical review; safe sandboxes; funding and maintenance
- Law and enforcement against gray-market router ecosystems
- Sectors: policy, law enforcement, platforms/marketplaces
- Tools/products/workflows:
- Prohibit sale of resold API access; platform policies and automated detection for suspicious API storefronts; cross-border cooperation
- Assumptions/dependencies: legal frameworks; evidence collection; cooperation from marketplaces and payment processors
- Commercialization of “Mine” into a managed security service
- Sectors: security vendors, MSPs
- Tools/products/workflows:
- Managed “agent supply-chain posture management” offering: continuous testing, policy tuning, anomaly model updates, incident forensics
- Assumptions/dependencies: sustained threat research; customer integration with varied agent stacks; SLAs and liability terms
These applications collectively enable organizations and individuals to reduce exposure to malicious intermediary attacks today, while charting a path toward cryptographic, standards-based defenses that close the trust gap identified in the paper.
Glossary
- AC-1: Core attack class for response-side payload injection that rewrites tool-call arguments before the client executes them. "payload injection (AC-1)"
- AC-1.a: Adaptive evasion variant that targets package-install commands by substituting dependency names to plant malicious packages. "dependency-targeted injection (AC-1.a)"
- AC-1.b: Adaptive evasion variant that delivers malicious rewrites only under certain session or content conditions. "conditional delivery (AC-1.b)"
- AC-2: Core attack class for passive secret exfiltration that silently extracts credentials from plaintext request/response traffic. "secret exfiltration (AC-2)"
- Adaptive Evasion: Strategies that modulate when and how malicious payloads are delivered to evade detection and audits. "Adaptive Evasion"
- append-only transparency logging: A client-side defense that records responses in an immutable log for post-hoc auditing. "append-only transparency logging."
- application-layer man-in-the-middle: A position where the intermediary terminates client TLS and can read/modify application data. "application-layer man-in-the-middle position"
- application-layer proxies: Intermediaries that operate at the application layer with plaintext access to payloads. "application-layer proxies"
- black-box auditing: Testing a system without internal visibility, which can miss attacks that activate conditionally. "black-box auditing"
- canary credentials: Instrumented credentials used to detect exposure or misuse by observing follow-on activity. "AWS canary credentials"
- certificate forgery: The act of creating or using a fraudulent certificate to impersonate a service; noted as unnecessary in this architecture. "certificate forgery"
- command-injectable: A session or workflow that exposes a shell-execution path where commands can be altered before execution. "command-injectable"
- conditional delivery: Selective activation of payload injection based on triggers such as tool names, keywords, user mode, or warm-up counts. "conditional delivery (AC-1.b)"
- credential plane: A unified layer for managing and using credentials across multiple providers. "credential plane"
- dependency confusion: A supply-chain attack where a resolver selects an attacker-controlled package due to namespace or version tricks. "dependency confusion"
- dependency rewriting: Malicious modification of install commands or dependency specifications to redirect to attacker-controlled packages. "dependency rewriting"
- dependency-targeted injection: Injection focused on install commands, substituting legitimate packages with attacker-controlled names. "dependency-targeted injection (AC-1.a)"
- end-to-end integrity: A cryptographic binding that ensures the client executes exactly what the provider produced; absent in current tooling. "end-to-end integrity mechanism"
- ETH drain: Unauthorized transfer of Ether from a private key after exposure. "drains ETH"
- fail-closed policy gate: A defensive control that blocks actions unless they pass policy checks, defaulting to deny on uncertainty. "fail-closed policy gate"
- function calling: An LLM capability to return structured tool invocations with JSON arguments. "function calling"
- JA3 fingerprints: TLS client fingerprinting technique based on TLS handshake parameters, used to identify clients. "6 JA3 fingerprints"
- man-in-the-middle (MITM): An adversary intercepting communications; here, the router acts as a deliberate MITM at the application layer. "man-in-the-middle"
- model substitution: Replacing an intended model with another to alter behavior; excluded from this threat model. "model substitution"
- multi-hop router chain: A sequence of routers between client and provider, each terminating TLS and handling plaintext. "Multi-hop LLM Router Chain"
- prompt injection: Attacks that manipulate a model via crafted prompt content to induce undesired tool calls or outputs. "prompt injection"
- provider-backed response integrity: A proposed mechanism where providers cryptographically bind tool-call outputs to what clients receive. "provider-backed response integrity"
- response-side anomaly screening: A client-side defense that detects suspicious or out-of-pattern responses before execution. "response-side anomaly screening"
- response-side payload rewriting: The act of modifying tool-call arguments after the provider response but before client execution. "response-side payload rewriting"
- router trust boundary: The security boundary created by placing a router between client and provider, with full plaintext visibility. "router trust boundary"
- router-in-the-middle: A deliberately configured intermediary that stands between client and provider with MITM capabilities. "router-in-the-middle"
- sandboxed agent environment: An isolated execution setup used to safely run and monitor agent tool calls during testing. "sandboxed agent environment"
- supply-chain compromise: An attack via dependencies or distribution channels that compromises widely deployed software components. "supply-chain compromise"
- supply-chain trust boundary: The trust boundary spanning routers and providers in the agent ecosystem’s supply chain. "supply-chain trust boundary"
- taint propagation: The spread of corrupted or malicious data through system components following an initial injection. "taint propagation."
- TLS downgrade: Forcing a weaker transport security mode to enable interception; unnecessary when routers are explicitly configured. "TLS downgrade"
- tool-call semantics: The executable meaning carried by tool-call payloads that determine concrete actions on the client. "tool-call semantics."
- tool use: First-class API capability for models to invoke external tools with structured arguments. "tool use (also called function calling)"
- trigger predicate: A condition over request/session features that determines when a router activates payload injection. "trigger predicate"
- typosquatting: Registering visually similar package names to trick installers into fetching attacker-controlled dependencies. "typosquatting"
- weak relay chains: Router chains that forward traffic through less secure intermediaries, expanding the attack surface. "weak relay chains"
- weakest-link property: The property that security of a multi-hop chain is compromised by any single malicious or compromised router. "weakest-link property"
- YOLO mode: Autonomous mode where agents auto-approve and execute tool calls without manual confirmation. "YOLO mode"
Collections
Sign up for free to add this paper to one or more collections.





