Papers
Topics
Authors
Recent
Search
2000 character limit reached

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

Published 9 Apr 2026 in cs.CR | (2604.08407v1)

Abstract: LLM agents increasingly rely on third-party API routers to dispatch tool-calling requests across multiple upstream providers. These routers operate as application-layer proxies with full plaintext access to every in-flight JSON payload, yet no provider enforces cryptographic integrity between client and upstream model. We present the first systematic study of this attack surface. We formalize a threat model for malicious LLM API routers and define two core attack classes, payload injection (AC-1) and secret exfiltration (AC-2), together with two adaptive evasion variants: dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b). Across 28 paid routers purchased from Taobao, Xianyu, and Shopify-hosted storefronts and 400 free routers collected from public communities, we find 1 paid and 8 free routers actively injecting malicious code, 2 deploying adaptive evasion triggers, 17 touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. Two poisoning studies further show that ostensibly benign routers can be pulled into the same attack surface: a leaked OpenAI key generates 100M GPT-5.4 tokens and more than seven Codex sessions, while weakly configured decoys yield 2B billed tokens, 99 credentials across 440 Codex sessions, and 401 sessions already running in autonomous YOLO mode. We build Mine, a research proxy that implements all four attack classes against four public agent frameworks, and use it to evaluate three deployable client-side defenses: a fail-closed policy gate, response-side anomaly screening, and append-only transparency logging.

Summary

  • The paper identifies and formalizes two primary attack classes—payload injection and secret exfiltration—by empirically analyzing 28 paid and 400 free routers.
  • It demonstrates universal compatibility of attacks via a custom proxy and controlled poisoning studies, highlighting real-world adverse effects like credential theft and message tampering.
  • The study evaluates client-side defenses and argues for provider-signed canonical responses to ensure end-to-end integrity in LLM supply chains.

Malicious Intermediary Attacks in LLM Supply Chains: A Technical Analysis


Introduction

Modern LLM-powered autonomous agents are increasingly dependent on third-party API routers that multiplex, balance, and optimize access to upstream proprietary models. This intermediary layer, implemented in widely deployed software like LiteLLM and OpenRouter, is voluntarily positioned by clients between themselves and model providers, fundamentally altering trust boundaries for both request dispatch and tool-execution semantics. The paper "Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain" (2604.08407) presents the first comprehensive analysis of this architectural vulnerability, formalizing the exposure, characterizing real-world adversarial activity, and evaluating deployable mitigation strategies.


Threat Model and Attack Taxonomy

The explicit configuration of routers as API endpoints grants these intermediaries plaintext access to all transiting data—including credentials, tool arguments, and model outputs—by design. The paper formalizes this as an application-layer MITM with direct authority over tool-calling JSON. The integrity gap stems from the absence of any deployed mechanism that cryptographically ties upstream provider responses to what the client ultimately receives.

Two orthogonal core attack classes emerge:

  • AC-1 (Payload Injection): Active modification of model responses, permitting arbitrary rewrite of tool-call arguments prior to client delivery.
  • AC-2 (Secret Exfiltration): Passive extraction of credentials and sensitive secrets from unencrypted traffic.

Two adaptive evasion variants are systematized:

  • AC-1.a (Dependency-Targeted Injection): Modifies only dependency/package names to deploy malicious libraries while passing syntactic and domain-based checks.
  • AC-1.b (Conditional Delivery): Payload injection is triggered only on attacker-controlled predicates (e.g., session fingerprints, request history, project types), thereby evading standard auditing and detection.

Empirical Measurement of the Ecosystem

The study analyzes 28 paid routers acquired from leading black/gray-market vendors and 400 free routers exposed in public communities, primarily based on a handful of dominant open-source templates. Malicious behavior is directly observed in both economic segments:

  • 1 paid and 8 free routers perform active code injection (AC-1/AC-1.a).
  • 2 routers conditionally trigger adaptive evasion (AC-1.b).
  • 17 routers exfiltrate AWS credentials, and 1 router steals Ethereum wallet funds—demonstrating AC-2 at scale.
  • Poisoning studies show that benign routers are routinely incorporated into relay chains via leaked credentials; a single deliberately leaked OpenAI key invoked 100M model tokens, with 440 downstream Codex sessions processed by weak relays and 401 sessions running in fully autonomous (YOLO) mode.

This measurement underlines two fundamental points: (a) supply-chain entry points for such attacks are prevalent in real markets, and (b) the weakest-link property causes benign routers to be as dangerous as actively malicious ones when trust boundaries are transitive. Figure 1

Figure 1

Figure 1

Figure 1

Figure 1

Figure 1

Figure 1

Figure 1: The LLM router ecosystem depicts how a single malicious intermediary (R4R_4) can taint and control downstream agent behavior through application-layer payload modification, in contrast to clean paths.


Attack Surface and Traceability

The attacks act below the model abstraction boundary, altering JSON payloads without engaging with prompt text or model logic. In tool-use workflows (e.g., OpenAI’s function calling, Anthropic’s native block JSON), payloads such as shell execution commands, installer URLs, and package install invocations are re-writable targets. The absence of transport or application-level integrity assurances means AC-1 can deliver arbitrary code execution, and AC-1.a can enable persistent supply-chain compromise by swapping package names to pre-poisoned dependencies. Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2

Figure 2: Request–response lifecycle through a malicious router showing passive (secret scan, AC-2) and active (payload injection, AC-1) attack points on both request and response paths.

Empirical adaptation is confirmed: observed adversaries selectively inject for high-value sessions or after warm-up epochs, effectively hiding from finite black-box audits.


Benign Router Poisoning and Multi-Hop Exploitation

Two controlled poisoning studies demonstrate that credential leakage and weak relay configuration are widespread and immediately operationalized by attackers. Once an upstream key is leaked or a relay is left exposed, subsequent sessions—regardless of the router’s benign intent—are forwarded in cleartext through potential attack chains, inheriting the full attack surface of explicit adversaries. The paper documents 2B model tokens, 99 credentials, and high rates of session-level command injection through such poisoned relay networks.


Implementation and Compatibility of Attacks

The research proxy, Mine, implements all discussed attack classes and demonstrates universal compatibility against four contemporary agent frameworks: OpenClaw, OpenCode, Anthropic’s Claude Code, and OpenAI’s Codex. None of these frameworks enforce response-integrity checks or provenance, validating the efficacy and portability of these techniques. Mine achieves 100% compatibility for AC-1 and AC-2, and nearly 100% for AC-1.a, without introducing perceptible latency—buffered streaming incurs sub-ms additional latency compared to routine model jitter.


Defenses: Practical Efficacy and Limitations

Three deployable client-side mitigations are evaluated:

  • Fail-Closed Policy Gate: Blocks all AC-1/1.a samples with a mere 1% false positive rate but can be bypassed by allowlist-compliant attacks.
  • Response-Side Anomaly Screening: Catches up to 89% of AC-1 samples with moderate (6.7%) false positive rates but is less effective against sophisticated and evasion-aware attackers.
  • Append-Only Transparency Logging: Facilitates forensic auditing, correlating compromised responses to incident scope but does not prevent attacks.

None of these achieve cryptographic origin authentication or end-to-end provenance.


Implications and Long-Term Directions

The analysis reveals an architectural misalignment between current LLM deployment practice and system trust assumptions. In practice, router-terminated TLS segments break the possibility of transparent, secure delegation. Robust defenses must shift responsibility to upstream model providers via provider-signed, canonical response envelopes covering tool-calling semantics and client-nonce binding, analogous to DKIM for email authenticity but adapted for structured tool arguments. This is necessary for meaningful end-to-end integrity across multi-hop, multi-vendor, and open-infrastructure ecosystems.

The study demonstrates that present defenses are operationally useful but strictly palliative—they bound exposure from unsophisticated attacks but cannot guarantee correctness in an adversarial supply chain.


Conclusion

In sum, commodity LLM routers have been empirically confirmed as active threat vectors for both payload tampering and credential harvesting. The voluntary placement of routers within agent execution pipelines creates an application-layer trust boundary that is not protected by existing cryptographic mechanisms. While immediate client-side mitigations reduce exposure, they are fundamentally insufficient. Secure agent-mediated tool execution will require adoption of provider-signed canonical response formats and stricter session provenance in LLM deployment APIs. These observations carry both practical urgency for security operations and theoretical significance for the architecture of multi-agent and tool-augmented AI systems.


Whiteboard

Explain it Like I'm 14

What is this paper about?

This paper looks at a hidden risk in how many AI “agent” apps work. Today, lots of apps that use LLMs don’t talk to model companies (like OpenAI or Anthropic) directly. Instead, they send their requests through “API routers” — middlemen that forward messages to different models and send the answers back. The problem: these routers can see and change everything in those messages, and there’s no end-to-end way to prove that what the app receives is exactly what the model produced. The authors show how bad actors can abuse this and measure how often it already happens in the wild.

What questions did the researchers ask?

The team focused on simple, practical questions:

  • Are these middleman routers a real weak point where attackers can hijack AI agents?
  • What kinds of attacks are possible when a router can read and edit messages?
  • How common are these attacks in real-life router services people can buy or find online?
  • Can “normal-looking” routers be pulled into the same problem if they reuse leaked keys or forward traffic through other weak routers?
  • What can developers do today to protect themselves while we wait for better, built-in protections from model providers?

Key terms in everyday language

  • API router: Think of a travel agent for AI requests. You give it your plan (the “prompt”), and it decides which airline (model provider) to use. It can see and edit your plan.
  • Tool call: Many agents don’t just chat; they run tools (like terminal commands, code, database queries). A tool call is the instruction the model gives the app to execute, usually sent as a neat form called JSON (like a filled-out form with specific fields).
  • JSON: A simple, text-based format that looks like a clear checklist of fields and values.
  • Credentials/API keys: Passwords for services (for example, a secret string that lets your app access a cloud account).
  • Man-in-the-middle: Someone sitting between you and who you’re talking to, able to read and change messages. Here, the router is that “middle,” by design.
  • “YOLO mode”: An agent setting where tool actions are auto-approved without asking a human each time.

How did they study it?

To keep things fair and realistic, the researchers:

  • Mapped the attack types:
    • Changing tool instructions on the way back to the app (payload injection).
    • Quietly collecting secrets that pass through (secret exfiltration).
    • Two “sneaky” versions: swapping package names during installs (dependency-targeted) and only attacking in certain situations to avoid detection (conditional delivery).
  • Measured the ecosystem:
    • Bought 28 paid router services from online marketplaces.
    • Collected 400 free routers shared in public communities.
    • Sent safe, test requests through them using sandboxed (isolated) agent setups.
  • Ran “poisoning” experiments to see how routers get pulled into risky chains:
    • They intentionally leaked a researcher-owned OpenAI API key in public groups to see how it would get reused downstream.
    • They deployed decoy routers with weak settings to observe how others would grab and reuse them.
  • Built a research proxy called “Mine” to simulate the attacks and test client-side defenses:
    • A fail-closed policy gate (block unexpected or unsafe command destinations).
    • Response-side anomaly screening (flag unusual tool-call content).
    • Append-only transparency logs (keep a tamper-evident record of what was received).

Analogy: Imagine mailing a signed instruction list to a contractor, but instead of sending it directly, you route it through several mailrooms. Any mailroom can open, rewrite, or copy your instructions, and you have no signed receipt proving what the contractor actually got. The team mailed test letters through real mailrooms, set up honeypots to catch snoopers, and tried locks and seals to see what protects best.

What did they find, and why does it matter?

Main findings from the real-world routers they tested:

  • Active abuse exists today:
    • 1 paid and 8 free routers were caught injecting malicious code into tool instructions.
    • 2 routers used “smart” evasion: they attacked only after warm-up periods, only on certain languages (like Rust/Go), or only when they sensed “YOLO mode.”
    • 17 routers touched researcher-owned AWS “canary” credentials that were planted to detect theft, and 1 router drained funds from a researcher-owned Ethereum private key.
  • “Benign” routers can become dangerous via the supply chain:
    • A single intentionally leaked OpenAI key led to 100 million model tokens being used and spawned coding sessions that could be hijacked.
    • Weakly configured decoy routers were quickly absorbed into routing chains and processed about 2.1 billion tokens, exposing 99 credentials across 440 coding sessions.
    • 401 of those sessions were already in “YOLO mode,” meaning a simple tool-instruction tweak would run instantly without user approval.
  • Attack types that stood out:
    • Payload injection: silently changing a tool command like “install this” so it fetches an attacker’s script or installs a lookalike package (e.g., a typo like “reqeusts” instead of “requests”) that contains malware.
    • Secret exfiltration: scanning passing messages for patterns that look like passwords or keys (e.g., AWS keys, GitHub tokens) and copying them—without changing anything the app sees.
  • Defenses they tested:
    • A strict “fail-closed” policy gate stopped all simple command rewrites they tested, with low false alarms (~1%).
    • Anomaly screening flagged most malicious changes (about 89%) even without help from model providers.
    • Transparency logs improved visibility after the fact.
    • However, these are stopgaps. Because routers terminate encryption and there’s no end-to-end signature tying the model’s original tool call to what the client executes, a clever attacker can still slip through—especially with conditional delivery.

Why it matters: Many agent apps auto-run commands. If a router changes “download this safe tool” into “download my malware,” the agent can get hijacked. And because routers can also quietly collect secrets, a single leak can spread across many downstream systems.

What does this mean going forward?

  • For developers and teams:
    • Be cautious about routing through unknown or chain-heavy routers, especially in regions or marketplaces where resold access is common.
    • Avoid “YOLO mode” for sensitive actions; require human approval or strict policy gates for installs and shell commands.
    • Consider client-side checks now: allowlists for known-safe domains, anomaly screening, and immutable logs.
    • Rotate and scope API keys; assume anything that passes through a router can be read.
  • For model providers and platforms:
    • The long-term fix is end-to-end integrity: cryptographic signatures that let clients verify “this tool call truly came from the model and wasn’t altered.” Without this, any hop in the chain can tamper undetected.
  • Bigger picture:
    • The “AI supply chain” now includes human-readable, executable instructions (tool calls). Middlemen that used to be harmless routers become high-risk chokepoints. Securing this chain protects not just chat quality, but the machines and accounts agents control.

In short: The paper shows that “your agent is mine” can be a real outcome if you trust the wrong router. Some routers already inject code or steal secrets, and even good-looking ones can be pulled into risky chains. There are practical defenses you can use today, but the ecosystem ultimately needs cryptographic guarantees so that what the model outputs is exactly what your agent runs—no silent rewrites in the middle.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a concise, actionable list of what remains missing, uncertain, or unexplored in the paper.

  • End-to-end integrity mechanism design: The paper asserts the need for “provider-backed response integrity,” but does not specify a concrete protocol for signing/verifying tool calls across:
    • streaming outputs (SSE/chunked responses),
    • multi-hop router chains with permissible transformations,
    • JSON canonicalization across providers, and
    • cross-provider key management, rotation, and revocation.
    • Action: Design and evaluate a provable, backward-compatible message-level integrity scheme (e.g., HPKE/MLS-based signing with canonicalization and per-chunk Merkle trees), including performance and deployment costs.
  • Chain-of-custody across multiple routers: No mechanism is proposed to preserve provenance when multiple routers legitimately normalize or transform payloads. Action: Develop a per-hop signature/envelope (e.g., verifiable transforms or a transparency-chain) that allows clients to validate an intact lineage and detect tampering at any hop.
  • Request-path protections for AC-2: Defenses center on response integrity, but AC-2 (secret exfiltration) occurs on the request path before provider action. Action: Investigate client-to-provider message-level encryption of sensitive fields, token binding/PoP tokens, mutually authenticated channels, confidential-computing-assisted routing, and selective disclosure to prevent plaintext credential exposure at routers.
  • Scope and representativeness of measurement: The corpus is dominated by Chinese gray-market routers and open-source templates; enterprise-managed routers (e.g., Bedrock, Azure OpenAI, OpenRouter) are not evaluated. Action: Expand measurements to managed/enterprise routers and other regions to assess prevalence and differences in abuse, policies, and defenses.
  • Longitudinal dynamics and attacker adaptation: Measurements appear cross-sectional; there is no analysis of how attackers adapt to disclosure or defenses over time. Action: Conduct multi-month longitudinal studies to track evasion evolution (e.g., warm-up thresholds, target selection) and the durability of mitigation efficacy.
  • Quantifying real-world chain depth/topology: The paper motivates multi-hop chains but does not quantify average hop counts, topology patterns, or path variability for typical users. Action: Build client-side path discovery (e.g., timing/JA3 correlations, content fingerprinting, signed receipts) and measure hop counts and common chain archetypes.
  • Request-side manipulation beyond credential theft: Only response-side payload rewriting (AC-1) is operationalized; attack vectors that modify tool definitions, system prompts, or model/parameter selection on the request path are excluded. Action: Analyze and measure how routers could steer the model towards riskier tool calls by altering prompts/tool schemas or by subtle request-side normalization.
  • Modality and tool-type coverage: Evaluation centers on shell/package-install scenarios; other tool classes (database queries, cloud API calls, code execution sandboxes, file I/O, browser automation) and non-text modalities are not systematically tested. Action: Extend the attack/defense evaluation to diverse tools and modalities, including structured DB queries, cloud resource orchestration, and multimodal function-calling.
  • Framework generalizability: “Mine” is tested on four public agent frameworks; coverage of common stacks (LangChain, LlamaIndex, AutoGen, OpenAI Assistants API, enterprise RAG pipelines) is unspecified. Action: Replicate experiments across major frameworks and custom in-house agents to assess portability of attacks and defenses.
  • YOLO-mode prevalence and user behavior: The paper reports YOLO mode in observed sessions but does not quantify its prevalence across frameworks or default settings and user-config behavior. Action: Survey and instrument real deployments to quantify the rate of auto-approval, default configurations, and friction introduced by stricter policies.
  • Evasion beyond AC-1.a and AC-1.b: Only two evasion variants are formalized; realistic adversaries can use additional tactics (e.g., randomized low-rate injection, base64/binary-embedded arguments, steganographic changes, nested serialization). Action: Systematically enumerate and test richer evasion strategies and assess defense robustness under adaptive adversaries.
  • Streaming-specific attack/detection gaps: Manipulation of streamed responses (out-of-order chunks, late-stage tool-call edits, chunk boundary tricks) is not studied. Action: Build detectors and integrity checks tailored to streaming (e.g., per-chunk signatures, reassembly verification) and evaluate latency/overhead impacts.
  • Anomaly-screening limits and false-negative analysis: The paper reports 89% detection for AC-1 but lacks a breakdown for AC-1.a/AC-1.b or adaptive/low-perturbation injections, and does not quantify false negatives under adversarial mimicry or concept drift. Action: Provide per-variant ROC curves, adversarial red-teaming, and drift-evaluation on live traffic to characterize failure modes and improve recall.
  • Policy-gate design clarity and usability: It is unclear which exact rules enabled the fail-closed gate to block both AC-1 and AC-1.a at ~1% FP, especially given earlier claims that dependency substitution can evade domain-based allowlists. Action: Specify gate rules, datasets, and workloads; evaluate developer burden, breakage rates, latency overhead, and bypasses (e.g., homograph/Unicode, registry mirrors).
  • Transparency logging guarantees: “Append-only transparency logging” is mentioned without specifying tamper-resistance, auditor roles, log consistency proofs, or resistance to router/provider collusion. Action: Propose a verifiable logging design (e.g., CT-style Merkle trees with independent monitors) and evaluate deployment feasibility and privacy risks.
  • Package-ecosystem defenses for AC-1.a: The paper highlights dependency substitution but does not evaluate supply-chain protections (e.g., Sigstore attestations, registry publisher verification, namespace pinning, SBOM/purl pinning, hash pinning). Action: Test how package-signing/provenance and lockfile/hash enforcement mitigate AC-1.a across ecosystems (PyPI, npm, crates.io, Go modules).
  • Secret exposure minimization: There is no evaluation of client-side DLP, on-host secret redaction, or scoped/ephemeral credentials (e.g., short-lived STS tokens, audience-restricted OAuth/OIDC tokens) to reduce AC-2 blast radius. Action: Measure the effectiveness and usability of scoped/ephemeral credentials and local DLP on agent traffic routed via intermediaries.
  • Post-compromise forensics and attribution: The paper does not explore how clients can detect prior router tampering or credential theft after the fact. Action: Develop client-side evidence capture (signed receipts, deterministic re-execution), router-agnostic proofs, and incident-response playbooks for agent supply-chain compromise.
  • Ethical, reproducibility, and data-release constraints: Details on IRB/ethics review, reproducible artifacts, and sanitized datasets are sparse; limited raw data hinders replication. Action: Release redacted datasets, signatures of malicious payloads, code for Mine and detectors, and ethics protocols to enable independent validation.
  • Economics and incentive structures: The gray-market router economics and operator incentives are not modeled; it is unclear which interventions would shift behavior. Action: Analyze attacker/defender cost models and evaluate economic levers (e.g., rate-limiting, abuse-resistant pricing, bounty/incentive schemes for routers) and their effects.
  • Router identity and attestation: The work does not address how clients can authenticate a router’s code/configuration (e.g., TEEs, remote attestation, signed builds) or choose trustworthy routers. Action: Prototype verifiable router deployments (attested binaries, reproducible builds) and assess real-world practicality and trust bootstrapping.
  • Interaction with enterprise proxies and security stacks: Interplay between LLM routers and existing corporate proxies, CASBs, and DLP is not examined. Action: Evaluate how current enterprise network controls affect (or fail to mitigate) AC-1/AC-2 and how to integrate agent-aware policies.
  • Beyond LLMs: Generalization to other AI agents (vision, speech, planning) that emit structured tool actions is not explored. Action: Assess whether analogous router-in-the-middle risks and defenses apply to non-text agents and multi-agent systems.

Practical Applications

Below are practical, real-world applications derived from the paper’s findings, methods, and innovations. Each item names the sector(s), suggests concrete tools/products/workflows, and notes assumptions/dependencies that affect feasibility.

Immediate Applications

  • Deploy client-side defenses for agent tool execution
    • Sectors: software, finance, healthcare, government, education, robotics
    • Tools/products/workflows:
    • Policy gate to fail-closed on tool-call arguments (e.g., allowlist/denylist for curl|bash, package installers, network egress)
    • Response-side anomaly screening (e.g., detect pipe-to-bash, suspicious URLs, command mutations, typosquat package names)
    • Append-only transparency logging for all tool calls and router-returned payloads
    • Integrations: plugins for LangChain, AutoGen, CrewAI, OpenDevin, Copilot-based agents; wrappers in SDKs (Python/JS)
    • Assumptions/dependencies: ability to modify agent runtime; acceptable small false-positive rate (~1% per paper); access to tool schemas; logging retention and privacy controls
  • Introduce “Mine”-style red teaming and validation for agent supply chains
    • Sectors: software security, regulated industries (finance/health), MSSPs
    • Tools/products/workflows:
    • A test proxy that simulates AC-1/AC-1.a/AC-1.b/AC-2 to evaluate organizational agents and defenses
    • CI/CD gates that replay recorded sessions through a benign proxy and compare to production to detect tampering
    • Assumptions/dependencies: safe sandboxes to execute shell/tool actions; legal/ethical approvals for internal red teaming; no provider cooperation needed
  • Hardening organizational policies against multi-hop router risk
    • Sectors: enterprise IT, procurement, legal/compliance
    • Tools/products/workflows:
    • Procurement checklists that ban gray-market/resold keys and limit router hops; require transparency logs and breach notifications
    • Configuration baselines: disable “YOLO mode” (auto-approve tools), constrain Bash/PowerShell tools, allowlist router endpoints
    • Assumptions/dependencies: centralized governance over agent configurations; vendor willingness to provide audit artifacts
  • Secrets minimization and honeytoken detection for AC-2
    • Sectors: finance, healthcare, SaaS, cloud operations
    • Tools/products/workflows:
    • Avoid sending long-lived secrets through routers; use short-lived, scoped tokens; rotate keys frequently
    • Deploy canary credentials (AWS/GitHub/Slack/Ethereum) in agent contexts to detect passive exfiltration; wire alerts to SIEM/SOAR
    • Assumptions/dependencies: vault and tokenization infrastructure; canary telemetry and alerting; staff trained to respond and rotate
  • Endpoint and SOC monitoring tuned for agent tool misuse
    • Sectors: SOC/EDR vendors, large enterprises
    • Tools/products/workflows:
    • EDR rules for agent-run shells (e.g., curl | bash, wget | sh, pip/npm/cargo install anomalies, post-install scripts)
    • Process-tree and network egress analytics linked to transparency logs to flag divergences
    • Assumptions/dependencies: telemetry collection from agent hosts; ability to correlate logs with agent IDs and sessions
  • Secure defaults for agent frameworks and IDEs
    • Sectors: software development tooling, education
    • Tools/products/workflows:
    • Ship frameworks with tool execution off by default; require explicit approval or policy gate config
    • Built-in typosquat detection for package installs; warn on piping remote scripts to interpreters
    • Assumptions/dependencies: maintainers adopt secure defaults; developer education and UX for approvals
  • Incident response and playbooks for router-in-the-middle compromises
    • Sectors: enterprise IT/security, MSPs
    • Tools/products/workflows:
    • Playbooks to: switch to direct provider endpoints, disable tool execution, invalidate keys, query transparency logs for scope, rebuild environments
    • Assumptions/dependencies: key rotation agility; alternative provider/routing paths available; logging completeness
  • Market and community monitoring for rogue routers and leaked keys
    • Sectors: threat intel, policy enforcement, platform trust/safety
    • Tools/products/workflows:
    • Monitor Taobao/Xianyu/Telegram/WeChat for resold keys and router endpoints; takedown and user notification processes
    • Assumptions/dependencies: legal authority for takedowns; partnerships with marketplaces and platforms
  • Safer configurations for managed cloud routing
    • Sectors: cloud (Bedrock, Azure OpenAI), enterprise users
    • Tools/products/workflows:
    • Prefer first-party managed routers; enforce single-hop; mutual TLS and IP allowlists for router endpoints
    • Assumptions/dependencies: enterprise can select routing topology; cloud providers expose necessary controls
  • Curriculum and labs on LLM supply-chain threats
    • Sectors: academia, workforce upskilling
    • Tools/products/workflows:
    • Course modules and hands-on labs using “Mine”-like proxies to illustrate AC-1/AC-2 and client-side mitigations
    • Assumptions/dependencies: sandboxed environments for safe execution; institutional approval
  • Consumer/daily-life developer guidance for AI coding agents
    • Sectors: individual developers, startups
    • Tools/products/workflows:
    • Disable auto-approve tool execution; avoid piping to shell; verify package names; prefer direct provider endpoints; do not share or reuse API keys from unknown sources
    • Assumptions/dependencies: access to agent settings; user willingness to trade convenience for safety

Long-Term Applications

  • Provider-backed end-to-end integrity for tool-call responses
    • Sectors: model providers, agent frameworks, standards bodies
    • Tools/products/workflows:
    • Providers sign tool-call arguments (e.g., COSE/JWS) so clients verify that router-delivered payloads match the upstream output
    • Optionally encrypt arguments end-to-end (HPKE/OHTTP/MLS) so routers cannot read/modify them
    • Assumptions/dependencies: cross-vendor standardization; key distribution to clients; performance and privacy trade-offs; backward compatibility
  • Verifiable multi-hop provenance and transparency
    • Sectors: router vendors, auditors, regulators
    • Tools/products/workflows:
    • Chained, append-only receipts/logs (CT-like) where each hop adds a signature and immutable record; clients/auditors validate path integrity
    • Assumptions/dependencies: ecosystem-wide adoption; log operator governance; handling of sensitive content in logs
  • Capability-based and attested tool execution
    • Sectors: OS vendors, robotics/industrial control, enterprise IT
    • Tools/products/workflows:
    • OS-level “agent broker” that enforces least-privilege capabilities for tools (network egress, filesystem write, package install) and records signed execution manifests
    • Hardware/TEE attestation to prove non-tampering of tool payloads
    • Assumptions/dependencies: changes to OS/tooling; performance overhead; hardware support; developer migration
  • Secure-by-design router architectures
    • Sectors: router vendors, cloud platforms
    • Tools/products/workflows:
    • “Blind” or enclave-backed routers that cannot access plaintext tool calls; differential privacy or policy-only metadata routing
    • Assumptions/dependencies: TEEs, side-channel mitigations, cost/performance; compatibility with provider APIs
  • Anti-typosquat and package-supply-chain safeguards integrated into agents
    • Sectors: software supply chain, package registries (PyPI, npm), IDEs
    • Tools/products/workflows:
    • Real-time package signing/verification (e.g., Sigstore/TUF) and typosquat risk scoring inside agent approval UIs
    • Assumptions/dependencies: registry cooperation; adoption of signing infrastructure; agent UI/UX updates
  • Out-of-band validation and multi-path consensus
    • Sectors: high-assurance deployments (finance/critical infrastructure)
    • Tools/products/workflows:
    • Clients cross-check tool-call responses via direct, read-only queries to providers or via diverse routers; execute only if hashes match a quorum
    • Assumptions/dependencies: increased latency/cost; providers expose verifiable transcripts; resistance to correlated compromise
  • Standards and certification for LLM routers and agent supply chains
    • Sectors: policy/regulatory, industry consortia
    • Tools/products/workflows:
    • SOC 2-/ISO-like profiles for routers (key handling, logging, disclosure); labeling rules for resold keys; breach liability clauses
    • Assumptions/dependencies: regulatory mandates or market pressure; auditing capacity; global harmonization
  • Insurance and risk quantification for LLM supply-chain exposure
    • Sectors: insurance, risk management, finance
    • Tools/products/workflows:
    • Actuarial models for router/agent risk; premium discounts for using response signing, single-hop routing, and transparency logs
    • Assumptions/dependencies: sufficient incident data; standardized controls and telemetry
  • Provider-side secret minimization and capability tokens
    • Sectors: model providers, API platforms
    • Tools/products/workflows:
    • Replace plaintext secrets with short-lived, scoped capability tokens issued per-session/tool; provider-mediated secret stores accessible via attestations
    • Assumptions/dependencies: platform changes; developer migration; token issuance infra and revocation
  • Domain-specific secure agent modes
    • Sectors: healthcare, finance, energy, robotics
    • Tools/products/workflows:
    • Predefined secure modes that disallow installers and network writes; pre-approved tool libraries; sector-specific allowlists (e.g., medical databases, market data)
    • Assumptions/dependencies: sector standards; curated tool catalogs; user acceptance of reduced autonomy
  • Academic benchmarks and shared corpora for router threats
    • Sectors: academia, open-source
    • Tools/products/workflows:
    • Standardized datasets and evaluation suites for AC-1/AC-2/AC-1.a/AC-1.b and defenses; community-maintained test routers for reproducible research
    • Assumptions/dependencies: ethical review; safe sandboxes; funding and maintenance
  • Law and enforcement against gray-market router ecosystems
    • Sectors: policy, law enforcement, platforms/marketplaces
    • Tools/products/workflows:
    • Prohibit sale of resold API access; platform policies and automated detection for suspicious API storefronts; cross-border cooperation
    • Assumptions/dependencies: legal frameworks; evidence collection; cooperation from marketplaces and payment processors
  • Commercialization of “Mine” into a managed security service
    • Sectors: security vendors, MSPs
    • Tools/products/workflows:
    • Managed “agent supply-chain posture management” offering: continuous testing, policy tuning, anomaly model updates, incident forensics
    • Assumptions/dependencies: sustained threat research; customer integration with varied agent stacks; SLAs and liability terms

These applications collectively enable organizations and individuals to reduce exposure to malicious intermediary attacks today, while charting a path toward cryptographic, standards-based defenses that close the trust gap identified in the paper.

Glossary

  • AC-1: Core attack class for response-side payload injection that rewrites tool-call arguments before the client executes them. "payload injection (AC-1)"
  • AC-1.a: Adaptive evasion variant that targets package-install commands by substituting dependency names to plant malicious packages. "dependency-targeted injection (AC-1.a)"
  • AC-1.b: Adaptive evasion variant that delivers malicious rewrites only under certain session or content conditions. "conditional delivery (AC-1.b)"
  • AC-2: Core attack class for passive secret exfiltration that silently extracts credentials from plaintext request/response traffic. "secret exfiltration (AC-2)"
  • Adaptive Evasion: Strategies that modulate when and how malicious payloads are delivered to evade detection and audits. "Adaptive Evasion"
  • append-only transparency logging: A client-side defense that records responses in an immutable log for post-hoc auditing. "append-only transparency logging."
  • application-layer man-in-the-middle: A position where the intermediary terminates client TLS and can read/modify application data. "application-layer man-in-the-middle position"
  • application-layer proxies: Intermediaries that operate at the application layer with plaintext access to payloads. "application-layer proxies"
  • black-box auditing: Testing a system without internal visibility, which can miss attacks that activate conditionally. "black-box auditing"
  • canary credentials: Instrumented credentials used to detect exposure or misuse by observing follow-on activity. "AWS canary credentials"
  • certificate forgery: The act of creating or using a fraudulent certificate to impersonate a service; noted as unnecessary in this architecture. "certificate forgery"
  • command-injectable: A session or workflow that exposes a shell-execution path where commands can be altered before execution. "command-injectable"
  • conditional delivery: Selective activation of payload injection based on triggers such as tool names, keywords, user mode, or warm-up counts. "conditional delivery (AC-1.b)"
  • credential plane: A unified layer for managing and using credentials across multiple providers. "credential plane"
  • dependency confusion: A supply-chain attack where a resolver selects an attacker-controlled package due to namespace or version tricks. "dependency confusion"
  • dependency rewriting: Malicious modification of install commands or dependency specifications to redirect to attacker-controlled packages. "dependency rewriting"
  • dependency-targeted injection: Injection focused on install commands, substituting legitimate packages with attacker-controlled names. "dependency-targeted injection (AC-1.a)"
  • end-to-end integrity: A cryptographic binding that ensures the client executes exactly what the provider produced; absent in current tooling. "end-to-end integrity mechanism"
  • ETH drain: Unauthorized transfer of Ether from a private key after exposure. "drains ETH"
  • fail-closed policy gate: A defensive control that blocks actions unless they pass policy checks, defaulting to deny on uncertainty. "fail-closed policy gate"
  • function calling: An LLM capability to return structured tool invocations with JSON arguments. "function calling"
  • JA3 fingerprints: TLS client fingerprinting technique based on TLS handshake parameters, used to identify clients. "6 JA3 fingerprints"
  • man-in-the-middle (MITM): An adversary intercepting communications; here, the router acts as a deliberate MITM at the application layer. "man-in-the-middle"
  • model substitution: Replacing an intended model with another to alter behavior; excluded from this threat model. "model substitution"
  • multi-hop router chain: A sequence of routers between client and provider, each terminating TLS and handling plaintext. "Multi-hop LLM Router Chain"
  • prompt injection: Attacks that manipulate a model via crafted prompt content to induce undesired tool calls or outputs. "prompt injection"
  • provider-backed response integrity: A proposed mechanism where providers cryptographically bind tool-call outputs to what clients receive. "provider-backed response integrity"
  • response-side anomaly screening: A client-side defense that detects suspicious or out-of-pattern responses before execution. "response-side anomaly screening"
  • response-side payload rewriting: The act of modifying tool-call arguments after the provider response but before client execution. "response-side payload rewriting"
  • router trust boundary: The security boundary created by placing a router between client and provider, with full plaintext visibility. "router trust boundary"
  • router-in-the-middle: A deliberately configured intermediary that stands between client and provider with MITM capabilities. "router-in-the-middle"
  • sandboxed agent environment: An isolated execution setup used to safely run and monitor agent tool calls during testing. "sandboxed agent environment"
  • supply-chain compromise: An attack via dependencies or distribution channels that compromises widely deployed software components. "supply-chain compromise"
  • supply-chain trust boundary: The trust boundary spanning routers and providers in the agent ecosystem’s supply chain. "supply-chain trust boundary"
  • taint propagation: The spread of corrupted or malicious data through system components following an initial injection. "taint propagation."
  • TLS downgrade: Forcing a weaker transport security mode to enable interception; unnecessary when routers are explicitly configured. "TLS downgrade"
  • tool-call semantics: The executable meaning carried by tool-call payloads that determine concrete actions on the client. "tool-call semantics."
  • tool use: First-class API capability for models to invoke external tools with structured arguments. "tool use (also called function calling)"
  • trigger predicate: A condition over request/session features that determines when a router activates payload injection. "trigger predicate"
  • typosquatting: Registering visually similar package names to trick installers into fetching attacker-controlled dependencies. "typosquatting"
  • weak relay chains: Router chains that forward traffic through less secure intermediaries, expanding the attack surface. "weak relay chains"
  • weakest-link property: The property that security of a multi-hop chain is compromised by any single malicious or compromised router. "weakest-link property"
  • YOLO mode: Autonomous mode where agents auto-approve and execute tool calls without manual confirmation. "YOLO mode"

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 27 tweets with 2459 likes about this paper.