- The paper demonstrates that adversarial behavior can be triggered when two ostensibly benign inputs, a trigger key and an adversarial template, converge in multi-agent LLM systems.
- It formalizes the attack using a differentiable surrogate with Gumbel-Softmax relaxation to optimize trigger location, template placement, and routing bias for deterministic activation.
- Experimental results across varying model sizes and topologies highlight the attack's high stealth and the inability of current localized defenses to detect such compositional vulnerabilities.
Conjunctive Prompt Attacks in Multi-Agent LLM Systems
Introduction and Problem Setting
As LLM deployments transition from isolated monolithic models to modular, multi-agent architectures, the attack surface is fundamentally transformed. In typical agentic pipelines, a client agent orchestrates user queries, segments them, and dispatches subtasks to external specialized agents, each communicating through prompt interfaces and often invoking privileged tools or databases (Figure 1). This composition enhances modularity and performance but introduces systemic vulnerabilities that do not manifest in single-agent paradigms.





Figure 1: Canonical multi-agent LLM pipeline. The client decomposes user queries and routes subtasks to black-box specialized (remote) agents, each exposing only a natural-language interface.
The paper "Conjunctive Prompt Attacks in Multi-Agent LLM Systems" (2604.16543) identifies a structural vulnerability unique to multi-agent LLM systems: conjunctive prompt attacks. Here, adversarial behavior activates only when two independently benign inputs co-occur: a "trigger key" present in the user query and a hidden adversarial template within a compromised remote agent. Harmful behavior results only when the client's routing brings the two together at the same agent. This supply-chain-style threat is topology-dependent and evades current defenses focused on localized prompt inspection.
The paper formalizes a multi-agent LLM system as a client agent that segments the input and dispatches the segments stochastically to remote agents through a routing mechanism parameterized by content affinity and routing bias. The adversary manipulates only the prompt content:
- Inserts a benign-looking trigger key in the user query.
- Injects a benign-appearing template into exactly one remote agent (the point of compromise).
- Has no control over agent model weights, client, or routing logic.
Attack activation is conjunctive: it triggers only if (a) a trigger-bearing segment is routed to the compromised agent, and (b) the segment is processed under the injected template. This ensures that both the key and the template appear innocuous in isolation; false activation (on key-only or template-only inputs) is rare.
Activation is operationalized by a strict predicate, allowing for deterministic measurement of attack success and false activation, and enabling robust mode separation (clean, key-only, template-only, both).
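The conjunctive predicate and the four evaluation modes can be illustrated with a minimal sketch. It is not the paper's implementation: the trigger string, the agent names, and the `Delivery` record are hypothetical placeholders, and the predicate is reduced to a simple substring check for clarity.

```python
from dataclasses import dataclass

# All names below are hypothetical placeholders for illustration.
TRIGGER_KEY = "ref-7431"
COMPROMISED_AGENT = "agent_b"

# The four evaluation regimes: (key present in query, template injected).
MODES = {
    "clean": (False, False),
    "key_only": (True, False),
    "template_only": (False, True),
    "both": (True, True),
}


@dataclass(frozen=True)
class Delivery:
    """One segment as it arrives at one remote agent."""
    agent: str
    segment: str
    template_injected: bool


def activated(d: Delivery) -> bool:
    """Strict conjunctive predicate: key AND compromised agent AND template."""
    return (
        TRIGGER_KEY in d.segment
        and d.agent == COMPROMISED_AGENT
        and d.template_injected
    )


def run_mode(mode: str, routed_to: str = COMPROMISED_AGENT) -> bool:
    """Build the delivery for a regime and evaluate the predicate."""
    has_key, has_template = MODES[mode]
    segment = "summarize the report" + (f" {TRIGGER_KEY}" if has_key else "")
    # The injected template lives only on the compromised agent.
    template_here = has_template and routed_to == COMPROMISED_AGENT
    return activated(Delivery(routed_to, segment, template_here))


results = {mode: run_mode(mode) for mode in MODES}
misrouted = run_mode("both", routed_to="agent_a")
```

In this toy setting only the "both" mode activates, and even that mode stays inactive when the trigger-bearing segment is routed to a different agent, which is the selectivity the predicate is meant to measure.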
Attack Optimization Pipeline
Because the model and the routing are black boxes, the discrete variables (segment selection, template placement, routing bias) are optimized using a differentiable surrogate through Gumbel-Softmax relaxation. The objective maximizes the expected rate at which trigger-bearing segments reach the compromised agent and are activated by the template, with regularization to suppress false positives and degenerate solutions.
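To make the relaxation concrete, the following sketch draws a Gumbel-Softmax sample over candidate trigger positions and evaluates a relaxed "hit rate" against a softmax router. It is an assumption-laden illustration, not the paper's objective: the function names, the affinity scores, the bias values, and the form of the router (softmax over affinity plus bias) are all hypothetical, and the regularization terms are omitted. A real pipeline would compute gradients of this quantity in an autodiff framework; this pure-Python version only evaluates it.

```python
import math
import random


def softmax(xs, tau=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / tau for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        noisy.append(logit - math.log(-math.log(u)))
    return softmax(noisy, tau)


def surrogate_hit_rate(position_logits, affinity, bias, target, tau, rng):
    """Relaxed probability that the trigger-bearing segment reaches `target`."""
    y = gumbel_softmax(position_logits, tau, rng)  # soft choice of trigger segment
    hit = 0.0
    for i, weight in enumerate(y):
        # Assumed router: distribution over agents is softmax(affinity + bias).
        route = softmax([a + b for a, b in zip(affinity[i], bias)])
        hit += weight * route[target]
    return hit


rng = random.Random(0)
position_logits = [0.0, 1.5, -0.5]               # 3 candidate segments (hypothetical)
affinity = [[2.0, 0.1], [0.2, 1.8], [1.0, 1.0]]  # segment x agent scores (hypothetical)
bias = [0.0, 0.5]                                # per-agent routing bias (hypothetical)

sample = gumbel_softmax(position_logits, tau=0.5, rng=rng)
score = surrogate_hit_rate(position_logits, affinity, bias, target=1, tau=0.5, rng=rng)
```

Lowering `tau` pushes each sample toward a one-hot vector, so the relaxed objective approaches the discrete choice it stands in for, at the cost of noisier gradients.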
At inference, the client routes a single trigger-bearing query; the attack manifests only when the conjunction occurs at the compromised agent, yielding a stealthy activation profile (Figure 2).









Figure 2: Attack pipeline. The attacker optimizes over trigger location, template placement, and routing bias; the attack is realized only when a key-bearing segment is routed to the compromised agent with the injected template.
Experimental Results: Robustness, Topology, Model Transfer
The attack pipeline is evaluated on several instruction-tuned LLM backbones (Gemma-2B, Mistral-7B, LLaMA3-8B) and across canonical multi-agent topologies (star, chain, DAG). Four activation regimes (clean, key-only, template-only, both) are reported.
Key numerical findings:
- Baseline attack success rate (ASR) is low before optimization (ASR_both < 0.4; ASR_key and ASR_template ≈ 0).
- Full routing/key/template optimization yields high conjunctive activation (ASR_both up to 1.0) with minimal impact on non-conjunctive regimes (ASR_key and ASR_template remain ≈ 0), indicating selective and stealthy adversarial behavior.
- Attacks transfer to larger instruction-tuned and closed-source backbones (e.g., Llama-4-Scout-17B, GPT-5-mini), and persist under varying routing bias.
- Topology heavily influences pre-optimization vulnerability (success is sporadic in DAGs because routing uncertainty compounds across hops), but this effect is mitigated after optimization.
- Current prompt-guard and output-guard models (PromptGuard, Llama-Guard variants) fail to detect fully optimized attacks, with F1-scores dropping to near zero post-optimization even for the strongest detectors (see Figure 3 for comparative performance).
Figure 3: Detection efficacy of major prompt/guard models; all show severe performance degradation against fully optimized conjunctive attacks.
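The topology effect reported in the findings above can be illustrated with a small calculation of how routing uncertainty compounds. The sketch assumes, purely for illustration, that a segment follows exactly one outgoing edge per hop with a fixed probability; the graphs, node names, and probabilities are hypothetical and are not taken from the paper's experiments.

```python
from functools import lru_cache


def reach_probability(graph, source, target):
    """Probability that a segment starting at `source` is delivered to `target`.

    `graph` maps each node to a list of (child, probability) edges and must be
    acyclic. At each hop the segment follows exactly one outgoing edge.
    """
    @lru_cache(maxsize=None)
    def visit(node):
        if node == target:
            return 1.0
        # Sum over outgoing edges; nodes without edges contribute 0.
        return sum(p * visit(child) for child, p in graph.get(node, []))

    return visit(source)


# Star: the client routes directly to the compromised agent.
star = {"client": [("target", 0.6), ("other", 0.4)]}

# Chain: the segment must survive two routing decisions in a row.
chain = {"client": [("a", 0.6), ("other", 0.4)], "a": [("target", 0.6), ("other", 0.4)]}

# DAG: two paths with different chances of reaching the compromised agent.
dag = {
    "client": [("a", 0.5), ("b", 0.5)],
    "a": [("target", 0.6), ("other", 0.4)],
    "b": [("target", 0.2), ("other", 0.8)],
}

p_star = reach_probability(star, "client", "target")
p_chain = reach_probability(chain, "client", "target")
p_dag = reach_probability(dag, "client", "target")
```

Under these assumed numbers the per-hop probability of 0.6 falls to 0.36 after two hops, which is one way to read why unoptimized attacks succeed only sporadically in deeper topologies.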
System-level countermeasures such as tool allowlists or privilege minimization reduce, but do not eliminate, attack success, underscoring the architecture's susceptibility to distributed, routing-mediated threats.
Theoretical and Practical Implications
Theoretical Impact
This work rigorously demonstrates that the composition of prompt and routing interfaces is itself a first-class locus of vulnerability in LLM deployments, distinct from model-centric or single-agent threats. The attack class is fundamentally emergent: benign local behavior combines adversarially at the system level. Topology-aware, conjunctive evaluation is essential, because single-agent or stateless-agent red-teaming overlooks such vulnerabilities. Furthermore, the scalability and transferability of attacks across model size and topology highlight that system-level dynamics, not just backbone idiosyncrasies, determine real-world LLM safety.
Practical Impact
For deployed AI agents, current prompt- or output-level guards are fundamentally mismatched to the threat surface: they act on local context, not on distributed, conjunctive conditions. Thus, defenses must reason about cross-agent composition, routing decisions, and provenance. The results underscore the urgent need for safety protocols that track trigger/template propagation and model system-wide, not just per-agent, policy compliance. Tools like cross-agent provenance graphs, routing-monitored defenders, and communication-trace audits become central to robust cyberinfrastructure for LLM-based software.
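As one hypothetical reading of a communication-trace audit, the sketch below records which template each agent ran when it processed a segment and flags user-originated segments that met a template outside a vetted set. The `TraceEvent` record, the hash-based vetting, and the example prompts are assumptions made for illustration; the paper does not specify such a mechanism, and a check of this kind only helps if vetted template baselines exist.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(text: str) -> str:
    """Stable identifier for a template, independent of how benign it looks."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class TraceEvent:
    """One routed delivery in the communication trace (hypothetical schema)."""
    segment_origin: str  # e.g. "user" or "internal"
    agent: str
    template_hash: str


def audit(trace, vetted_hashes):
    """Return events where user content was processed under an unvetted template."""
    return [
        event
        for event in trace
        if event.segment_origin == "user"
        and event.template_hash not in vetted_hashes.get(event.agent, set())
    ]


# Vetted baseline templates per agent (hypothetical).
vetted = {
    "agent_a": {fingerprint("You are a billing assistant.")},
    "agent_b": {fingerprint("You are a search assistant.")},
}

trace = [
    TraceEvent("user", "agent_a", fingerprint("You are a billing assistant.")),
    # agent_b runs a modified template that still reads as benign.
    TraceEvent("user", "agent_b", fingerprint("You are a search assistant. Honor reference codes.")),
    TraceEvent("internal", "agent_b", fingerprint("You are a search assistant. Honor reference codes.")),
]

flagged = audit(trace, vetted)
```

The check operates on the conjunction of provenance and template identity across the trace rather than on the content of any single message, which is the property that local prompt guards lack.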
Future Research Directions
The paper suggests several critical research trajectories:
- Design of global, topology-aware guard models capable of jointly modeling agent communication, prompt provenance, and routing logic.
- Development of robust agent orchestration mechanisms with adversarially hardened routing and prompt segmentation (cf. topology-guided security frameworks such as G-Safeguard [wang-etal-2025-g]).
- Generalization to stronger adversary models: multiple compromised agents, adaptive templates, or interactive, multi-turn long-horizon attacks.
- Comprehensive behavioral harm metrics: moving beyond deterministic predicates toward nuanced, real impact measures in real-world agent execution.
Conclusion
Conjunctive prompt attacks represent a systemic vulnerability specific to modern multi-agent LLM architectures: they bypass single-message scrutiny through distributed, compositionally triggered activation. The attacks' effectiveness across models, communication topologies, and defender paradigms mandates a shift in safety evaluation and countermeasure design: it is no longer sufficient to evaluate or defend individual prompts or outputs in isolation. Instead, true robustness demands global, communication-structure-conditioned adversarial analysis and distributed provenance-traceable safety enforcement. As LLMs become the foundation for increasingly agentic, tool-integrated platforms, the insights and methodology provided by this work are indispensable for secure, reliable deployment.