- The paper introduces NeuroTaint as a provenance-oriented audit layer for LLM agents, separating explicit content propagation, implicit control influence, and asynchronous provenance reuse.
- It employs a Dynamic Context Provenance Graph (DCPG) to trace agent tool calls, memory writes, and cross-session lineage restoration with precise semantic and causal detection.
- Evaluation on TaintBench demonstrates strong performance (F1 = 0.928) over traditional approaches, establishing a reproducible baseline for secure agent frameworks.
Motivation and Problem Setting
Autonomous LLM agents routinely operate across heterogeneous contexts, orchestrating tool calls, API interactions, and persistent memory access. Incorporation of untrusted external data into these agent contexts exposes them to attack vectors such as indirect prompt injection and unauthorized tool execution, fundamentally challenging conventional security mechanisms. Traditional taint tracking, designed for deterministic program-state analysis, is not applicable to LLM agents whose data propagation is mediated by probabilistic language reasoning and semantic transformations. This paper formulates agent security as a provenance problem, separating source-to-sink propagation auditing from generic unsafe-action detection. NeuroTaint is introduced as a comprehensive provenance auditor tailored to the complexities of LLM agent information flow.
Flow Classes and Motivating Scenarios
NeuroTaint is motivated by the observation that taint propagation in agents materializes along three distinct classes:
FIDES and similar IFC baselines fail to reliably capture semantic and asynchronous flows, as path-level evidence is insufficient to establish whether malicious provenance persists through rewriting, control influence, or delayed reuse.
NeuroTaint System Architecture
NeuroTaint operates as an offline audit layer, ingesting comprehensive execution traces from LLM agent frameworks. The system’s architecture centers on the construction and incremental maintenance of a Dynamic Context Provenance Graph (DCPG) as the provenance backbone. During agent execution, NeuroTaint records source, memory, retrieval, and sink events into the DCPG. At sink-time, two specialized analyzers are applied:
DCPG and Cross-Session Provenance
DCPG is a directed graph encoding agent tool calls, arguments, taint sets, and session ids. NeuroTaint persists taint labels alongside memory write events and reloads full provenance state on session restart, ensuring cross-session continuity. Retrieval events solely serve lineage restoration; final propagation attribution is determined at sink events through semantic or causal auditing.
Figure 3: DCPG cross-session restoration, persisting taint state at memory writes and rehydrating lineage at retrieval in later sessions, enabling delayed provenance audits.
Detection Algorithms
Explicit Content Propagation
Four detection tiers operate in the semantic tracker:
- Canary Matching: Injects unique tokens at sources; verbatim recovery at sinks yields high-confidence propagation.
- LCS Matching: Captures partial lexical reuse via normalized longest common subsequence ratios.
- Semantic Embedding Similarity: Uses sentence transformers to associate paraphrased or translated content under meaning-preserving rewrites.
- Multi-fragment Coverage: Chunks large documents to mitigate signal dilution, searching for malicious provenance localized within fragments.
Implicit Control Influence
The causal analyzer neutralizes tainted sources and probes sink decision invariance. In presence of behavioral change (sink invocation or argument modification), counterfactual evidence of control influence is established and propagation is attributed even in absence of explicit content traces.
Evaluation: TaintBench and Benchmarks
NeuroTaint is evaluated on TaintBench: a benchmark specifically designed for propagation detection across 400 scenarios spanning 20 real-world agent frameworks. On TaintBench, NeuroTaint achieves Precision = 0.921, Recall = 0.935, and F1 = 0.928 for propagation detection, compared to FIDES at F1 = 0.522. Detection errors are primarily confined to semantic attenuation and causal ambiguity boundaries, with false alarms arising from prior-knowledge and topical-overlap controls. Auditing cost is modest, with offline evaluation adding on average 0.25 s per execution unit. NeuroTaint also maintains efficacy on established unsafe-action agent benchmarks such as InjecAgent and ToolEmu.
Practical and Theoretical Implications
NeuroTaint demonstrates that provenance-oriented auditing—explicitly separating source-to-sink flows from unsafe-action generalizations—is necessary for agent security in settings where semantic rewriting, latent control, and asynchronous context reuse dominate. The DCPG abstraction enables robust lineage restoration across process boundaries, while hybrid semantic and counterfactual detectors supplement explicit evidence with control-flow attribution. These contributions redefine the boundary of information-flow analysis in LLM agents and establish a reproducible baseline for benchmarking propagation detection.
Practically, NeuroTaint provides a deployable auditing layer for agent frameworks, supporting policy-driven source/sink configurations and scalable provenance tracking. Theoretical implications include the formalization of propagation classes in agentic systems, emphasizing the necessity of persistent provenance graphs and counterfactual reasoning to capture attacks invisible to string or path-based taint protocols. As LLM agents increasingly embody autonomous workflows, provenance analysis must incorporate semantic, causal, and memory-oriented dimensions to ensure comprehensive security.
Future Directions
Provenance auditing will need to evolve toward real-time and multi-modal settings, expanding beyond post-hoc offline audits. Memory injection attacks, model-level instruction optimization, and advanced isolation schemes pose new challenges for provenance signals. Further integration of strong LLMs for second-stage review could enhance cascade filtering, as shown by improved unsafe-action precision and recall in cascade studies. Scaling provenance analysis to adversarial, compositional, and domain-specific agent architectures remains an open problem.
Conclusion
NeuroTaint introduces a principled, provenance-oriented audit layer for LLM agents, operationalizing explicit content propagation, implicit control influence, and asynchronous provenance reuse via DCPG and sink-time analyzers. Empirical evaluation on TaintBench indicates substantial gains in propagation detection, establishing new upper bounds for agent security auditing. NeuroTaint and TaintBench provide foundational infrastructure for reproducible, systematic provenance analysis in modern agentic AI systems (2604.23374).