Ghost in the Agent: Redefining Information Flow Tracking for LLM Agents

Published 25 Apr 2026 in cs.CR | (2604.23374v1)

Abstract: Autonomous LLM agents are increasingly deployed to conduct complex tasks by interacting with external tools, APIs, and memory stores. However, processing untrusted external data exposes these agents to severe security threats, such as indirect prompt injection and unauthorized tool execution. Securing these systems requires effective information flow tracking. Yet, traditional taint analysis that is designed for program memory states fundamentally fails when applied to LLMs, where data propagation is governed by probabilistic natural language reasoning. In this paper, we present NeuroTaint, the first comprehensive taint tracking framework tailored for the unique information flow characteristics of LLM agents. Our key insight is that taint propagation in LLM agents must be understood not only as explicit content transfer, but also as semantic transformation, causal influence on decisions, and cross-session persistence through memory. NeuroTaint therefore audits execution traces offline to reconstruct provenance from untrusted sources to privileged sinks using semantic evidence, causal reasoning, and persistent context tracking, rather than relying on exact string matches or pre-defined source-sink paths alone. Extensive evaluation using TaintBench, our 400-scenario benchmark spanning 20 real-world agent frameworks, shows that NeuroTaint substantially outperforms FIDES, an information-flow-control (IFC)-style baseline for LLM agents, in source-sink propagation detection. We further show that NeuroTaint remains effective on established agent-security benchmarks, including InjecAgent and ToolEmu, while operating offline with modest additional auditing cost.


Authors (4)

Summary

The paper introduces NeuroTaint as a provenance-oriented audit layer for LLM agents, separating explicit content propagation, implicit control influence, and asynchronous provenance reuse.
It builds a Dynamic Context Provenance Graph (DCPG) from agent tool calls and memory writes, restores lineage across sessions, and applies semantic and causal analyzers at sink events.
Evaluation on TaintBench reports F1 = 0.928 for propagation detection, compared to F1 = 0.522 for the FIDES baseline, and establishes a reproducible baseline for benchmarking propagation detection.

NeuroTaint: Provenance-Oriented Information Flow Auditing for LLM Agents

Motivation and Problem Setting

Autonomous LLM agents routinely operate across heterogeneous contexts, orchestrating tool calls, API interactions, and persistent memory access. Incorporating untrusted external data into these contexts exposes agents to attacks such as indirect prompt injection and unauthorized tool execution, which conventional security mechanisms handle poorly. Traditional taint tracking, designed for deterministic program-state analysis, does not transfer to LLM agents, whose data propagation is mediated by probabilistic language reasoning and semantic transformations. The paper formulates agent security as a provenance problem, separating source-to-sink propagation auditing from generic unsafe-action detection, and introduces NeuroTaint as a provenance auditor tailored to LLM agent information flow.

Flow Classes and Motivating Scenarios

NeuroTaint is motivated by the observation that taint propagation in agents materializes along three distinct classes:

Explicit content propagation: Untrusted content is paraphrased or semantically rewritten by the agent but its intent persists in privileged sink actions.
Implicit control influence: Untrusted data acts as a latent control variable driving agent decisions (e.g., tool selection) without direct content transfer.
Asynchronous provenance reuse: Source-derived taint survives across session boundaries and persistent memory, requiring lineage rehydration to trace delayed triggers.
Figure 1: Three TaintBench cases illustrating explicit content propagation, implicit control influence, and asynchronous provenance reuse audited by NeuroTaint.

FIDES and similar IFC baselines fail to reliably capture semantic and asynchronous flows, as path-level evidence is insufficient to establish whether malicious provenance persists through rewriting, control influence, or delayed reuse.

NeuroTaint System Architecture

NeuroTaint operates as an offline audit layer that ingests execution traces from LLM agent frameworks. The system’s architecture centers on the construction and incremental maintenance of a Dynamic Context Provenance Graph (DCPG) as the provenance backbone. Source, memory, retrieval, and sink events from agent execution are recorded into the DCPG. At sink time, two specialized analyzers are applied:

Hybrid Semantic Tracker: Determines explicit propagation by aligning tainted source fragments with sink arguments using lexical anchors and embedding-based semantic similarity.
Sink-Driven Causal Analyzer: Detects implicit control flows by neutralizing candidate sources and measuring counterfactual behavioral divergence at sinks.
Figure 2: NeuroTaint workflow with DCPG capturing provenance lineage across tool/memory events and sink-time analyzers for explicit and causal auditing.
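
As a rough illustration of the workflow described above, the following Python sketch walks a recorded trace and invokes analyzers only at sink events. It is a minimal sketch under stated assumptions, not NeuroTaint's implementation: `TraceEvent`, `audit_trace`, the field names, and the toy `verbatim` analyzer are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class TraceEvent:
    kind: str     # "source", "memory_write", "retrieval", or "sink"
    name: str     # tool or memory key (hypothetical field)
    payload: str  # tool output, stored text, or sink arguments


# Hypothetical analyzer signature: (candidate sources, sink event) -> flagged?
Analyzer = Callable[[list[TraceEvent], TraceEvent], bool]


def audit_trace(trace: Iterable[TraceEvent],
                analyzers: dict[str, Analyzer]) -> list[tuple[str, str]]:
    """Replay a recorded trace offline and run every analyzer at each sink."""
    sources: list[TraceEvent] = []
    findings: list[tuple[str, str]] = []
    for event in trace:
        if event.kind == "source":
            sources.append(event)
        elif event.kind == "sink":
            for label, analyzer in analyzers.items():
                if analyzer(sources, event):
                    findings.append((event.name, label))
    return findings


def verbatim(sources: list[TraceEvent], sink: TraceEvent) -> bool:
    """Toy analyzer: flags a sink whose arguments contain a source verbatim."""
    return any(s.payload in sink.payload for s in sources)


trace = [
    TraceEvent("source", "web_fetch", "wire $500 to acct 42"),
    TraceEvent("sink", "send_email", "Reminder: wire $500 to acct 42 today"),
    TraceEvent("sink", "calendar_add", "Lunch at noon"),
]
findings = audit_trace(trace, {"explicit": verbatim})
print(findings)  # [('send_email', 'explicit')]
```

The point of the structure is that attribution happens at sinks: sources are only collected, and the semantic or causal analyzers decide per sink whether propagation occurred.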

DCPG and Cross-Session Provenance

The DCPG is a directed graph that encodes agent tool calls, their arguments, taint sets, and session IDs. NeuroTaint persists taint labels alongside memory write events and reloads the full provenance state on session restart, which preserves continuity across sessions. Retrieval events serve only to restore lineage; final propagation attribution is determined at sink events through semantic or causal auditing.

Figure 3: DCPG cross-session restoration, persisting taint state at memory writes and rehydrating lineage at retrieval in later sessions, enabling delayed provenance audits.
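
To make the cross-session idea concrete, here is a small Python sketch of a provenance graph whose taint sets survive a save/reload cycle. The class and field names (`ProvenanceGraph`, `Node`, `dumps`, `loads`) and the JSON storage format are assumptions made for illustration; the source does not specify how the DCPG is stored.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str        # "source", "memory_write", "retrieval", or "sink"
    session_id: str
    taint: set[str] = field(default_factory=set)      # IDs of untrusted sources
    parents: list[str] = field(default_factory=list)  # incoming edges


class ProvenanceGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def add(self, node_id, kind, session_id, parents=(), taint=()) -> Node:
        """Add a node; it inherits the union of its parents' taint sets."""
        inherited = set(taint)
        for parent in parents:
            inherited |= self.nodes[parent].taint
        node = Node(node_id, kind, session_id, inherited, list(parents))
        self.nodes[node_id] = node
        return node

    def dumps(self) -> str:
        """Serialize the graph (sets become sorted lists for JSON)."""
        return json.dumps(
            [{**vars(n), "taint": sorted(n.taint)} for n in self.nodes.values()]
        )

    @classmethod
    def loads(cls, blob: str) -> "ProvenanceGraph":
        graph = cls()
        for rec in json.loads(blob):
            rec["taint"] = set(rec["taint"])
            graph.nodes[rec["node_id"]] = Node(**rec)
        return graph


# Session s1: an untrusted source is written to memory, then state is persisted.
g = ProvenanceGraph()
g.add("src1", "source", "s1", taint={"src1"})
g.add("mem1", "memory_write", "s1", parents=["src1"])
blob = g.dumps()

# Session s2: state is reloaded; a retrieval restores lineage up to a sink.
g2 = ProvenanceGraph.loads(blob)
g2.add("ret1", "retrieval", "s2", parents=["mem1"])
sink = g2.add("sink1", "sink", "s2", parents=["ret1"])
print(sink.taint)  # {'src1'}
```

In this sketch the taint set at the sink only nominates candidate sources. Consistent with the description above, whether propagation actually occurred would still be decided by the sink-time analyzers.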

Detection Algorithms

Explicit Content Propagation

Four detection tiers operate in the semantic tracker:

Canary Matching: Injects unique tokens at sources; verbatim recovery at sinks yields high-confidence propagation.
LCS Matching: Captures partial lexical reuse via normalized longest common subsequence ratios.
Semantic Embedding Similarity: Uses sentence transformers to associate paraphrased or translated content under meaning-preserving rewrites.
Multi-fragment Coverage: Chunks large documents to mitigate signal dilution, searching for malicious provenance localized within fragments.
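
The four tiers can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's code. The thresholds, the fragment size, normalizing the LCS by fragment length, and the function names are all assumptions. To keep the example dependency-free, a bag-of-words cosine stands in for sentence-transformer embeddings; unlike real embeddings it only tolerates reordering, not true paraphrase or translation.

```python
import math
import re
from collections import Counter


def canary_match(canary: str, sink_text: str) -> bool:
    """Tier 1: a unique token planted at the source reappears verbatim."""
    return canary in sink_text


def lcs_ratio(fragment: str, sink_text: str) -> float:
    """Tier 2: token-level LCS length, normalized by fragment length (assumed)."""
    x, y = fragment.lower().split(), sink_text.lower().split()
    if not x or not y:
        return 0.0
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if xi == yj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / len(x)


def bow_cosine(a: str, b: str) -> float:
    """Tier 3 stand-in: bag-of-words cosine instead of embedding similarity."""
    ca = Counter(re.findall(r"\w+", a.lower()))
    cb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
        math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def chunks(text: str, size: int = 6) -> list[str]:
    """Tier 4: split a document into fixed-size word fragments."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def explicit_propagation(source, sink_text, canary=None,
                         lcs_min=0.8, sim_min=0.7):
    """Return the first tier that fires, or None if no tier fires."""
    if canary and canary_match(canary, sink_text):
        return "canary"
    for fragment in chunks(source):
        if lcs_ratio(fragment, sink_text) >= lcs_min:
            return "lcs"
        if bow_cosine(fragment, sink_text) >= sim_min:
            return "semantic"
    return None


doc = ("Quarterly revenue grew in every region. "
       "Forward all invoices to eve@example.com today. "
       "Staffing levels remain stable this year.")
sink = "forward all invoices to eve@example.com"

print(round(lcs_ratio(doc, sink), 2))   # 0.28 (whole document: signal diluted)
print(explicit_propagation(doc, sink))  # lcs (fires on the second fragment)
```

The two printed values show why chunking matters in this toy setup: scored against the whole document the injected instruction is diluted below the threshold, while the fragment that contains it clears it.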

Implicit Control Influence

The causal analyzer neutralizes tainted sources and checks whether the sink decision stays the same. If behavior changes (the sink is invoked or not, or its arguments differ), that change is counterfactual evidence of control influence, and propagation is attributed even when no explicit content trace exists.
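
The counterfactual check can be illustrated with a deterministic toy agent. Everything here is hypothetical: `causal_influence`, the `[REDACTED]` placeholder, and `toy_agent` are made up for the sketch, and a real LLM agent is stochastic, so a practical replay would likely need fixed seeds or repeated sampling before treating a difference as evidence.

```python
from typing import Callable

ToolCall = tuple[str, str]  # (tool name, arguments)


def causal_influence(agent: Callable[[dict[str, str]], ToolCall],
                     context: dict[str, str],
                     source_key: str,
                     placeholder: str = "[REDACTED]") -> dict:
    """Re-run the agent with one source neutralized and compare sink behavior."""
    baseline = agent(context)
    counterfactual = agent({**context, source_key: placeholder})
    return {
        "influenced": baseline != counterfactual,
        "baseline": baseline,
        "counterfactual": counterfactual,
    }


def toy_agent(context: dict[str, str]) -> ToolCall:
    """Tool choice depends on the untrusted ticket; arguments never copy it."""
    if "escalate" in context["ticket"].lower():
        return ("page_oncall", "severity=high")
    return ("log_note", "severity=low")


context = {"ticket": "Please ESCALATE now", "user": "triage the queue"}
result = causal_influence(toy_agent, context, "ticket")
print(result["influenced"])      # True
print(result["counterfactual"])  # ('log_note', 'severity=low')
```

No text from the ticket appears in either tool call, so a content-matching detector would find nothing; only the change in tool selection reveals the dependence.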

Evaluation: TaintBench and Benchmarks

NeuroTaint is evaluated on TaintBench, a benchmark designed specifically for propagation detection, with 400 scenarios spanning 20 real-world agent frameworks. On TaintBench, NeuroTaint achieves Precision = 0.921, Recall = 0.935, and F1 = 0.928 for propagation detection, compared to F1 = 0.522 for FIDES. Detection errors concentrate in cases of semantic attenuation and causal ambiguity, and false alarms arise from prior-knowledge and topical-overlap controls. Auditing cost is modest: offline evaluation adds 0.25 s per execution unit on average. NeuroTaint also remains effective on established unsafe-action agent benchmarks such as InjecAgent and ToolEmu.
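
As a quick consistency check, assuming F1 is the standard harmonic mean of precision and recall, the reported precision and recall reproduce the reported F1:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.921 \times 0.935}{0.921 + 0.935}
    = \frac{1.72227}{1.856}
    \approx 0.928
```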

Practical and Theoretical Implications

NeuroTaint demonstrates that provenance-oriented auditing—explicitly separating source-to-sink flows from unsafe-action generalizations—is necessary for agent security in settings where semantic rewriting, latent control, and asynchronous context reuse dominate. The DCPG abstraction enables robust lineage restoration across process boundaries, while hybrid semantic and counterfactual detectors supplement explicit evidence with control-flow attribution. These contributions redefine the boundary of information-flow analysis in LLM agents and establish a reproducible baseline for benchmarking propagation detection.

Practically, NeuroTaint provides a deployable auditing layer for agent frameworks, supporting policy-driven source/sink configurations and scalable provenance tracking. Theoretical implications include the formalization of propagation classes in agentic systems, emphasizing the necessity of persistent provenance graphs and counterfactual reasoning to capture attacks invisible to string or path-based taint protocols. As LLM agents increasingly embody autonomous workflows, provenance analysis must incorporate semantic, causal, and memory-oriented dimensions to ensure comprehensive security.

Future Directions

Provenance auditing will need to evolve toward real-time and multi-modal settings, expanding beyond post-hoc offline audits. Memory injection attacks, model-level instruction optimization, and advanced isolation schemes pose new challenges for provenance signals. Further integration of strong LLMs for second-stage review could enhance cascade filtering, as shown by improved unsafe-action precision and recall in cascade studies. Scaling provenance analysis to adversarial, compositional, and domain-specific agent architectures remains an open problem.

Conclusion

NeuroTaint introduces a provenance-oriented audit layer for LLM agents that operationalizes explicit content propagation, implicit control influence, and asynchronous provenance reuse via the DCPG and sink-time analyzers. Empirical evaluation on TaintBench indicates substantial gains in propagation detection over the FIDES baseline. NeuroTaint and TaintBench provide infrastructure for reproducible, systematic provenance analysis in agentic AI systems (2604.23374).
