EvoX: Tackling Context Pollution in Digital Systems

Updated 2 July 2026
  • EvoX is a framework that defines context pollution as the contamination of digital environments through unauthorized merging of information in AI agents, web tracking, and code execution.
  • It employs methodologies such as entropy metrics, context isolation, and execution state partitioning to quantify and mitigate adverse effects on system performance.
  • Empirical analyses within EvoX reveal improved agent success rates and enhanced privacy protection, while also highlighting challenges in cross-context identity management and misinformation control.

Context pollution refers to the contamination or unwarranted merging of distinct information streams or contexts, resulting in adverse effects on cognitive processing, security, privacy, or system performance. Originating as a metaphor from environmental science, "pollution" is applied to digital agents, information environments, and computational workflows to describe the silent, often inadvertent, blending or overloading of working memory, behavioral context, or user identity boundaries by irrelevant, untrusted, or adversarial information. The phenomenon is critical in domains ranging from personal AI agents and LLM-based workflows to web tracking, misinformation ecosystems, and multi-agent coordination.

1. Formal Definition and Taxonomy

Context pollution admits several formalizations depending on the substrate:

  • AI Agent Memory Pollution: Defined as the unauthorized or silent absorption of untrusted external content (e.g., emails, social feeds) into an agent’s working memory context (short- or long-term). These infiltrations subsequently influence user-facing behavior and task execution, often without explicit user oversight or adequate provenance (Zhang et al., 24 Mar 2026).
  • Context Collapse in Web Privacy: Framed in terms of "contextual integrity," context pollution emerges when cross-context identifier reuse (e.g., third-party cookies, JS fingerprints) collapses previously disjoint social/informational domains—such as health and finance—violating privacy norms (Sivan-Sevilla et al., 2024).
  • LLM and Agentic Noise Accumulation: Characterized as the growth of high-entropy or irrelevant tokens within the system memory, leading to degraded reasoning due to the "lost-in-the-middle" phenomenon (Li et al., 13 Apr 2026).
  • Code-as-Action Artifact Dilution: In code-execution agents, context pollution arises from shared accumulation of irrelevant execution traces and errors, which dilute planning-critical information in token-limited working memory (Fei et al., 21 Jan 2026).
  • Information Pollution in Misinformation: Generalized as the dissemination of misleading information in social systems; context pollution models the negative externalities such content imposes within digital information environments (Kazemi et al., 2023).
  • Multimodal Evidence Pollution: In misinformation detection, context pollution refers to the introduction of GenAI-generated ("polluted") text or images into evidence streams, impairing detector robustness and accuracy (Yan et al., 24 Jan 2025).

Editor’s term: context collapse refers specifically to privacy violations due to context-mixing, while context pollution encompasses both adversarial and unintentional noise accumulation in agent or system memory.

2. Mechanisms and Theoretical Frameworks

2.1 E→M→B Pathway (Claw Agents)

Formally, Zhang et al. (24 Mar 2026) model memory pollution as a three-stage process:

  • Exposure (E): Content encountered during background heartbeat execution is ingested.
  • Memory (M): The ingested content enters the session context; if persisted, it contaminates long-term storage.
  • Behavior (B): Polluted memory later shapes downstream decisions and observable agent behavior.

If $c_i$ is an adversarial claim at heartbeat $t$:

  • Short-term: $c_i \in \text{Context}_t \Rightarrow$ $c_i$ influences reasoning at $t + k$.
  • Long-term: if $c_i$ is flushed to MEMORY.md, it becomes available for cross-session retrieval and continues to exert influence.
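The three stages can be illustrated with a toy simulation. This is a minimal sketch, not the Claw implementation: the `Agent` class, its method names, and the example claim are hypothetical, and "behavior" is reduced to checking whether a claim is present in the session context.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy agent (hypothetical) with a session context and persistent memory."""
    long_term: list = field(default_factory=list)  # stands in for MEMORY.md
    context: list = field(default_factory=list)    # per-session working context

    def heartbeat(self, feed):
        # Exposure (E): background content is ingested without user review.
        self.context.extend(feed)

    def flush(self):
        # Memory (M): the session context is persisted to long-term storage.
        self.long_term.extend(self.context)

    def new_session(self):
        # A new session starts from whatever long-term memory holds.
        self.context = list(self.long_term)

    def answer(self, claim):
        # Behavior (B): the toy agent "acts on" a claim if it is in context.
        return claim in self.context


claim = "Vendor X is the only safe option"   # adversarial claim c_i
agent = Agent()
agent.heartbeat(["weather update", claim])   # E at heartbeat t
short_term = agent.answer(claim)             # B at t + k, same session
agent.flush()                                # M: persisted
agent.new_session()
long_term = agent.answer(claim)              # B in a later session

unflushed = Agent()
unflushed.heartbeat([claim])
unflushed.new_session()                      # context discarded, nothing persisted
print(short_term, long_term, unflushed.answer(claim))
```

In this sketch the claim only outlives the session when `flush()` runs, which is why the flush step is the natural place for a provenance check or user confirmation.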

2.2 Privacy as Contextual Integrity (Web Tracking)

Nissenbaum’s contextual integrity theory (Sivan-Sevilla et al., 2024) holds that information flows within distinct social/informational contexts, each with its own transmission norms. Context pollution (collapse) arises when persistent identifiers enable cross-context tracking:

  • Within-context collapse: Fraction of site-pairs sharing an identifier within a single context.
  • Between-context collapse: Fraction of cross-context site-pairs sharing an identifier, measured via network graphs and diffusion distance metrics.
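Both fractions reduce to counting site pairs. The sketch below assumes a hypothetical crawl result that maps each site to a context label and a set of observed identifiers; the site names, identifiers, and the `collapse_fractions` helper are illustrative only. It aggregates over all contexts at once and does not implement the network-graph or diffusion-distance analysis.

```python
from itertools import combinations

# Hypothetical crawl output: site -> (context label, identifiers observed).
sites = {
    "clinic.example": ("health", {"id_a", "id_b"}),
    "pharma.example": ("health", {"id_a"}),
    "bank.example": ("finance", {"id_b", "id_c"}),
    "broker.example": ("finance", {"id_d"}),
}


def collapse_fractions(sites):
    """Return (within, between): the share of same-context and cross-context
    site pairs that have at least one identifier in common."""
    shared = {"within": 0, "between": 0}
    total = {"within": 0, "between": 0}
    for a, b in combinations(sites, 2):
        (ctx_a, ids_a), (ctx_b, ids_b) = sites[a], sites[b]
        kind = "within" if ctx_a == ctx_b else "between"
        total[kind] += 1
        if ids_a & ids_b:  # non-empty intersection: an identifier is reused
            shared[kind] += 1
    return tuple(
        shared[k] / total[k] if total[k] else 0.0 for k in ("within", "between")
    )


within, between = collapse_fractions(sites)
print(within, between)
```

Here one of the two same-context pairs and one of the four cross-context pairs share an identifier, so the toy data gives 0.5 and 0.25.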

2.3 Entropy Metrics and Noise-Pruning (LLM Agents)

Li et al. (13 Apr 2026) quantify context pollution by the information entropy $H(h_t)$ of the agent's working memory $h_t$. As cumulative irrelevant noise $z_i$ grows relative to the critical reasoning anchors, $H(h_t)$ increases, yielding cognitive bottlenecks and failures.
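To make the entropy intuition concrete, the sketch below uses the Shannon entropy of the empirical token distribution, $H = -\sum_x p(x) \log_2 p(x)$, as a stand-in. This is an assumption: the exact entropy definition used by Li et al. is not given here and may differ. The token lists are invented for illustration.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical working memory: a few repeated anchors, then many unique
# noise tokens (e.g. log lines or irrelevant page fragments).
anchors = ["goal", "goal", "constraint", "goal"]
noise = [f"log_{i}" for i in range(12)]

clean = shannon_entropy(anchors)
polluted = shannon_entropy(anchors + noise)
print(round(clean, 3), round(polluted, 3))
```

Adding the noise tokens flattens the distribution, so the polluted memory scores a higher entropy than the anchors alone. A pruning step would aim to move the value back toward the clean baseline without deleting the anchors.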

2.4 Execution State Partitioning (Code-as-Action)

In agentic code orchestration, Fei et al. (21 Jan 2026) partition the token buffer at each turn into relevant and irrelevant tokens. The resulting pollution ratio correlates negatively with success rate, especially under complex, long-horizon decompositions.

3. Empirical Characterization and Measurement

3.1 Agent Memory Pollution

Experiments using the MissClaw testbed (Zhang et al., 24 Mar 2026) reveal:

  • Attack Success Rates (ASR) up to 61% for manipulated social posts; consensus cues outweigh formal authority.
  • Cross-session ASR reaching 76% post-memory flush.
  • Context pruning and feed dilution reduce but do not eliminate pollution propagation (cross-session ASR remains at 11–18% under dilution/pruning).

3.2 Tracking-based Context Collapse

Large-scale crawls across seven sensitive web contexts (Sivan-Sevilla et al., 2024) show:

  • Over 56% of IDs from adult sites survive into all six other contexts; for News & Media, 20.3% of identifiers persist across maximum inter-context distance.
  • Cluster-level chromatic numbers dictate the minimum required browser “containers” to block intra-context or multi-context pollution.
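The link between chromatic number and container count can be sketched with a greedy graph coloring. The graph construction is an assumption made for illustration: each node is a site, and an edge joins two sites that share a tracking identifier and therefore should not share a container. Greedy coloring yields only an upper bound on the chromatic number, and the site names and `greedy_containers` helper are hypothetical.

```python
def greedy_containers(conflicts):
    """Assign each site a container index so that no two conflicting sites
    share one. Greedy coloring gives an upper bound on the chromatic number."""
    assignment = {}
    # Visit high-degree sites first (a common greedy heuristic); ties by name.
    for site in sorted(conflicts, key=lambda s: (-len(conflicts[s]), s)):
        used = {assignment[n] for n in conflicts[site] if n in assignment}
        container = 0
        while container in used:
            container += 1
        assignment[site] = container
    return assignment


# Hypothetical conflict graph as an adjacency map.
conflicts = {
    "clinic.example": {"bank.example", "news.example"},
    "bank.example": {"clinic.example", "news.example"},
    "news.example": {"clinic.example", "bank.example", "shop.example"},
    "shop.example": {"news.example"},
}

assignment = greedy_containers(conflicts)
num_containers = max(assignment.values()) + 1
print(assignment, num_containers)
```

The three mutually conflicting sites force three containers in this toy graph, while the fourth site can reuse an existing one.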

3.3 Noise Accumulation in LLM Agents

Li et al. (13 Apr 2026) report that aggressive entropy pruning (via a compact, RL-trained ContextCurator) improves task success rate by 4.8 points on WebArena (with an 8.8% reduction in token use), and by 3–8 points on DeepSearch (with an ≈80% reduction in token consumption).

3.4 Code Trace Dilution in Multi-agent Planning

In Fei et al. (21 Jan 2026), longer contexts (inflated by debugging traces and error artifacts) correlate with lower pass@k success rates across planning and task execution benchmarks. CodeDelegator’s deliberate role separation mitigates this effect, outperforming monolithic designs by 10–12 percentage points on several benchmarks.

3.5 GenAI-powered Evidence Pollution

Yan et al. (24 Jan 2025) show that injecting LLM-generated evidence reduces out-of-context (OOC) detector accuracy by 8–14 points, with textual pollution exerting a stronger negative effect than visual pollution. Cross-modal reranking and claim-evidence reasoning recover nearly all lost accuracy.
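Similarity-based reranking can be sketched in a few lines. This is a generic illustration, not the pipeline of Yan et al.: the vectors are hand-written stand-ins for embeddings that an image-text encoder would produce, and the evidence names, the `rerank` helper, and the 0.5 threshold are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(claim_vec, evidence, threshold=0.5):
    """Sort evidence by similarity to the claim; drop items below threshold."""
    scored = [(cosine(claim_vec, vec), name) for name, vec in evidence.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical embeddings; real ones would come from an encoder model.
claim_vec = [1.0, 0.0, 1.0]
evidence = {
    "archive_photo": [0.9, 0.1, 1.0],
    "wire_report": [1.0, 0.0, 0.5],
    "generated_text": [0.0, 1.0, 0.1],
}

kept = rerank(claim_vec, evidence)
print(kept)
```

A fixed threshold only filters evidence that is dissimilar to the claim. Generated evidence that embeds close to the claim would pass, which is one reason a reasoning step over claim and evidence is paired with reranking.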

4. Cross-domain Mitigation Strategies

Several domain-specific defenses against context pollution emerge from this literature:

  • Session Isolation and Sandbox Design: In personal agents (Claw, CodeDelegator), strict separation of background ingestion, execution traces, and conversational state prevents unauthorized or irrelevant merge of content (Zhang et al., 24 Mar 2026, Fei et al., 21 Jan 2026).
  • Provenance and Tracking: All memory (or browser storage) entries are tagged with channel, timestamp, and event type (heartbeat vs. user), with provenance guardrails enforced prior to memory flush (Zhang et al., 24 Mar 2026, Sivan-Sevilla et al., 2024).
  • User Review and Confirmation: User-facing summaries and manual confirmation steps for every context-flush or memory write reduce silent memory pollution (Zhang et al., 24 Mar 2026).
  • Aggressive Entropy Reduction: RL-trained context curator models prune noisy context while preserving reasoning anchors, maximizing executor performance under token constraints (Li et al., 13 Apr 2026).
  • Browser Containerization: Graph coloring of tracking networks to isolate web contexts, using containers robust against persistent identifier reuse (Sivan-Sevilla et al., 2024).
  • Role Specialization: Ephemeral-persistent state separation and dynamic subtask delegation insulate persistent planning state from execution-specific artifacts (Fei et al., 21 Jan 2026).
  • Evidence Filtering and Consistency Checking: Multimodal reranking (e.g., via CLIP cosine similarity) and cross-modal reasoning modules reject or deprioritize polluted evidence in misinformation detection (Yan et al., 24 Jan 2025).
  • Policy-driven Taxation: For misinformation, Pigouvian tax mechanisms are proposed to internalize the social cost of viral context pollution, realigning incentive structures in information ecosystems (Kazemi et al., 2023).
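The provenance and user-confirmation items above can be combined into a simple guardrail that runs before a memory flush. The sketch below is hypothetical: the `MemoryEntry` fields mirror the tags named in the list (channel, timestamp, event type), while the `confirmed` flag and the `flushable` policy are assumptions about how such a guardrail might be enforced.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    channel: str             # e.g. "email", "social_feed", "chat"
    event_type: str          # "heartbeat" (background) or "user" (interactive)
    timestamp: datetime
    confirmed: bool = False  # set once the user approves the entry


def flushable(entries):
    """Guardrail: persist only user-originated or user-confirmed entries."""
    return [e for e in entries if e.event_type == "user" or e.confirmed]


now = datetime.now(timezone.utc)
pending = [
    MemoryEntry("Prefers morning meetings", "chat", "user", now),
    MemoryEntry("Vendor X is the only safe option", "social_feed", "heartbeat", now),
    MemoryEntry("Invoice due Friday", "email", "heartbeat", now, confirmed=True),
]

persisted = flushable(pending)
print([e.text for e in persisted])
```

Under this policy the unconfirmed heartbeat entry from the social feed stays out of long-term memory, while the confirmed email entry and the user-originated entry are written.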

5. Implications, Limitations, and Open Challenges

Across agent architectures, web privacy, and misinformation, context pollution is not merely an efficiency problem but a source of structural vulnerabilities:

  • Silent, Zero-click Attacks: Context pollution enables attackers to shape agent memory and behavior without direct prompt injection, simply by manipulating background content or identifier reuse (Zhang et al., 24 Mar 2026, Sivan-Sevilla et al., 2024).
  • Pruning Is Not a Security Boundary: Current context reduction and token management techniques are efficiency tools, not true security barriers; adversarial noise that survives pruning remains influential (Zhang et al., 24 Mar 2026).
  • Limitations of Passive Defenses: Reliance on CLIP embedding spaces, or simplistic thresholding in evidence reranking, may be evaded by future GenAI advancements or adversary-controlled retrieval channels (Yan et al., 24 Jan 2025).
  • Scalability and Asynchrony: Existing multi-agent orchestration frameworks often assume sequential plans; extending to DAG-structured or asynchronous task flows without reintroducing pollution is an unresolved challenge (Fei et al., 21 Jan 2026).
  • Cross-context Identity Risks: Even fine-grained partitioning of browser storage is limited by the chromatic number of ever-evolving third-party networks, and may not eliminate identifier leakage under adversarial coordination (Sivan-Sevilla et al., 2024).
  • Measurement and Attribution: Practical implementation of social-cost taxes on context pollution (e.g., misinformation) must overcome challenges in definition, measurement, engagement attribution, and avoidance of regulatory arbitrage (Kazemi et al., 2023).

6. Synthesis and Research Outlook

The literature demonstrates that context pollution is an architectural and ecological threat at the intersection of cognitive systems, computational privacy, and information science. Addressing it requires:

  • Architectural rethinking to enforce strict context boundaries.
  • Provenance-rich memory and storage models to guarantee context integrity.
  • Active curation via lightweight, task-specific modules to reduce entropy and prevent reasoning failures.
  • Adaptive and explainable filtering, leveraging both user-in-the-loop confirmations and autonomy-preserving skepticism (epistemic stance adjustment).
  • Integrated economic and policy approaches to align actor incentives with collective welfare.

Future research directions include formal certificates of robustness against context pollution, adaptive containerization at the web scale, and adversarial training of both agents and detectors in increasingly noisy, multi-source evidence environments.
