CleanAgent: Safe & Efficient AI Agents

Updated 20 March 2026

CleanAgent is a framework that ensures safe, efficient AI operations by combining modular, schema-driven orchestration with declarative data standardization.
The system integrates a multi-agent LLM stack and a one-shot, hands-free workflow for automated data cleaning, reducing manual coding and errors.
Formal methods such as capability tracking and role separation are employed to enforce system safety, prevent context pollution, and enhance performance.

CleanAgent denotes a set of methodologies and system architectures aimed at facilitating safer, more robust, and efficient AI agent operation, especially in code-as-action and data standardization settings. The concept encompasses concrete implementations ranging from declarative data standardization with LLM-driven automation in Python to statically enforced safety harnesses in capability-safe languages like Scala 3, as well as meta-architectures such as multi-agent role separation to counteract context pollution. CleanAgent emphasizes modular, schema-driven orchestration, explicit separation of strategic and implementation states, and the minimization of failure-prone or dangerous code patterns.

1. Architectural Origins and Motivation

CleanAgent frameworks emerge in response to two classes of challenges: (a) the practical complexity and fragility of automating data transformations with LLMs and (b) the need for principled safety and reliability guarantees in agents that generate or execute code. For data tasks, legacy approaches demanded manual, error-prone coding or brittle prompt engineering. Simultaneously, in interactive agent environments, risks related to context pollution, information leakage, and untracked side effects became prohibitive, motivating designs grounded in strict type systems and modular agent orchestration (Qi et al., 2024, Fei et al., 21 Jan 2026, Odersky et al., 1 Mar 2026).

2. Declarative Agent-Driven Data Standardization

A primary instantiation is the CleanAgent data standardization framework, which integrates Dataprep.Clean (a Python library offering one-line standardization APIs for diverse column types) with a multi-agent LLM stack. The typical architecture comprises:

Dataprep.Clean library: Exposes $\mathtt{clean\_<type>}(df, \mathrm{col}, \mathrm{target\_format})$ APIs, currently shipping with 142 type-specialized cleaners (date, address, phone, IP, etc.), each effecting a Split–Validate–Transform pipeline per invocation.
LLM-based multi-agent system: Implements strict role decomposition: a Chat Manager (global memory & orchestration), Column-type Annotator (schema inference), Python Programmer (one-shot code generation using Dataprep.Clean), and Code Executor (sandboxed execution).
Web application interface: A front end for data upload and progress visualization and a back end built on Flask or FastAPI, calling into LLM-driven CleanAgent pipelines.

This architecture supports a one-shot, hands-free workflow: upon specifying requirements (e.g., “standardize ‘admission’ to MM/DD/YYYY hh:mm:ss, ‘addr’ to {street},LA,{zipcode}”), the system auto-annotates, generates code, and applies cleaning functions until successful output is produced (Qi et al., 2024).
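The Split–Validate–Transform idea behind a type-specific cleaner can be sketched with the standard library alone. The names clean_date_sketch and KNOWN_FORMATS, and the strftime-style target format, are assumptions made for illustration. This is not the Dataprep.Clean API, whose cleaners operate on a DataFrame and a column name.

```python
from datetime import datetime

# Hypothetical list of input layouts the sketch accepts (an assumption,
# not the set of formats Dataprep.Clean recognizes).
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d.%m.%Y %H:%M", "%m/%d/%Y")


def clean_date_sketch(values, target_format="%m/%d/%Y %H:%M:%S"):
    """Standardize date strings; values that cannot be parsed become None."""
    cleaned = []
    for raw in values:
        token = raw.strip()                  # Split: isolate the candidate token
        parsed = None
        for fmt in KNOWN_FORMATS:            # Validate: try each known layout
            try:
                parsed = datetime.strptime(token, fmt)
                break
            except ValueError:
                continue
        # Transform: render valid values in the requested target format
        cleaned.append(parsed.strftime(target_format) if parsed else None)
    return cleaned


admissions = ["2024-03-05 14:30:00", " 05.03.2024 09:15 ", "not a date"]
result = clean_date_sketch(admissions)
print(result)
```

Because the caller only names a column type and a target format, the code an LLM has to generate shrinks to a single call per column, which is the property the one-shot workflow relies on.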

3. Formal Methods and Safety for Code-as-Action Agents

For agent safety in environments that involve critical resource manipulation or classified data, CleanAgent denotes a “safety harness” construction realized in a capability-safe language, concretely Scala 3 with capture checking. The system statically regulates effectful actions and information flow by encoding resource access as first-class “capabilities”: tracked program variables subject to type-and-effect discipline.

Key formal apparatus includes:

Types and Capture Sets: Tracked function types $A \to^{C} B$ that capture only capabilities in $C$, with $A \to B$ as the pure ($C = \emptyset$) case.
Subtyping via Set Inclusion: Capability sets are partially ordered; $A \to^{C_1} B <: A \to^{C_2} B$ if $C_1 \subseteq C_2$.
Typing Judgments: Formally, $\Gamma \vdash e : \tau^{C}$ denotes $e$’s type $\tau$ and its set $C$ of effectful resource dependencies.
Local Purity Enforcement: All subcomputations that process classified data must type-check as capability-free ($C = \emptyset$), e.g., as required by $\mathtt{map}$ on $\mathtt{Classified}[T]$.
Safety Theorems: Preservation and progress hold; critically, “no forging” ensures capabilities cannot be unsafely constructed, and noninterference ensures pure functions never leak classified data (Odersky et al., 1 Mar 2026).

The source contrasts accepted programs with statically rejected ones to show how disallowed side effects, such as unintended output or information flow, are prevented, with every capability flow statically traced.
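The capture-set rules can be illustrated with a small dynamic model. This is only a sketch under stated assumptions: Python cannot enforce these rules statically, so the capture sets below are declared by hand and checked at run time, whereas the Scala 3 harness described above tracks them in the type system. The names Tracked, is_subtype, and this Classified class are hypothetical and are not taken from the cited work.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Tracked:
    """A function paired with its declared capture set C (models A ->^C B)."""
    fn: Callable
    captures: FrozenSet[str] = frozenset()

    def __call__(self, *args):
        return self.fn(*args)


def is_subtype(f: Tracked, allowed: FrozenSet[str]) -> bool:
    """A ->^{C1} B <: A ->^{C2} B holds exactly when C1 is a subset of C2."""
    return f.captures <= allowed


class Classified:
    """Wrapper whose payload may only be transformed by capability-free functions."""

    def __init__(self, value):
        self._value = value  # by convention, read only inside map

    def map(self, f: Tracked) -> "Classified":
        if f.captures:  # local purity: require C to be empty
            raise PermissionError(f"impure function captures {set(f.captures)}")
        return Classified(f(self._value))


pure_upper = Tracked(str.upper)                                   # C = {}
leaky_print = Tracked(lambda s: print(s), frozenset({"console"}))  # C = {console}

secret = Classified("token-123")
redacted = secret.map(pure_upper)  # accepted: the function is pure

try:
    secret.map(leaky_print)        # rejected: it would write to the console
    rejected = False
except PermissionError:
    rejected = True
```

In this model the rejection happens when map is called; in a capability-safe language the analogous program would fail to type-check before it ever runs.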

4. Multi-Agent Role Separation and State Management

CleanAgent is further generalized by the CodeDelegator framework, which addresses “context pollution” in long-horizon code-as-action agents by dividing responsibilities between two agent types:

Delegator (strategic planner): Decomposes tasks, writes formal specifications, and monitors progress; never executes code.
Coder (ephemeral implementer): Receives a clean, minimal subtask specification, generates and executes code in an isolated environment, and returns only high-level results.

This duality is captured by Ephemeral-Persistent State Separation (EPSS), where the persistent orchestration state tracks only validated artifacts and specifications, and all execution traces or runtime errors remain confined to ephemeral Coder state. As a result, planning context remains uncontaminated by low-level implementation failures. The overall effect is quantifiably improved long-horizon success, as demonstrated empirically (Fei et al., 21 Jan 2026).
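A minimal sketch of the state separation described above, assuming a simplified setting in which each Coder attempt is a plain Python callable rather than LLM-generated code in a sandbox. The names PersistentState, SubtaskResult, run_coder, and delegate are hypothetical and do not come from CodeDelegator.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SubtaskResult:
    spec: str
    ok: bool
    value: Any
    retries: int  # only a count crosses the boundary, never the error text


@dataclass
class PersistentState:
    """Delegator-side state: specifications and validated artifacts only."""
    specs: List[str] = field(default_factory=list)
    artifacts: Dict[str, Any] = field(default_factory=dict)


def run_coder(spec: str, attempts: List[Callable[[], Any]]) -> SubtaskResult:
    """Ephemeral Coder: tries each attempt and keeps its error trace local."""
    trace: List[str] = []  # ephemeral state, discarded when this function returns
    for attempt in attempts:
        try:
            value = attempt()
        except Exception as exc:
            trace.append(f"{type(exc).__name__}: {exc}")
            continue
        return SubtaskResult(spec, True, value, retries=len(trace))
    return SubtaskResult(spec, False, None, retries=len(trace))


def delegate(state: PersistentState, spec: str,
             attempts: List[Callable[[], Any]]) -> SubtaskResult:
    """Delegator: records the spec, hands it off, stores only validated output."""
    state.specs.append(spec)
    result = run_coder(spec, attempts)
    if result.ok:
        state.artifacts[spec] = result.value
    return result


def failing_attempt():
    return 1 / 0  # simulates a buggy first implementation


def working_attempt():
    return sum([1, 2, 3])


state = PersistentState()
outcome = delegate(state, "sum the list", [failing_attempt, working_attempt])
print(state)
```

After the call, the persistent state contains the specification and the value 6, while the ZeroDivisionError text exists only inside the Coder's local trace and is never visible to the planner.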

5. Evaluation and Empirical Performance

Data Standardization Setting

Internal measurements and user studies of CleanAgent in data cleaning demonstrate:

Method	Avg. LOC/col	Implementation Time (min)
Manual pandas+regex	80	25
ChatGPT‐prompted code	40	15
CleanAgent	2	5

User studies (n=10) report >75% time savings (8/10), high adequacy for one-shot requirement entry (9/10), and <5% code error rate (mostly trivial) (Qi et al., 2024).
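The relative savings implied by the table can be worked out directly. The sketch below only restates the table's figures as percentage reductions; the helper name reduction is an assumption for illustration.

```python
# (lines of code per column, implementation time in minutes), from the table above
rows = {
    "Manual pandas+regex": (80, 25),
    "ChatGPT-prompted code": (40, 15),
    "CleanAgent": (2, 5),
}


def reduction(baseline, new):
    """Percentage reduction of `new` relative to `baseline`, to one decimal place."""
    return round(100 * (baseline - new) / baseline, 1)


clean_loc, clean_min = rows["CleanAgent"]
savings = {
    name: (reduction(loc, clean_loc), reduction(minutes, clean_min))
    for name, (loc, minutes) in rows.items()
    if name != "CleanAgent"
}

for name, (loc_pct, time_pct) in savings.items():
    print(f"vs. {name}: {loc_pct}% fewer LOC, {time_pct}% less time")
```

Against the manual baseline this gives a 97.5% reduction in code and an 80.0% reduction in time; against ChatGPT-prompted code the figures are 95.0% and 66.7%.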

Agent Safety and Context Separation

In adversarial code generation settings:

CleanAgent capability-safe agents statically prevented 100% of information leak attempts, whereas unclassified string-based approaches leaked in a model-dependent way, reaching only 98.5% (Sonnet) and 91.6% (MiniMax) protection.
Utility rates for task performance remained comparable or improved, e.g., τ²-bench airline (CleanAgent: 45.2% vs. 43.8%), retail (57.0% vs. 53.3%) (Odersky et al., 1 Mar 2026).

In role-separated, multi-agent task decomposition (CodeDelegator on τ²-bench and MCPMark):

Method / Domain	pass¹ (%)	pass² (%)	pass³ (%)	pass⁴ (%)
ReAct (Retail)	79.6	69.9	63.4	58.8
CodeAct (Retail, single)	70.2	59.0	50.0	47.0
CodeDelegator (Retail)	82.0	71.2	63.4	57.0

Similar improvements are observed on diverse agent-environment benchmarks (Fei et al., 21 Jan 2026).
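For readers unfamiliar with the column headers, the sketch below assumes they follow the pass^k convention used by the τ-bench family: the probability that all k independent trials of a task succeed, estimated per task as C(c, k) / C(n, k) from c successes in n trials and then averaged over tasks. The function names and the toy success counts are assumptions for illustration, not data from the cited work.

```python
from math import comb


def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Estimate the chance that k trials drawn without replacement all succeed."""
    if not 1 <= k <= trials:
        raise ValueError("k must be between 1 and the number of trials")
    return comb(successes, k) / comb(trials, k)


def benchmark_pass_hat_k(per_task_successes, trials: int, k: int) -> float:
    """Average the per-task estimate over all tasks in the benchmark."""
    scores = [pass_hat_k(c, trials, k) for c in per_task_successes]
    return sum(scores) / len(scores)


# Toy example: three tasks, four trials each, with 4, 3, and 0 successes.
toy_successes = [4, 3, 0]
for k in range(1, 5):
    print(k, round(benchmark_pass_hat_k(toy_successes, 4, k), 3))
```

Under this definition the score can only stay level or fall as k grows, since a task must succeed on every one of the k trials to count, which is consistent with the decreasing values across each row of the table.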

6. Implementation Guidance and Limitations

Declarative, type-specific APIs drastically minimize the LLM learning surface and reduce failure rates by ≈50% (Qi et al., 2024).
Role-based decomposition, strict separation of concerns, and schema-driven state handoff are critical for reliability and scalability.
Out-of-the-box CleanAgent approaches do not address heavily nested/multi-modal fields or extremely wide data tables; batching and custom handler implementation are required in such cases.
Sequential subtasking is enforced in current multi-agent designs; extensions for parallel/DAG task plans are plausible future directions (Fei et al., 21 Jan 2026).

7. Broader Impact and Generalization

CleanAgent principles—explicit schema-driven orchestration, capability tracking for all effectful operations, and the elimination of context pollution via state separation—extend naturally to domains beyond tabular data and code: complex data science ETL pipelines, robotic control systems requiring safe resource access, and scenarios where strategic reasoning must remain orthogonal to detailed execution logs. Any system adhering to strict role separation, structured state management, and tracked “capabilities” can leverage CleanAgent methodologies for improved safety, performance, and scalability (Fei et al., 21 Jan 2026, Odersky et al., 1 Mar 2026).


References (3)

CleanAgent: Automating Data Standardization with LLM-based Agents (2024)

CodeDelegator: Mitigating Context Pollution via Role Separation in Code-as-Action Agents (2026)

Tracking Capabilities for Safer Agents (2026)
