CleanAgent Framework
- CleanAgent Framework is a modular, agentic architecture that automates end-to-end cleaning tasks for both data standardization and static-analysis code repair.
- It decomposes complex cleaning workflows into preparation, classification, repair, and validation layers using iterative LLM-driven orchestration.
- Performance evaluations demonstrate high repair accuracy—with up to 96.8% plausible fixes—surpassing conventional methods in scalability and reliability.
CleanAgent Frameworks are modular, agentic architectures leveraging LLM agents to automate end-to-end cleaning tasks such as data standardization or static-analysis warning repair. Unifying declarative APIs, iterative LLM-driven orchestration, and robust validation or execution layers, CleanAgent frameworks abstract complex, error-prone developer tasks into hands-free, user-guided loops. These architectures are exemplified by two primary strands: (1) data standardization frameworks integrating tightly-coupled clean APIs and LLM-based orchestration (Qi et al., 2024), and (2) codebase cleaning frameworks employing agentic pipelines for static code analysis triage and repair (Joos et al., 15 Sep 2025). Both canonical implementations demonstrate modular separation between preparation, classification (or annotation), repair (or transformation), and automated validation, maximizing automation while remaining adaptable across domains.
1. Architectural Overview
CleanAgent frameworks exhibit a multilayered design, typically organized as follows:
- API/Tool Layer: In data standardization, the Dataprep.Clean library provides 142 distinct clean_type functions, each standardizing a single column type via a uniform Python signature (e.g.,
clean_date(df, "AdmissionDate", "MM/DD/YYYY")) (Qi et al., 2024). For static-analysis warnings, agents utilize APIs for reading documentation, exploring symbol references, and writing multi-hunk patches (Joos et al., 15 Sep 2025). - Agentic Orchestration Layer: LLM-powered agents decompose high-level cleaning intents (e.g., user requirements or warning resolution goals) into subproblems. This layer orchestrates specialized sub-agents for type annotation/classification, code/script generation, edit application, or patch approval.
- Validation/Execution Layer: CleanAgent frameworks incorporate automated execution and feedback loops. For data, Python-generated cleaning scripts run in sandboxed environments with fault-tolerance. For code, proposed patches undergo a sequential three-step approval: build, static analysis verification, and regression testing (Joos et al., 15 Sep 2025).
The multi-agent loop is coordinated either by a Chat Manager (data cleaning) (Qi et al., 2024) or by distinct sub-agents for classification and repair (code cleaning) (Joos et al., 15 Sep 2025).
2. Workflow and Iterative Looping
A CleanAgent process typically operates in the following sequence:
- Input Specification: Users upload data (e.g., CSV table) or designate a repository at a commit; optional natural language requirements refine process constraints (Qi et al., 2024, Joos et al., 15 Sep 2025).
- Type Annotation or Warning Collection: LLMs annotate data column types from samples or collect static-analysis warnings with their metadata.
- Plan/Sense-Plan-Act Cycle:
- For each target (data column or warning), the orchestrator dynamically prompts the LLM, providing role objectives, available APIs, metadata, and an interaction trace.
- The LLM agent issues tool calls iteratively, executing on local data/code, updating the prompt with tool outputs, and continuing until goals are achieved (e.g., "GiveFinalVerdict" or "GoalsAccomplished") (Joos et al., 15 Sep 2025).
- Classification and repair cycle limits cap the number of prompt-tool cycles (e.g., 20 for classification, 40 for repair in the CodeCureAgent model (Joos et al., 15 Sep 2025)).
- Script/Patch Generation:
- Data: A Python Programmer agent generates a script using the clean_* APIs (Qi et al., 2024).
- Code: A Repair sub-agent constructs code patches or suppression edits for FP warnings (Joos et al., 15 Sep 2025).
- Validation:
- Data: Scripts execute in a container; failure triggers iterative error-fixing via LLM prompt enrichment.
- Code: Candidate fixes must pass three steps—build, static analysis (no new warnings), and test suite (Joos et al., 15 Sep 2025).
This iterative error-recovery or patch-approval loop is core to CleanAgent robustness.
3. Declarative API Design and Usage
The declarative API principle underlies CleanAgent usability:
- Dataprep.Clean’s API: All column-standardization functions expose a uniform interface:
for column types such as date, address, phone, email, IP, etc. This reduces code complexity from manual Pandas/regex logic to a single line per column (Qi et al., 2024).1
clean_<type>(df, column_name, target_format)
- CodeCureAgent Tool APIs: The agent’s prompt enumerates available tools—such as ReadDocumentation, ReadLines, FindReferences, WriteFix, FormulatePlan—each invoked via a strict JSON-like schema inside the LLM loop (Joos et al., 15 Sep 2025).
These APIs enable LLM agents to compose and execute complex workflows with minimal human intervention.
4. Validation and Fault Tolerance
Automated validation and correction are critical components:
- Data Standardization Validation: Scripts are executed in sandboxed Python environments. Errors are trapped and fed into prompt memory, allowing the LLM agent to attempt corrections without user intervention, achieving "iterative robustness" (Qi et al., 2024).
- Patch Approval (Static Analysis Repair): Three-step validation enforces patch correctness. Only patches that build, suppress the initial warning (with no new warnings), and pass all tests are deemed plausible (Joos et al., 15 Sep 2025).
| Framework | Validation Mechanism | Automation Loop |
|---|---|---|
| Dataprep.Clean | Script execution with error trapping and recursion | End-to-end hands-free |
| CodeCureAgent | Build → Static analysis → Test suite (three-step approval) | Iterative cycles |
This design both maximizes automation and supports fault-tolerance, reducing required user expertise.
5. Performance and Evaluation
CleanAgent frameworks prioritize complexity reduction and hands-free, reliable operation:
- Data Standardization: Users perform standardization with a single API invocation per column type rather than hundreds of lines of manual cleaning logic. "Hands-free operation" is achieved: the user uploads input and requirements, and the agent orchestrates all downstream tasks (Qi et al., 2024).
- Code Cleaning (CodeCureAgent): On a dataset of 1,000 SonarQube warnings (291 distinct rules, 106 projects), the framework generated fixes for 100%, with 96.8% plausible fixes. Manual inspection measured 91.8% correct classification and 86.3% correct fixes on a stratified subset (Joos et al., 15 Sep 2025). End-to-end average runtime was 4.4 minutes per warning, with an effective LLM cost averaging 2.9 cents per warning.
Comparative evaluations:
- CodeCureAgent surpassed Sorald in plausible-fix rate (+30.7 percentage points) and CORE (+29.2 pp) (Joos et al., 15 Sep 2025).
- Inferred: The agentic iterative validation loop enables superior autonomy and reliability in large, real-world codebases or datasets.
6. Web-Based and User Interaction Layer
A web-application interface facilitates integration and user accessibility:
- Frontend: Users upload datasets, view metadata, and trigger agentic cleaning.
- Backend: A microservice architecture mediates LLM agent interactions, script/code execution, and result streaming.
- Observability: Intermediate "thoughts" of the agents are streamed to the UI, allowing transparent monitoring (as implemented in the CleanAgent demo) (Qi et al., 2024).
The integration of web applications demonstrates system practicality beyond standalone scripts or libraries.
7. Context and Applications
CleanAgent frameworks are representative architectures for fully automated, LLM-driven cleaning tasks across domains:
- Software Engineering: Agentic repair and suppression pipelines for static-analysis warnings, with applicability to CI/CD workflows (Joos et al., 15 Sep 2025).
- Data Science: Automated, declarative data standardization pipelines, enabling scalable, code-light data cleaning (Qi et al., 2024).
The modular separation—classification/annotation, plan/sense-plan-act, repair/execution, and validation—provides a blueprint for extending the CleanAgent paradigm to new languages, data modalities, or forms of automated reasoning.
A plausible implication is that as LLM capabilities and tool integration mature, CleanAgent-style architectures could become baseline infrastructure for a wide class of program repair, data wrangling, and automated quality assurance pipelines.