Agentic Task Synthesis Pipeline

Updated 3 December 2025
  • The agentic task synthesis pipeline is a framework that converts high-level task descriptions into executable workflows using iterative reasoning and coding agents.
  • It integrates a ReAct control loop, persistent IPython kernel, and a modular tool suite to support dynamic error correction and stateful code execution.
  • Empirical evaluations show 100% success on CP-Bench, demonstrating superior fault tolerance and efficiency compared to static, predetermined pipelines.

The agentic task synthesis pipeline is a paradigm for automating the end-to-end translation of high-level task descriptions into executable, tool-guided workflows via self-directed reasoning, code execution, and verification. In contrast to static, predetermined pipelines, agentic task synthesis leverages a general coding agent—often steered by a ReAct (Reason and Act) control loop and powerful prompt engineering—to achieve incremental, stateful development, dynamic error correction, and modular verification. The architecture exemplified by CP-Agent demonstrates that success in complex domains (e.g., constraint programming) depends not on hand-crafted agent logic or rigid workflows, but on the confluence of general-purpose code execution interfaces, contextual memory, and prompt-encoded domain expertise (Szeider, 10 Aug 2025).

1. Architectural Foundations

Agentic task synthesis is implemented in CP-Agent using a minimal yet robust architecture. Core components include:

  • ReAct Loop Controller: Orchestrates interleaved “Think” and “Do” stages, cyclically engaging the agent in reasoning and tool invocation until convergence.
  • Tool Suite: Exposes a concise set of primitives—read_file, write_file, list_files, delete_file, python_exec, todo_write—to support file I/O, code execution, and task management. These operations are strictly confined to the working directory and return structured signals to the loop controller.
  • Persistent IPython Kernel: All code execution is stateful. The agent interacts with a long-lived IPython process via ZeroMQ, allowing variable retention and cumulative definitions across multiple tool calls.
  • Prompt Hierarchy: Task synthesis is driven by three layers:
    • System Prompt (~200 lines): General agent conventions, tool usage, error handling.
    • Project Prompt (~700 lines): Domain-specific (CP) modeling templates, mandatory workflows, archetype catalog, verification checklists.
    • Task Prompt: The raw natural-language problem description.

The singleton kernel may be wrapped in an isolated environment for package-specific needs (e.g., CPMpy), ensuring state persistence until requirements change or the session ends.
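
The source specifies only that the agent talks to a long-lived IPython process over ZeroMQ; a minimal sketch of such a stateful python_exec tool, assuming the standard jupyter_client package as the transport (an assumption, not stated in the paper), might look as follows:

from queue import Empty
from jupyter_client import KernelManager

# Start one long-lived kernel; state survives across python_exec calls.
km = KernelManager(kernel_name="python3")
km.start_kernel()
client = km.client()
client.start_channels()
client.wait_for_ready(timeout=60)

def python_exec(code: str) -> str:
    """Run code in the persistent kernel and return captured output (sketch)."""
    msg_id = client.execute(code)
    outputs = []
    while True:
        try:
            msg = client.get_iopub_msg(timeout=30)
        except Empty:
            break
        if msg["parent_header"].get("msg_id") != msg_id:
            continue  # message belongs to another request
        mtype = msg["msg_type"]
        if mtype == "stream":
            outputs.append(msg["content"]["text"])
        elif mtype == "execute_result":
            outputs.append(msg["content"]["data"].get("text/plain", ""))
        elif mtype == "error":
            outputs.append("\n".join(msg["content"]["traceback"]))
        elif mtype == "status" and msg["content"]["execution_state"] == "idle":
            break  # request finished
    return "".join(outputs)

# Variables persist across calls, e.g.:
# python_exec("n = 4")
# python_exec("print(n * n)")  -> "16"

Because the kernel outlives individual tool calls, variables and imports defined in one python_exec invocation remain available to later ones, which is what enables incremental model construction.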

2. Agentic Reasoning and ReAct Loop

The ReAct loop instantiates the agentic workflow:

  • Initialization: The agent loads the problem statement (read_file) into memory, primes the LLM context with system and project prompts, and begins iterative synthesis.
  • Iteration: The loop cycles through “Think–Do–Observe”:

    1. Think (Reason): The LLM examines current context, tool call memory, partial code, and task progress. Reasoning traces yield decomposition plans, decision variable selection, and hypotheses about modeling strategies (e.g., use of cp.AllDifferent).
    2. Do (Act): The controller parses the LLM output into a tool call (python_exec for code, todo_write for task lists, write_file for final output).
    3. Observe (Feedback): Results from tool calls (e.g., code output, exception stack traces) are injected back into the context, guiding subsequent reasoning steps (a sketch of such an observation wrapper follows this list).
    4. Debugging Feedback: Exceptions (e.g., import errors, model failures) trigger explicit repair actions, such as command corrections or import amendments in new code executions.
  • Termination: The agent emits a COMPLETE signal and writes the final, verified CPMpy script, halting the loop.
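
The observation-formatting step is not specified in detail; a hypothetical helper (mirroring the format_observation call in the Section 5 pseudocode) that wraps tool results and exception tracebacks for re-injection into the context could look like this:

import traceback

def format_observation(tool_name: str, result) -> str:
    """Wrap a tool result (or exception) as a text observation for the LLM context (sketch)."""
    if isinstance(result, Exception):
        # Include the full traceback so the agent can plan a repair action.
        body = "".join(traceback.format_exception(type(result), result, result.__traceback__))
        status = "error"
    else:
        body = str(result)
        status = "ok"
    return f"\n[observation tool={tool_name} status={status}]\n{body}\n[/observation]\n"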

3. Project Prompt Specification and Verification

The project prompt encodes domain expertise and prescribes invariant rules for constraint modeling:

  • Mandatory Workflow:
    1. Deconstruct: Parse all input data, extract parameters, infer constants.
    2. Model: Define decision variables using global constraints; incrementally construct the constraint logic from domain to structure to objectives.
    3. Solve & Verify: Execute the CPMpy model; independently validate outputs in pure Python, ensure JSON format conformity; verify all constraints and re-calculate objectives for optimization.
    4. Finalize: Output cleanup and JSON formatting per requirements.
  • Verification: Completion requires passing a 12-item compliance checklist, including domain-specific modeling patterns and avoidance of illicit Python constructs (e.g., if expressions outside CP context).
  • Playbook Catalog: Maps keywords to canonical archetype patterns (e.g., TSP template for “visit every location”, assignment model for “assign workers”).
  • Debug & Performance Tips: Recommends integer scaling for solver compatibility, conditional constructs via cp.Implies, and symmetry-breaking patterns (a small sketch follows this list).
  • Appendices: Common modeling pitfalls and API reference.
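
As an illustration of the debug and performance tips, the following sketch shows integer scaling of fractional data and a conditional constraint, written here with CPMpy's .implies() method; the data and variable names are invented for the example:

import cpmpy as cp

prices = [2.5, 1.75, 3.0]                       # fractional inputs
scaled = [int(round(p * 100)) for p in prices]  # scale to integers for the solver

buy = cp.boolvar(shape=3, name="buy")
spent = cp.intvar(0, sum(scaled), name="spent")

model = cp.Model()
model += spent == cp.sum([buy[i] * scaled[i] for i in range(3)])
# Conditional constraint: if item 0 is bought, item 1 must be bought too.
model += buy[0].implies(buy[1])

if model.solve():
    print(buy.value(), spent.value() / 100)     # rescale for reporting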

4. Interfaces and Execution Environment

The agent's environment is defined by strictly constrained IPC and execution primitives:

  • File Operations:
    • read_file(path: str), write_file(path: str, content: str), list_files(pattern: str="*"), delete_file(path: str)—all paths are resolved relative to the working directory.
  • Code Execution:
    • python_exec(code: str) transmits code to a stateful IPython kernel, persisting context across calls.
  • Task Management:
    • todo_write(todos: List[Dict[id,content,status,priority]]) enforces a single in_progress task at a time, with contextual recall (see the sketch after this list).
  • Orchestrator: LangGraph manages the loop, memory, and tool dispatch; the LLM (Claude 4 Sonnet) is accessed via OpenRouter, with streaming and logging.
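
Neither the path-confinement mechanism nor the todo_write enforcement logic is spelled out in the source; the following Python sketch shows one plausible rendering of both contracts (safe_path and TODO_STORE are hypothetical helpers):

from pathlib import Path
from typing import Dict, List

WORKDIR = Path.cwd()          # all tool paths are resolved inside this directory
TODO_STORE: List[Dict] = []   # session-scoped task list (hypothetical)

def safe_path(path: str) -> Path:
    """Resolve a path and refuse anything that escapes the working directory."""
    p = (WORKDIR / path).resolve()
    if WORKDIR not in p.parents and p != WORKDIR:
        raise ValueError(f"path escapes working directory: {path}")
    return p

def read_file(path: str) -> str:
    return safe_path(path).read_text()

def todo_write(todos: List[Dict]) -> str:
    """Replace the task list, allowing at most one task marked in_progress."""
    if sum(t.get("status") == "in_progress" for t in todos) > 1:
        raise ValueError("only one task may be in_progress at a time")
    TODO_STORE[:] = todos
    done = sum(t.get("status") == "completed" for t in todos)
    return f"{len(todos)} todos recorded, {done} completed"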

5. Core Workflow and Modeling Snippets

Core pipeline pseudocode enforces incremental reasoning, tool invocation, and observation injection:

# Core ReAct loop (pseudocode): call_llm, parse_tool_call, and format_observation
# are abstract helpers; python_exec talks to the persistent IPython kernel.
tools = {"read": read_file, "write": write_file, "exec": python_exec, "todo": todo_write}

# Prime the context with the system and project prompts, then the task statement.
context = load_system_prompt() + load_project_prompt()
problem_text = tools["read"]("task.md")
context += problem_text

done = False
while not done:
    lm_output = call_llm(context)                    # Think: reason over the full context
    if lm_output.calls_tool:                         # Do: dispatch the requested tool
        tool_name, args = parse_tool_call(lm_output)
        result = tools[tool_name](**args)
        obs = format_observation(tool_name, result)
        context += lm_output.content + obs           # Observe: inject result or traceback
    elif lm_output.signals_completion:               # COMPLETE: emit the final CPMpy script
        tools["write"](lm_output.filename, lm_output.code)
        done = True
    else:
        context += lm_output.content                 # pure reasoning turn, no tool call

Modeling snippets are directly mapped to CPMpy code—for example, an AllDifferent constraint is encoded:

\text{model} \mathrel{+}= \mathrm{AllDifferent}(x_1, x_2, \dots, x_n)

from cpmpy import *

n = 8  # illustrative problem size
x = intvar(1, n, shape=n, name="x")
model = Model(AllDifferent(x))

Summation constraints (e.g., for knapsack problems) employ:

import cpmpy as cp  # assuming x, weight, n_items, capacity, and model are defined as above
total_weight = cp.sum([x[i] * weight[i] for i in range(n_items)])
model += total_weight <= capacity
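
Completing the sketch according to the mandatory Solve & Verify step, the fragment below solves an illustrative knapsack instance, re-checks the solution in pure Python without CPMpy, and emits JSON; the data values are invented for the example:

import json
import cpmpy as cp

n_items, capacity = 4, 10
weight = [5, 4, 6, 3]
value  = [10, 40, 30, 50]

x = cp.boolvar(shape=n_items, name="x")
model = cp.Model(cp.sum([x[i] * weight[i] for i in range(n_items)]) <= capacity)
model.maximize(cp.sum([x[i] * value[i] for i in range(n_items)]))

assert model.solve()
chosen = [int(v) for v in x.value()]

# Independent re-check in pure Python (no CPMpy involved).
assert sum(w for w, c in zip(weight, chosen) if c) <= capacity
objective = sum(v for v, c in zip(value, chosen) if c)

print(json.dumps({"chosen": chosen, "total_value": objective}))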

6. Empirical Evaluation: CP-Bench and Comparative Analysis

  • Benchmark Setup: Applied to CP-Bench, comprising 101 problems (30 optimization, 71 satisfaction), sourced from CSPLib, CPMpy, and course exercises.
    • All task descriptions (task.md) are paired with explicit JSON schemas.
    • Each run records the final CPMpy script plus conversation logs (agentic-python-coder v1.0.0, Claude 4 Sonnet 20241022).
  • Validation:
    • Satisfaction: Models are checked against reference CPMpy solutions for constraint satisfaction.
    • Optimization: Objective value is independently checked for optimality.
    • Checklist enforcement ensures compliance.
  • Results:
    • 100% success rate: All 101 problems solved, with optimal solutions on all 30 optimization instances; no failures observed.
    • Tool usage statistics:
      • python_exec calls/problem: 4–23 (higher for scheduling problems).
      • todo_write: used in 59 problems, with task lists of 1–10 items.
      • read_file, write_file: exactly one call each in 99/101 cases (two problems used intermediate files).
    • Token usage per problem: ~180K input, ~6K output.
    • Fault tolerance: All exception types were absorbed and repaired within the loop.
  • Comparison:
    • Fixed-pipeline methods plateau at ~70% accuracy (CPMpy subset).
    • CP-Agent’s pure agentic approach achieves complete coverage and superior flexibility.

7. Significance and Generalization

The agentic task synthesis pipeline, as exemplified by CP-Agent, demonstrates the practical and theoretical advantages of general coding agents for structured modeling:

  • Elimination of rigid architectures: Success is driven by prompt-encoded expertise and dynamic memory, not by embedding domain logic into architecture.
  • Efficiency: Achieves full benchmark coverage with a minimal implementation (a few hundred lines of code).
  • Debuggability and robustness: Exception handling and self-repair are integral, supporting dynamic convergence.
  • Applicability: The paradigm generalizes to other domains where iterative code synthesis, domain-specific prompt design, and stateful execution are necessary for optimal performance.

This approach situates agentic synthesis pipelines as the canonical strategy for scalable, verifiable, and high-fidelity translation of natural-language tasks into executable programs in complex domains (Szeider, 10 Aug 2025).
