A²Flow Operator Induction
- The paper introduces a fully automated framework that induces self-adaptive abstraction operators from expert demonstrations, eliminating manual operator design.
- It clusters and refines operators using LLM embeddings and chain-of-thought prompting to synthesize coherent, multi-step workflows.
- Empirical results show improved efficiency and performance across diverse benchmarks, with significant resource reductions and enhanced task execution.
A²Flow Operator Induction is a fully automated framework for agentic workflow generation based on self-adaptive abstraction operators. This mechanism moves beyond prior methods that rely on manually predefined operators by automatically inducing, abstracting, and integrating reusable operator blocks from expert demonstrations, leveraging LLM reasoning throughout. The central objective is to construct efficient, generalizable workflows for complex tasks through data-driven operator synthesis, abstraction, and search, eliminating the need for hand-crafted, low-level primitives (Zhao et al., 23 Nov 2025).
1. Self-Adaptive Abstraction Operators: Definition and Formalization
In A²Flow, a self-adaptive abstraction operator is a reusable, LLM-powered code “block” encapsulating recurring subroutines (e.g., “Plan”, “Execute”, “Validate”) within multi-step agentic workflows. Each operator is defined as a Python-like class with a single asynchronous `__call__` method, parameterized by the LLM itself. Operators act as black-box transforms, each mapping a single input to a single output. Formally, let $\mathcal{C} = \{C_1, \dots, C_n\}$ denote a set of expert task cases and $M$ denote the LLM. Given a prompt template $P_e$, the case-based initial operator extraction function

$$O_i^{(e)} = E(C_i, P_e, M)$$

yields, for each case $C_i \in \mathcal{C}$, a set of code operators $O_i^{(e)} = \{o_1, \dots, o_{k_i}\}$, where each retained $o_j$ is an executable code block (operators with validity indicator $\delta(o) = 0$ are pruned).

These initial operators, $\{O_i^{(e)}\}_i$, populate the operator pool used as nodes in subsequent workflow synthesis and search.
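The extract-then-prune step can be sketched as follows. This is a minimal sketch, not the paper's implementation: the extracted code strings stand in for LLM outputs, and $\delta(o)$ is approximated here by a simple "does the code parse" check (an assumption; the paper's validity test may be richer).

```python
import ast

def delta(operator_code: str) -> int:
    """Validity indicator: 1 if the operator code parses as Python, else 0."""
    try:
        ast.parse(operator_code)
        return 1
    except SyntaxError:
        return 0

def build_operator_pool(extracted: dict) -> dict:
    """Prune operators with delta(o) == 0; keep the rest as the operator pool."""
    return {
        case_id: [code for code in ops if delta(code) == 1]
        for case_id, ops in extracted.items()
    }

# Example: two cases, one of which yields a malformed (hence pruned) block.
extracted = {
    "case_1": ["class Plan:\n    pass", "class Execute(:\n    broken"],
    "case_2": ["class Validate:\n    pass"],
}
pool = build_operator_pool(extracted)
```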
2. The Three-Stage Operator Extraction and Abstraction Cascade
A²Flow induces generalizable operators from raw cases via a pipeline of three refinement stages:
2.1 Case-Based Initial Operator Generation
Expert demonstrations are split into a 20% validation subset and an 80% test subset. For each case $C_i$ in the validation set, the LLM $M$ is prompted with $P_e$ to extract operators for that case. Each extraction produces Pythonic code blocks, e.g.:

```python
class DataExtractor(object):
    def __init__(self, LLM=AsyncLLM()):
        self.LLM = LLM
        self.operator_prompt = "Extract structured data from the paragraph."

    async def __call__(self, input):
        return await self.LLM(input, self.operator_prompt)
```
2.2 Operator Clustering and Preliminary Abstraction
This stage reduces redundancy. Each viable operator $o$ is embedded into a vector representation $v_o$ via the LLM. K-means clustering solves

$$\min_{S_1, \dots, S_K} \sum_{k=1}^{K} \sum_{v_o \in S_k} \lVert v_o - \mu_k \rVert^2,$$

where $\mu_k$ is the centroid of cluster $S_k$, grouping semantically similar operators. For each cluster $S_k$, a “preliminary abstract operator” $o_k^{(a)}$ is synthesized using an LLM prompt $P_a$ to merge and compress all code in the cluster into a single, well-titled, minimal block. The collection is $O^{(a)} = \{o_1^{(a)}, \dots, o_K^{(a)}\}$.
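The clustering objective can be illustrated with a minimal k-means over toy embedding vectors. This is a sketch only: real operator embeddings would come from the LLM, whereas here they are hard-coded 2-D points.

```python
import numpy as np

def kmeans(vectors: np.ndarray, k: int, iters: int = 20, seed: int = 0):
    """Minimal k-means: assignments minimizing within-cluster squared distance."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        # Assign each operator embedding to its nearest centroid.
        dists = np.linalg.norm(vectors[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its cluster.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = vectors[labels == j].mean(axis=0)
    return labels

# Toy embeddings: two tight groups of "semantically similar" operators.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
labels = kmeans(emb, k=2)
```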
2.3 Deep Extraction for Abstract Execution Operators
Each preliminary operator is further abstracted via multi-path, chain-of-thought (CoT) prompting. For $m$ independent chains $p = 1, \dots, m$, iterative CoT refinement is applied:

- Step 1: $o_{p,1} = M(I, P_t, O^{(a)})$,
- Step 2: $o_{p,2} = M(I, P_t, o_{p,1}, \mathrm{CoT})$,
- Step 3: $o_{p,3} = M(I, P_t, o_{p,1}, o_{p,2}, \mathrm{CoT})$,

where $I$ is the task instruction and $P_t$ is a prompt to “make it deeper and more general.” Across the $m$ chains, operators whose self-consistency frequency meets a retention threshold are kept, and reflection-driven regeneration ensures correctness ($\delta(o) = 1$ for all $o \in O^{(t)}$).
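The self-consistency filter at the end of this stage can be sketched as a frequency count over the final outputs of the $m$ chains. This is a simplification: candidates are matched here by exact string identity, whereas the real system would compare operator code semantically.

```python
from collections import Counter

def self_consistency_filter(chain_outputs, threshold):
    """Keep candidates whose frequency across the CoT chains meets the threshold."""
    counts = Counter(chain_outputs)
    return sorted(op for op, n in counts.items() if n >= threshold)

# Final outputs o_{p,3} of m = 5 independent CoT chains (toy labels).
chains = ["Plan", "Plan", "Execute", "Plan", "Execute"]
kept = self_consistency_filter(chains, threshold=2)
# "Plan" appears 3/5 times and "Execute" 2/5 times; both meet the threshold.
```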
3. Operator Memory Mechanism
A²Flow augments workflow search with an operator memory mechanism, diverging from approaches where individual operators see only the immediate predecessor’s output. For workflow node $k$, the memory set $\mathcal{M}_k$ is recursively extended:

$$\mathcal{M}_k = \mathcal{M}_{k-1} \cup \{h_{k-1}\}, \qquad \mathcal{M}_1 = \emptyset,$$

with $h_{k-1}$ the complete response from node $k-1$. Execution now follows

$$h_k = o_k(\mathrm{input}_k, P_k, \mathcal{M}_k).$$

This enables each operator at step $k$ to utilize the summarized context of all preceding steps, improving workflow coherence (confirmed by ablation on MATH).
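The recursive memory update can be sketched as a plain execution loop. The operators below are trivial string transforms standing in for LLM-backed operators (hypothetical names); only the memory bookkeeping mirrors the formalization above.

```python
def run_workflow(operators, task_input):
    """Execute operators in sequence; each sees the memory of all prior responses."""
    memory = []            # M_1 = empty set
    h = task_input
    for op in operators:
        h = op(h, memory)  # h_k = o_k(input_k, M_k)
        memory.append(h)   # M_{k+1} = M_k ∪ {h_k}
    return h, memory

# Hypothetical stand-in operators: each one can read the full history.
plan     = lambda x, mem: f"plan({x})"
execute  = lambda x, mem: f"exec({x}|ctx={len(mem)})"
validate = lambda x, mem: f"ok({x}|ctx={len(mem)})"

result, memory = run_workflow([plan, execute, validate], "task")
```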
4. Experiments and Quantitative Performance
A²Flow is benchmarked across eight datasets covering code, math, QA, and embodied agent tasks: HumanEval, MBPP, GSM8K, MATH, HotpotQA, DROP, ALFWorld, and TextCraft. Metrics include:
- F1 score on HotpotQA and DROP,
- pass@1 (fraction of correct first attempts) on HumanEval, MBPP,
- SolveRate (correct/total) on GSM8K, MATH,
- Binary success on ALFWorld, TextCraft.
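Several of the metrics above (pass@1 with one sample per problem, SolveRate, binary success rate) reduce to a fraction of correct outcomes; a minimal sketch:

```python
def fraction_correct(outcomes):
    """pass@1 / SolveRate / success rate: fraction of attempts judged correct."""
    return sum(outcomes) / len(outcomes)

# Toy first-attempt results over 8 problems (True = correct).
results = [True, True, False, True, False, True, True, True]
score = fraction_correct(results)  # 6/8 = 0.75
```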
Relative to AFLOW and other baselines, A²Flow reports average gains on reading, code, and reasoning tasks, further gains on embodied/game tasks, and reduced resource usage. For example, on DROP with GPT-4o-mini, cost per run drops from \$1.37 to \$0.51 while F1 also improves.
5. High-Level Induction and Search Pseudocode
The following LaTeX-formatted algorithm encapsulates A²Flow’s operator induction and memory-augmented workflow search procedure:
```latex
\begin{algorithm}[H]
\caption{A$^2$Flow Operator Induction and Workflow Search}
\begin{algorithmic}[1]
\Require expert cases $\mathcal{C}$; LLM $M$; evaluator $G$; search rounds $R$
\State Split $\mathcal{C} \to \mathcal{C}_{\mathrm{val}}$ (20\%), $\mathcal{C}_{\mathrm{test}}$ (80\%)
\Comment{Extract self-adaptive operators}
\ForAll{$C_i \in \mathcal{C}_{\mathrm{val}}$}
    \State $O^{(e)}_i \gets E(C_i, P_e, M)$
    \State Prune any $o$ with $\delta(o) = 0$
\EndFor
\State Embed and cluster $\{O^{(e)}_1, \dots\}$ into $K$ clusters; synthesize $O^{(a)}$ via $P_a$
\For{$p = 1 \dots m$} \Comment{Deep extraction via CoT}
    \State $o_{p,1} \gets M(I, P_t, O^{(a)})$;\ $o_{p,2} \gets M(I, P_t, o_{p,1}, \mathrm{CoT})$;\ $o_{p,3} \gets M(I, P_t, o_{p,1}, o_{p,2}, \mathrm{CoT})$
\EndFor
\State $O^{(t)} \gets \mathcal{A}_t(\{o_{p,3}\}_{p=1}^{m}, P_t, M)$ \Comment{each $o \in O^{(t)}$ now has $\delta(o) = 1$}
\Comment{Memory-augmented workflow search}
\State Initialize workflow template $W_0$, memory $\mathcal{M} \gets \emptyset$
\For{$\mathrm{iter} = 1 \dots R$}
    \State Select a partial workflow $W$ via MCTS (balancing exploitation and exploration)
    \State Expand $W$ by inserting or modifying nodes using operators in $O^{(t)}$
    \State Execute candidate $W$ on $\mathcal{C}_{\mathrm{val}}$: initialize $\mathcal{M} \gets \emptyset$
    \For{$k = 1 \dots |\mathrm{nodes}|$}
        \State $o_k \gets$ selected operator;\ $h_k \gets o_k(\mathrm{input}_k, P_k, \mathcal{M})$;\ $\mathcal{M} \gets \mathcal{M} \cup \{h_k\}$
    \EndFor
    \State Compute score $s \gets G(W, \mathcal{C}_{\mathrm{val}})$
    \State Backpropagate $s$ in the MCTS tree
\EndFor
\State \Return best workflow $W^{*}$
\end{algorithmic}
\end{algorithm}
```
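The search loop of the pseudocode can be sketched in miniature. This is a deliberately simplified stand-in: a random candidate sampler replaces MCTS selection and expansion, and the evaluator $G$ is a toy scoring function rewarding a hypothetical Plan→Execute→Validate ordering.

```python
import random

def search_workflows(operator_pool, evaluate, rounds=50, max_len=3, seed=0):
    """Simplified stand-in for MCTS search: sample candidate workflows,
    score each with the evaluator, and keep the best one seen."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(rounds):
        # "Expand": draw a candidate workflow from the operator pool.
        length = rng.randint(1, max_len)
        candidate = tuple(rng.choice(operator_pool) for _ in range(length))
        score = evaluate(candidate)   # s = G(W, C_val)
        if score > best_score:        # "backpropagation" reduced to arg-max
            best, best_score = candidate, score
    return best, best_score

# Toy evaluator: rewards matching the canonical ordering, penalizes wrong length.
pool = ["Plan", "Execute", "Validate"]
target = ("Plan", "Execute", "Validate")
evaluate = lambda w: sum(a == b for a, b in zip(w, target)) - abs(len(w) - 3)
best, score = search_workflows(pool, evaluate, rounds=200)
```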
6. Significance and Context
A²Flow’s methodological contributions center on the full automation of workflow code block induction, deep abstraction, and integration. Its self-adaptive abstraction operators are derived without any hand-crafted definitions or templates. The combination of operator induction, semantic clustering, chain-of-thought abstraction, and memory-augmented search constitutes an end-to-end pipeline that yields improved generality, resource efficiency, and task performance, as reflected by results on diverse benchmarks (Zhao et al., 23 Nov 2025). A²Flow represents a scalable alternative to manual operator engineering, with empirical evidence demonstrating robust transfer and adaptability across domains and agentic task settings.