A²Flow Operator Induction
- The paper introduces a fully automated framework that induces self-adaptive abstraction operators from expert demonstrations, eliminating manual operator design.
- It clusters and refines operators using LLM embeddings and chain-of-thought prompting to synthesize coherent, multi-step workflows.
- Empirical results show improved efficiency and performance across diverse benchmarks, with significant resource reductions and enhanced task execution.
A²Flow Operator Induction is a fully automated framework for agentic workflow generation based on self-adaptive abstraction operators. This mechanism moves beyond prior methods that rely on manually predefined operators by automatically inducing, abstracting, and integrating reusable operator blocks from expert demonstrations, leveraging LLM reasoning throughout. The central objective is to construct efficient, generalizable workflows for complex tasks through data-driven operator synthesis, abstraction, and search, eliminating the need for hand-crafted, low-level primitives (Zhao et al., 23 Nov 2025).
1. Self-Adaptive Abstraction Operators: Definition and Formalization
In A²Flow, a self-adaptive abstraction operator is a reusable, LLM-powered code “block” encapsulating recurring subroutines (e.g., “Plan”, “Execute”, “Validate”) within multi-step agentic workflows. Each operator is defined as a Python-like class with a single asynchronous `__call__` method, parameterized by the LLM itself. Operators act as black-box transforms, each mapping a single input to a single output. Formally, let $\mathcal{D} = \{c_1, \dots, c_n\}$ denote a set of expert task cases and $M$ denote the LLM. Given a prompt template $p_{\text{init}}$, the case-based initial operator extraction function
$$F_{\text{init}}(c_i) = M(p_{\text{init}}, c_i)$$
yields, for each case $c_i$, a set of code operators $O_i = \{o_{i,1}, \dots, o_{i,m_i}\}$. These initial operators, $O = \bigcup_i O_i$, populate the operator pool used as nodes in subsequent workflow synthesis and search.
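A minimal sketch of such an operator, assuming an async-callable LLM interface; the class names and the `fake_llm` stub are illustrative, not the paper's code:

```python
import asyncio

class Operator:
    """An LLM-powered black-box transform: one input -> one output."""
    def __init__(self, llm, name: str):
        self.llm = llm    # any async callable mapping a prompt string to a completion
        self.name = name

    async def __call__(self, text: str) -> str:
        raise NotImplementedError

class Plan(Operator):
    """A hypothetical 'Plan' operator: asks the LLM for a step-by-step plan."""
    async def __call__(self, text: str) -> str:
        return await self.llm(f"Devise a step-by-step plan for: {text}")

async def fake_llm(prompt: str) -> str:
    # Stand-in for a real model so the sketch is runnable end to end.
    return f"[completion for: {prompt}]"

result = asyncio.run(Plan(fake_llm, "Plan")("solve x^2 = 4"))
print(result)
```

Because each operator exposes the same single-input, single-output signature, the workflow search can treat every induced block interchangeably as a graph node.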
2. The Three-Stage Operator Extraction and Abstraction Cascade
A²Flow induces generalizable operators from raw cases via a pipeline of three refinement stages:
2.1 Case-Based Initial Operator Generation
Expert demonstrations are split into a 20% validation subset and an 80% test subset. For each case $c_i$ in the validation set, the LLM is prompted with the template $p_{\text{init}}$ to extract operators for the case. Each extraction produces Pythonic code blocks (e.g., a `Plan` class with an asynchronous `__call__` method). Each candidate $o_{i,j}$ is scored with a pass/fail indicator $s_{i,j} \in \{0, 1\}$, set to 1 if the Python executor returns no errors and 0 otherwise; only candidates with $s_{i,j} = 1$ are retained.
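The executor gate can be sketched as follows, under the assumption that a candidate passes when its code compiles and executes without raising (the helper name `score_operator` is illustrative):

```python
def score_operator(code: str) -> int:
    """Pass/fail indicator: 1 if the candidate operator code compiles and
    executes without error in a fresh namespace, else 0."""
    try:
        exec(compile(code, "<candidate>", "exec"), {})
        return 1
    except Exception:
        return 0

candidates = [
    "class Validate:\n    def __call__(self, x):\n        return bool(x)",
    "class Broken(:\n    pass",  # syntax error -> filtered out
]
pool = [c for c in candidates if score_operator(c) == 1]
print(len(pool))  # -> 1: only the well-formed candidate survives
```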
2.2 Operator Clustering and Preliminary Abstraction
This stage reduces redundancy. Each viable operator $o_j$ is embedded into a vector representation $e_j$ via the LLM. K-means clustering solves
$$\min_{C_1, \dots, C_K} \; \sum_{k=1}^{K} \sum_{e_j \in C_k} \lVert e_j - \mu_k \rVert^2,$$
where $\mu_k$ is the centroid of cluster $C_k$, grouping semantically similar operators. For each cluster $C_k$, a “preliminary abstract operator” $\tilde{o}_k$ is synthesized using an LLM prompt $p_{\text{abs}}$ that merges and compresses all code in the cluster into a single, well-titled, minimal block. The collection is $\tilde{O} = \{\tilde{o}_1, \dots, \tilde{o}_K\}$.
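The clustering step can be sketched with a minimal K-means over toy 2-D embeddings standing in for LLM vectors (a real pipeline would use high-dimensional embeddings and a library implementation):

```python
def kmeans(vectors, k, iters=10):
    """Minimal K-means; centroids initialized from the first k vectors
    for determinism in this sketch."""
    centroids = list(vectors[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            # Assign each embedding to its nearest centroid (squared L2).
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])))
            clusters[j].append(v)
        # Recompute centroids as per-dimension means; keep old one if empty.
        centroids = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

# Two semantic groups of toy "operator embeddings".
embs = [(0.1, 0.0), (0.0, 0.2), (5.0, 5.1), (5.2, 4.9)]
clusters = kmeans(embs, k=2)
print(sorted(len(c) for c in clusters))  # -> [2, 2]
```

Each resulting cluster would then be handed to the LLM with $p_{\text{abs}}$ to produce one merged abstract operator.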
2.3 Deep Extraction for Abstract Execution Operators
Each preliminary operator is further abstracted via multi-path, chain-of-thought (CoT) prompting. For $N$ chains, iterative CoT refinement is applied:
- Step 1: $o^{(1)} = M(p_{\text{deep}}, I, \tilde{o}_k)$,
- Step 2: $o^{(2)} = M(p_{\text{deep}}, I, o^{(1)})$,
- Step 3: $o^{(3)} = M(p_{\text{deep}}, I, o^{(2)})$,
where $I$ is the task instruction and $p_{\text{deep}}$ is a prompt to “make it deeper and more general.” Across chains, operators whose self-consistency frequency reaches a threshold $\tau$ are retained, and reflection-driven regeneration ensures correctness (an executor pass, $s = 1$, for every retained operator).
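The self-consistency filter can be sketched as a frequency count over the operators proposed by the $N$ chains; the chain contents and operator names here are hypothetical:

```python
from collections import Counter

def retain_by_self_consistency(chains, tau):
    """Keep operators proposed by at least `tau` of the refinement chains."""
    counts = Counter(op for chain in chains for op in set(chain))
    return {op for op, c in counts.items() if c >= tau}

# Three hypothetical CoT chains, each proposing abstract operator names.
chains = [
    {"Plan", "Execute", "Validate"},
    {"Plan", "Execute"},
    {"Plan", "Validate", "Summarize"},
]
print(sorted(retain_by_self_consistency(chains, tau=2)))
# -> ['Execute', 'Plan', 'Validate']; 'Summarize' appears only once and is dropped
```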
3. Operator Memory Mechanism
A²Flow augments workflow search with an operator memory mechanism, diverging from approaches where individual operators only see the immediate predecessor’s output. For workflow node $t$, the memory set is recursively extended:
$$M_t = M_{t-1} \cup \{r_{t-1}\},$$
with $r_{t-1}$ the complete response from node $t-1$. Execution now follows
$$r_t = O_t(x, M_t),$$
where $x$ is the task input. This enables each operator at step $t$ to utilize the summarized context of all preceding steps, improving workflow coherence, as confirmed by ablation on MATH.
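A minimal sketch of memory-augmented execution, assuming operators with the async signature from Section 1 (the `echo_op` stand-in only counts prior steps instead of calling a model):

```python
import asyncio

async def run_workflow(operators, task, llm):
    """Each operator sees the task plus the full responses of ALL preceding
    operators, not just the last output."""
    memory = []                      # M_t: accumulated responses r_1..r_{t-1}
    for op in operators:
        context = "\n".join(memory)
        response = await op(llm, task, context)
        memory.append(response)      # M_{t+1} = M_t ∪ {r_t}
    return memory

async def echo_op(llm, task, context):
    # Stand-in operator: a real one would prompt the LLM with task + context.
    return f"seen {context.count('seen')} prior steps"

memory = asyncio.run(run_workflow([echo_op, echo_op, echo_op], "task", llm=None))
print(memory)
# -> ['seen 0 prior steps', 'seen 1 prior steps', 'seen 2 prior steps']
```

The growing `memory` list is exactly what distinguishes this from predecessor-only chaining, where each call would receive only the previous response.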
4. Experiments and Quantitative Performance
A²Flow is benchmarked across eight datasets covering code, math, QA, and embodied agent tasks: HumanEval, MBPP, GSM8K, MATH, HotpotQA, DROP, ALFWorld, and TextCraft. Metrics include:
- F1 (token-level overlap) on HotpotQA, DROP,
- pass@1 (fraction of correct first attempts) on HumanEval, MBPP,
- SolveRate (correct/total) on GSM8K, MATH,
- Binary success on ALFWorld, TextCraft.
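The two less self-explanatory metrics can be sketched as follows; `token_f1` is a common whitespace-tokenized formulation used for QA scoring, given here as an illustration rather than the benchmarks' official scripts:

```python
def pass_at_1(first_attempt_correct):
    """pass@1: fraction of problems whose first sampled solution is correct."""
    return sum(first_attempt_correct) / len(first_attempt_correct)

def token_f1(pred, gold):
    """Token-level F1 between a predicted and a gold answer string."""
    p, g = pred.split(), gold.split()
    common = sum(min(p.count(t), g.count(t)) for t in set(p))
    if common == 0:
        return 0.0
    prec, rec = common / len(p), common / len(g)
    return 2 * prec * rec / (prec + rec)

print(pass_at_1([True, False, True, True]))          # -> 0.75
print(round(token_f1("the cat sat", "the cat"), 2))  # -> 0.8
```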
Relative to AFLOW and other baselines, A²Flow achieves average gains on reading, code, and reasoning tasks, further gains on embodied/game tasks, and a substantial reduction in resource usage. For example, on DROP with GPT-4o-mini, cost per run drops sharply while F1 increases.
5. High-Level Induction and Search Pseudocode
The operator induction and memory-augmented workflow search procedure can be summarized as:
1. Split expert demonstrations into validation and test subsets.
2. For each validation case, extract candidate operators with the LLM and retain those that pass the Python executor.
3. Embed the retained operators, cluster them with K-means, and merge each cluster into a preliminary abstract operator.
4. Deepen each preliminary operator through $N$ chains of CoT refinement, keeping self-consistent results.
5. Search over workflows whose nodes draw on the final operator pool, executing each node with the accumulated operator memory.
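The induction cascade described above can be sketched as a runnable orchestration. Every stage is passed in as a callable, so the toy stand-ins below (strings playing the role of code blocks and embeddings) are clearly illustrative and not the paper's API:

```python
from collections import Counter

def a2flow_pipeline(cases, extract, executes, embed, cluster, merge, deepen, tau):
    """Orchestration sketch of the three-stage operator induction cascade."""
    # Stage 1: per-case extraction, gated by the executor pass/fail check.
    pool = [op for c in cases for op in extract(c) if executes(op)]
    # Stage 2: embed and cluster, then merge each cluster into one abstraction.
    groups = cluster([embed(op) for op in pool])
    prelim = [merge(g) for g in groups]
    # Stage 3: CoT deepening with self-consistency retention (threshold tau).
    counts = Counter(deepen(op) for op in prelim)
    return [op for op, c in counts.items() if c >= tau]

ops = a2flow_pipeline(
    cases=["case1", "case2"],
    extract=lambda c: [f"Plan<{c}>", f"Check<{c}>"],
    executes=lambda op: True,
    embed=lambda op: op.split("<")[0],          # "semantic" key of the operator
    cluster=lambda embs: [[e for e in embs if e == "Plan"],
                          [e for e in embs if e == "Check"]],
    merge=lambda group: group[0],
    deepen=lambda op: f"Abstract{op}",
    tau=1,
)
print(ops)  # -> ['AbstractPlan', 'AbstractCheck']
```

The final operator list is what the memory-augmented workflow search (Section 3) consumes as its node vocabulary.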
6. Significance and Context
A²Flow’s methodological contributions center on the full automation of workflow code block induction, deep abstraction, and integration. Its self-adaptive abstraction operators are derived without any hand-crafted definitions or templates. The combination of operator induction, semantic clustering, chain-of-thought abstraction, and memory-augmented search constitutes an end-to-end pipeline that yields improved generality, resource efficiency, and task performance, as reflected by results on diverse benchmarks (Zhao et al., 23 Nov 2025). A²Flow represents a scalable alternative to manual operator engineering, with empirical evidence demonstrating robust transfer and adaptability across domains and agentic task settings.