Interactive Agents Call Tree (IACT) Model
- IACT is a computational model that replaces rigid, static workflows with dynamic, dialogue-driven agent trees.
- Its recursive CALL/RETURN semantics allow agents to autonomously decompose tasks, reducing error propagation and adapting to complex objectives.
- IACT supports asynchronous execution with speculative tool calling, which in reported experiments cuts end-to-end latency by roughly 1.25–2.25× while keeping task accuracy within a few percentage points of a synchronous baseline.
Interactive Agents Call Tree (IACT) is a computational model for general AI agents that is designed to address the limitations of static, hard-coded agent workflows. In the technical white paper introducing the architecture behind kragent.ai, IACT is defined as a general-purpose autonomous system driven purely by user dialogue: given a high-level objective, it autonomously grows a dynamic, recursive agent topology incrementally tailored to the problem’s structure, allowing organizational complexity to scale with open-ended tasks. The model replaces rigid invocations with bidirectional, stateful dialogues in order to mitigate the error propagation associated with unidirectional function calls. The white paper emphasizes architecture, design principles, and practical lessons from production deployment, and presents qualitative evidence from real-world workflows rather than exhaustive benchmark results (Lu, 2 Dec 2025). A related real-time formulation casts IACT-style execution as asynchronous I/O with speculative tool calling, extending the architecture toward latency-sensitive interactive settings (Hooper et al., 13 May 2026).
1. Formal specification
The white paper defines an IACT system as the tuple
$$\mathcal{S} = (\mathcal{A}, \mathcal{E}, \mathcal{M}, \mathcal{C}, \mathcal{D}),$$
where $\mathcal{A}$ is the set of agent-instances that exist at runtime, $\mathcal{E}$ is the set of directed CALL edges forming a tree rooted at the user’s initial agent, $\mathcal{M}$ is the agent micro-architecture implementing the perception–action loop, $\mathcal{C}$ is the context-construction function that builds each agent’s working memory, and $\mathcal{D}$ is the dialogue protocol enabling stateful, bidirectional messages (Lu, 2 Dec 2025).
At time $t$, the organizational state is a rooted tree
$$T_t = (\mathcal{A}_t, \mathcal{E}_t), \qquad \mathcal{E}_t \subseteq \mathcal{A}_t \times \mathcal{A}_t,$$
with the invariant that each $a \in \mathcal{A}_t$ except the root $a_0$ has a unique parent, and that $\mathcal{E}_t$ grows only when new sub-agents are spawned. The macro-architecture is organized around a user-to-root-agent interface, recursive CALL/RETURN edges, downward state flow via input arguments, and upward state flow via RETURN(result) messages. The root $a_0$ receives the high-level objective $o_0$, and each parent agent $a_p$ may invoke a child agent $a_c$ by issuing CALL(o_c); the edge $(a_p, a_c)$ appears in $\mathcal{E}_t$ on first invocation.
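The tree invariants can be illustrated with a small data structure. This is a minimal Python sketch, not the kragent.ai implementation: the class names `AgentNode` and `CallTree` are hypothetical, and keying children by their objective string is an assumption made so that a repeated CALL with the same objective reuses the existing child instead of spawning a new one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison: two agents with equal fields are still distinct
class AgentNode:
    """A runtime agent-instance in the call tree (hypothetical representation)."""
    objective: str
    parent: Optional["AgentNode"] = field(default=None, repr=False)
    children: dict = field(default_factory=dict)  # objective -> AgentNode


class CallTree:
    """Rooted tree T_t = (A_t, E_t): one parent per non-root node, edges only grow."""

    def __init__(self, root_objective: str) -> None:
        self.root = AgentNode(root_objective)
        self.edges: list = []  # (parent, child) pairs, append-only

    def call(self, parent: AgentNode, objective: str) -> AgentNode:
        """CALL(o): spawn a child on first use, otherwise reuse it to continue the dialogue."""
        child = parent.children.get(objective)
        if child is None:
            child = AgentNode(objective, parent=parent)
            parent.children[objective] = child
            self.edges.append((parent, child))  # edge appears on first invocation
        return child

    def size(self) -> int:
        return len(self.edges) + 1  # a tree with n nodes has n - 1 edges
```

Calling `tree.call(tree.root, "Generate outline")` twice returns the same node and leaves `tree.size()` at 2, mirroring the reuse rule described in section 2.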
This formalization distinguishes IACT from static workflow systems that require pre-defined graphs or specialized programming. The operative unit is not a fixed pipeline stage but a runtime agent-instance whose place in the topology is created only when the current problem structure calls for it. The result is a call-tree semantics in which decomposition is endogenous to the agent’s own reasoning rather than imposed externally.
2. Recursive organization and execution semantics
The central organizing principle of IACT is self-organizing recursion. The tree grows on demand via two primitives, CALL and RETURN. In the execution schema given in the white paper, the system initializes a root agent with the user objective, repeatedly builds context, queries the LLM, interprets the generated output, and either spawns or reuses child agents when CALL(o) appears, or terminates a branch when RETURN(r) appears (Lu, 2 Dec 2025).
Task decomposition is delegated to each agent’s local decision process. An agent takes its current objective or intermediate result and determines whether the task is atomic; if so, it finishes and returns. Otherwise it decomposes the task into sub-objectives $o_1, \dots, o_k$ and issues CALL(o_i) for each. This yields a minimal on-demand tree: children are spawned only when the parent’s internal reasoning declares a CALL, and subsequent calls to the same objective reuse the existing node and continue the dialogue.
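The decomposition rule can be sketched as a recursive function. In this hedged illustration the agent's local judgments (`is_atomic`, `decompose`, `solve`) are hypothetical callables standing in for LLM reasoning, and `tree` is a plain dictionary recording which sub-objectives each parent called; only the control flow (RETURN when atomic, otherwise CALL each sub-objective and integrate the results) follows the description above.

```python
from typing import Callable


def run_agent(objective: str,
              is_atomic: Callable[[str], bool],
              decompose: Callable[[str], list],
              solve: Callable[[str], str],
              tree: dict) -> str:
    """IACT-style recursion: RETURN if atomic, otherwise CALL each sub-objective."""
    if is_atomic(objective):
        return solve(objective)  # RETURN(result)
    results = []
    for sub in decompose(objective):
        # CALL(o_i): the child edge is created only because the parent asked for it.
        tree.setdefault(objective, []).append(sub)
        results.append(run_agent(sub, is_atomic, decompose, solve, tree))
    return " | ".join(results)  # parent integrates child results, then RETURNs


# Toy plan standing in for the LLM's decomposition choices.
plan = {"tutorial": ["chapter 1", "chapter 2"], "chapter 1": ["code example"]}
tree: dict = {}
answer = run_agent("tutorial", lambda o: o not in plan, plan.__getitem__, str.upper, tree)
```

Here `answer` is `"CODE EXAMPLE | CHAPTER 2"`, and `tree` contains only the edges the plan actually requested.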
The micro-architecture of a node is specified by the context
$$C_t = \mathrm{BuildContext}(P_{\text{sys}}, H_t, m_t, I_t),$$
followed by the LLM output
$$y_t = \mathrm{LLM}(C_t)$$
and the interpreter result
$$z_t = \mathrm{Interpreter}(y_t).$$
Here, $P_{\text{sys}}$ is the static system prompt defining the role, such as “You are a Researcher”; $H_t$ is the dialogue history with parent and child agents; $m_t$ is the latest incoming message from parent or tool; and $I_t$ contains dynamic injections such as context warnings or file listings. The LLM output may include natural-language instructions, structured code, or primitives such as CALL, RETURN, and CONTEXT-COMPRESS. The interpreter executes embedded actions, including tool calls, CALL/RETURN, and variable definitions, and produces feedback that becomes part of the next context.
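The context-assembly and interpretation steps can be sketched as follows. The `Context` fields mirror the four parts described above (system prompt, dialogue history, latest incoming message, dynamic injections). The surface syntax `CALL(...)`, `RETURN(...)`, `CONTEXT-COMPRESS(...)` matched by the regular expression is a hypothetical stand-in, since the actual Unified Extended-Markdown syntax is not given here; placing the static parts first is an assumption loosely motivated by the KV-cache-friendly prefix mentioned in section 4.

```python
import re
from dataclasses import dataclass


@dataclass
class Context:
    system_prompt: str     # static role, e.g. "You are a Researcher"
    history: list[str]     # dialogue with parent and child agents
    incoming: str          # latest message from parent or tool
    injections: list[str]  # dynamic notes such as context warnings


def build_context(ctx: Context) -> str:
    """Concatenate the parts in a fixed order, most stable content first."""
    return "\n".join([ctx.system_prompt, *ctx.history, ctx.incoming, *ctx.injections])


# Hypothetical primitive syntax; the real protocol may encode these differently.
PRIMITIVE = re.compile(r"\b(CALL|RETURN|CONTEXT-COMPRESS)\(([^)]*)\)")


def interpret(llm_output: str) -> list[tuple[str, str]]:
    """Extract control primitives from LLM output in order of appearance."""
    return [(m.group(1), m.group(2)) for m in PRIMITIVE.finditer(llm_output)]
```

For instance, `interpret("Draft ready. CALL(CodeGenerator) then RETURN(chapter text)")` yields the two primitives in order, which an interpreter would then dispatch.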
A useful implication of this design is that topology is not merely a control-flow representation; it is a computational state that co-evolves with reasoning. The tree records decomposition choices made at runtime, and therefore functions simultaneously as execution structure and problem-specific organizational memory.
3. Stateful bidirectional dialogue and error correction
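A defining feature of IACT is what the white paper calls interactional redundancy. Instead of treating CALL as a one-shot function invocation, every CALL becomes a stateful dialogue session between parent and child agents. For parent agent $a_p$ and child agent $a_c$ at turn $t$, the exchanged messages are denoted by $m^{p \to c}_t$ (parent to child) and $m^{c \to p}_t$ (child to parent), and each side updates its internal context using the incoming message (Lu, 2 Dec 2025).
The paper formalizes this as
$$C^{c}_{t+1} = \mathrm{Update}\big(C^{c}_{t}, m^{p \to c}_{t}\big), \qquad C^{p}_{t+1} = \mathrm{Update}\big(C^{p}_{t}, m^{c \to p}_{t}\big),$$
and also at the level of agent state as
$$S^{c}_{t+1} = f_c\big(S^{c}_{t}, m^{p \to c}_{t}\big), \qquad S^{p}_{t+1} = f_p\big(S^{p}_{t}, m^{c \to p}_{t}\big).$$
Each state $S^{p}_{t}$ or $S^{c}_{t}$ incorporates both private memory and the dialogue trace, while the functions $f_p$ and $f_c$ are implemented by the BuildContext and Interpreter machinery.
The error-correction mechanism is explicitly recursive. If a child’s output $e_t = m^{c \to p}_t$ is flawed, the parent inspects it and emits a corrective message:
$$m^{p \to c}_{t+1} = \mathrm{Correct}(e_t) \quad \text{whenever } \mathrm{Check}(e_t) \neq \mathrm{OK}.$$
The loop continues until the parent’s condition $\mathrm{Check}(e_t) = \mathrm{OK}$ holds. The paper characterizes this as minimizing the number of dialogue turns
$$\min_{T} \; T$$
subject to
$$\mathrm{Check}(e_T) = \mathrm{OK}.$$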
This formulation is intended to mitigate the error propagation inherent in unidirectional function calls. In that sense, IACT rejects the assumption that hierarchical decomposition must be strictly feed-forward. A common misunderstanding is to treat the call tree as a conventional procedure tree; in IACT, the tree is coupled to iterative dialogue, correction, and ambiguity resolution, not merely delegation.
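The correction loop can be made concrete as a bounded parent-child dialogue. In this minimal sketch `child_step`, `check`, and `correct` are hypothetical callables for the child's reply, the parent's Check, and the parent's corrective message; the `max_turns` cap is an added safety assumption, not something the white paper specifies.

```python
from typing import Callable


def correction_dialogue(child_step: Callable[[str], str],
                        check: Callable[[str], bool],
                        correct: Callable[[str], str],
                        first_message: str,
                        max_turns: int = 5) -> tuple[str, int]:
    """Send, inspect e_t, and correct until Check(e_t) = OK; return (output, turns)."""
    message = first_message          # parent-to-child message for this turn
    for turn in range(1, max_turns + 1):
        e_t = child_step(message)    # child's reply, updating its own state
        if check(e_t):               # Check(e_t) = OK ends the dialogue
            return e_t, turn
        message = correct(e_t)       # corrective message fed back to the child
    raise RuntimeError("no acceptable output within max_turns")


# A scripted child that fixes its snippet after one round of feedback.
attempts = iter(["def f(): retrun 1", "def f(): return 1"])
output, turns = correction_dialogue(
    child_step=lambda msg: next(attempts),
    check=lambda e: "retrun" not in e,
    correct=lambda e: "Fix the typo in: " + e,
    first_message="Write f",
)
```

The scripted run accepts the second reply after two turns; unlike a one-shot function call, the first flawed reply never leaves the branch.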
4. Runtime components, memory, and message infrastructure
The white paper identifies five key components in the architecture. Their roles are summarized below (Lu, 2 Dec 2025).
| Component | Function | Notes |
|---|---|---|
| LLM engine (“brain”) | Generates next action or response | Produces language, code, and primitives |
| Hybrid Language Interpreter (“executor”) | Parses and runs actions | Executes tool calls and control primitives |
| Ext-Modules | Environment interaction via RPC | External tools |
| Hippocampus | Global associative memory | Vector DB |
| Unified Extended-Markdown protocol | Common message layer | Used for all internal and external messages |
These components operationalize the split between generation, execution, environment access, memory, and communication. The interpreter is especially important because the LLM output is not restricted to plain text; it may contain structured code and control primitives. The system therefore depends on a mediating layer that can parse, execute, and feed results back into context.
The production notes reported for kragent.ai focus on three recurrent systems issues. First, the “Multi-Party Dialogue Gap” arose when agents initially conflated tool outputs, parent messages, and child messages; the reported solution was to append invisible system-notes clarifying the audience of each message, after which modern LLMs were said to track three interlocutors—User/Parent, Tools, and Sub-Agents. Second, under uncertainty, current LLMs were observed to prefer hallucinating missing constraints rather than querying for clarification, motivating exploration of RL-fine-tuning for proactive QUERY_PARENT calls. Third, the Hippocampus vector store was reported to retain long-term project facts but sometimes return stale items, leading to experiments with a specialized LLM for memory consolidation (Lu, 2 Dec 2025).
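The audience-labelling fix for the Multi-Party Dialogue Gap can be sketched as a message wrapper. The note format is hypothetical: an HTML comment is used here only because it stays hidden when Markdown is rendered for a human reader while remaining in the text the LLM reads, and the three source labels follow the interlocutors named above.

```python
from enum import Enum


class Source(Enum):
    """The three interlocutors an IACT agent must keep apart."""
    PARENT = "User/Parent"
    TOOL = "Tools"
    CHILD = "Sub-Agents"


def annotate(source: Source, sender: str, text: str) -> str:
    """Prepend a hidden note stating who sent the message (hypothetical format)."""
    return f"<!-- system-note: message from {source.value} ({sender}) -->\n{text}"
```

Every message entering an agent's history would pass through such a wrapper, so a tool result and a child's RETURN can no longer be mistaken for each other.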
The same deployment notes describe a token-efficiency strategy in which zero-copy data passing via the Symbolic Variable Mechanism ensures $O(1)$ communication overhead regardless of file size, while Dynamic Context Injection and a KV-cache–friendly prefix structure keep average agent-turn latency low. The paper states that even for 100k+-token workflows, structural overhead remains in the 1–2k-token range, fully amortized across long subtasks. Security and sandboxing are handled by running Ext-Modules in root-inside/zero-trust-outside containers, with secrets excluded from LLM context through proxy execution or ephemeral in-memory web apps.
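The zero-copy idea behind the Symbolic Variable Mechanism can be sketched with a handle table. `VariableStore`, the `$var_` prefix, and the handle format are hypothetical; the sketch shows only that inter-agent messages carry a fixed-size name, so message length does not grow with payload size.

```python
import uuid


class VariableStore:
    """Hypothetical backing store for symbolic variables (not the kragent.ai API)."""

    def __init__(self) -> None:
        self._values: dict = {}

    def put(self, value: object) -> str:
        # The handle has a fixed length, whatever the size of `value`.
        handle = "$var_" + uuid.uuid4().hex[:8]
        self._values[handle] = value  # stores a reference; nothing is copied
        return handle

    def get(self, handle: str) -> object:
        return self._values[handle]


store = VariableStore()
small_msg = f"RETURN({store.put(b'ok')})"
large_msg = f"RETURN({store.put(b'x' * 10_000_000)})"
# Both messages have the same length: only the handle travels between agents.
```

The receiving agent (or a tool running in an Ext-Module) would dereference the handle only when it actually needs the bytes.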
5. Workflow dynamics and operational examples
The white paper’s canonical example is the user request: “Generate a three-chapter tutorial on IACT, including code examples.” The root agent, acting as a Planner, begins with the objective “Write 3-chapter tutorial on IACT,” produces an outline-oriented action, and issues CALL(OutlineGenerator). The OutlineGenerator returns a three-part outline, after which the parent issues CALL(ChapterWriter_1) with the first outline segment. A chapter-writing agent may then interleave drafting with CALL(CodeGenerator), engage in a short dialogue to refine a code snippet, integrate the result, and return chapter text. The root repeats this pattern for subsequent chapters and finally issues CALL(Editor) for polishing and PDF compilation via a LaTeX toolchain module (Lu, 2 Dec 2025).
This example illustrates three behaviors emphasized by the paper: recursive spawning of specialized agents, bidirectional dialogues for error checking and refinement, and final assembly by an editor-like agent that invokes tools. The state evolution of a writer and code generator is again expressed through coupled update functions, underscoring that sub-agents do not simply return values once but may participate in iterative local negotiation before their branch terminates.
The example is also significant because it demonstrates the intended granularity of decomposition. Sub-agents are not restricted to coarse task partitions such as chapter-level assignment; they may appear transiently within a chapter to handle subordinate concerns such as code generation. This suggests that the call tree is meant to represent nested epistemic and execution dependencies rather than only managerial task allocation.
6. Asynchronous and real-time variants
A related technical blueprint presents an IACT design in the style of “Building Interactive Real-Time Agents with Asynchronous I/O and Speculative Tool Calling,” extending the architecture to latency-sensitive interactive settings. The core design goal is to decouple the LLM’s think-and-act loop from blocking waits on user input and tool responses so that computation and I/O overlap. The blueprint describes a nonblocking event-driven system with a shared EventQueue for partial user updates and tool responses, a TaskQueue for speculative and committed tool calls, a core reasoner loop, and a parallel I/O worker that executes tool calls asynchronously and reinjects responses as events (Hooper et al., 13 May 2026).
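The queue structure can be sketched with Python's `asyncio`. This is a hedged illustration, not the blueprint's code: the coroutine names, the fixed 0.2 s tool latency, the 0.02 s gap between user segments, and the one-lookup-per-segment policy are hypothetical, and speculation and commit logic are omitted. It shows only how a reasoner that never awaits a tool directly can absorb new user input while a tool call is still running.

```python
import asyncio

TOOL_LATENCY = 0.2  # seconds; hypothetical stand-in for a slow tool


async def user_stream(events: asyncio.Queue, segments: list, gap: float) -> None:
    """Deliver the user's request in partial segments, as streamed speech would."""
    for seg in segments:
        await events.put(("user_segment", seg))
        await asyncio.sleep(gap)


async def io_worker(tasks: asyncio.Queue, events: asyncio.Queue) -> None:
    """Run tool calls off the reasoning path and reinject results as events."""
    while True:
        tool, arg = await tasks.get()
        await asyncio.sleep(TOOL_LATENCY)
        await events.put(("tool_result", f"{tool}({arg})"))


async def reasoner(events: asyncio.Queue, tasks: asyncio.Queue, n_calls: int) -> list:
    """Never block on I/O: enqueue tool calls and keep consuming events."""
    log, outstanding = [], n_calls
    while outstanding:
        kind, payload = await events.get()
        log.append(f"{kind}:{payload}")
        if kind == "user_segment":
            await tasks.put(("lookup", payload))  # hypothetical: one call per segment
        else:
            outstanding -= 1
    return log


async def main() -> list:
    events: asyncio.Queue = asyncio.Queue()  # partial user updates and tool responses
    tasks: asyncio.Queue = asyncio.Queue()   # tool calls awaiting execution
    segments = ["weather", "in Paris"]
    worker = asyncio.create_task(io_worker(tasks, events))
    feeder = asyncio.create_task(user_stream(events, segments, gap=0.02))
    log = await reasoner(events, tasks, n_calls=len(segments))
    worker.cancel()
    await feeder
    return log
```

Running `asyncio.run(main())` logs the second user segment before the first tool result, because the reasoner kept reading events while the first lookup was in flight.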
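The latency model is formalized by defining total model-thinking time
$$T_{\text{think}} = \sum_i t^{\text{think}}_i$$
and total tool or user-wait latency
$$T_{\text{wait}} = \sum_j t^{\text{wait}}_j.$$
In a synchronous pipeline,
$$T_{\text{sync}} = T_{\text{think}} + T_{\text{wait}},$$
whereas under ideal overlap,
$$T_{\text{async}} = \max(T_{\text{think}}, T_{\text{wait}}).$$
The corresponding speedup is
$$S = \frac{T_{\text{sync}}}{T_{\text{async}}} = \frac{T_{\text{think}} + T_{\text{wait}}}{\max(T_{\text{think}}, T_{\text{wait}})}.$$
For partial overlap, the blueprint gives
$$T_{\text{async}} = T_{\text{think}} + T_{\text{wait}} - \alpha \min(T_{\text{think}}, T_{\text{wait}}),$$
with
$$\alpha \in [0, 1],$$
so that $\alpha = 0$ recovers the synchronous pipeline and $\alpha = 1$ the ideal-overlap case.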
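The latency formulas can be checked numerically. This sketch assumes the partial-overlap case is parameterized by a fraction `alpha` of the shorter of the two components that is hidden by overlap, so `alpha = 0` gives the synchronous pipeline and `alpha = 1` ideal overlap; the function name `latencies` is hypothetical.

```python
def latencies(t_think: float, t_wait: float, alpha: float) -> tuple:
    """Return (T_sync, T_async, speedup) for an overlap fraction alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    t_sync = t_think + t_wait                       # think and wait run back to back
    t_async = t_sync - alpha * min(t_think, t_wait)  # hide part of the shorter phase
    return t_sync, t_async, t_sync / t_async
```

With 3 s of thinking and 2 s of waiting, full overlap gives $T_{\text{sync}} = 5$ s, $T_{\text{async}} = \max(3, 2) = 3$ s, and a speedup of $5/3 \approx 1.67$; half overlap gives $T_{\text{async}} = 4$ s.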
The same formulation introduces Speculative Tool Calling using an evolving DAG call-tree $\mathcal{G}_t$ whose nodes carry an identifier, tool, arguments, a lifecycle state (covering at least speculative, committed, and cancelled nodes), and dependencies. Safe read-only tools may execute immediately in speculative mode, while unsafe tools are held until commit. Speculative nodes may be cancelled when user updates contradict arguments or when the LLM issues <REMOVE ID>. Commit occurs when the final user update has arrived and the LLM either issues a new tool identifier larger than the current maximum or emits a <pause> action (Hooper et al., 13 May 2026).
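The speculative call-tree rules can be sketched as a small state machine. This is an illustrative Python sketch under stated assumptions: the three states, the `SAFE_TOOLS` allow-list, the key/value notion of a contradicting user update, and the cascading cancellation of dependents are hypothetical choices. Only the run-safe-tools-early rule, cancellation on contradiction or `<REMOVE ID>`, and the commit condition follow the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    SPECULATIVE = "speculative"
    COMMITTED = "committed"
    CANCELLED = "cancelled"


# Hypothetical allow-list of read-only tools considered safe to run before commit.
SAFE_TOOLS = {"search", "read_file"}


@dataclass
class ToolNode:
    ident: int
    tool: str
    args: dict
    deps: list = field(default_factory=list)  # identifiers this call depends on
    state: State = State.SPECULATIVE
    executed: bool = False


class SpeculativeDAG:
    def __init__(self) -> None:
        self.nodes: dict = {}

    def add(self, node: ToolNode) -> None:
        self.nodes[node.ident] = node
        if node.tool in SAFE_TOOLS:
            node.executed = True  # read-only: run speculatively right away

    def remove(self, ident: int) -> None:
        """Handle <REMOVE ID>; cascading to dependents is an assumption of this sketch."""
        node = self.nodes[ident]
        if node.state is State.CANCELLED:
            return
        node.state = State.CANCELLED
        for other in self.nodes.values():
            if ident in other.deps:
                self.remove(other.ident)

    def on_user_update(self, key: str, value: object) -> None:
        """Cancel speculative nodes whose argument `key` contradicts the new input."""
        for node in list(self.nodes.values()):
            if node.state is State.SPECULATIVE and node.args.get(key, value) != value:
                self.remove(node.ident)

    def should_commit(self, final_update_arrived: bool, action: object) -> bool:
        """Final user update is in, and the LLM emits a larger new ID or <pause>."""
        max_id = max(self.nodes, default=-1)
        new_higher_id = isinstance(action, int) and action > max_id
        return final_update_arrived and (new_higher_id or action == "<pause>")

    def commit(self) -> list:
        committed = []
        for node in self.nodes.values():
            if node.state is State.SPECULATIVE:
                node.state = State.COMMITTED
                node.executed = True  # unsafe tools only run at this point
                committed.append(node.ident)
        return committed
```

In this sketch a `search` node runs as soon as it is added, while a `book_hotel` node waits; if the user then changes the city, both are cancelled before the unsafe call ever executes.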
This real-time blueprint differs from the original white paper in one notable formal respect. The original IACT definition specifies a rooted tree of CALL edges, whereas the asynchronous design maintains an evolving DAG call-tree to support speculation, cancellation, and dependency tracking. This suggests a broader family of IACT-like execution structures in which the tree abstraction remains central but may be relaxed for latency-oriented runtime control.
The blueprint also describes a clock-based training methodology for asynchronous inference. A virtual clock $\tau$ is measured in tokens of generation time; each user-segment arrival or tool latency $\Delta t$ is converted into a token delay $d$, that is, the same interval expressed as a number of generated tokens; and the model is trained to interleave generation with event injection according to a sorted event queue. The synthetic data pipeline begins from fully annotated multi-turn tool workflows such as TinyAgent and HotpotQA, splits user queries into streaming segments via TTS using Kokoro-82M and forced alignment using WhisperX, identifies the earliest segment at which a strong LLM, specified as Qwen-32B, can produce the correct tool call with no missing arguments, and then builds a step-by-step trajectory of context, reasoning, and action. The loss is standard cross-entropy $\mathcal{L}_{\text{CE}}$, optionally augmented by a timing penalty
$$\mathcal{L} = \mathcal{L}_{\text{CE}} + \lambda\,\mathcal{L}_{\text{timing}}.$$
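The virtual-clock idea can be sketched in a few lines. Assumptions, none of which the blueprint states: the delay conversion rounds up (`math.ceil`) from seconds multiplied by a hypothetical generation rate, and `interleave` treats the clock as the number of tokens generated so far, releasing each event just before the token whose index equals its arrival tick.

```python
import heapq
import math


def to_token_delay(seconds: float, tokens_per_second: float) -> int:
    """Convert a wall-clock delay into virtual-clock ticks (generated tokens)."""
    return math.ceil(seconds * tokens_per_second)


def interleave(model_tokens: list, events: list) -> list:
    """Merge generated tokens with (arrival_tick, event) pairs from a sorted queue."""
    heap = list(events)
    heapq.heapify(heap)  # event queue ordered by arrival tick
    out = []
    for clock, tok in enumerate(model_tokens):  # clock = tokens generated so far
        while heap and heap[0][0] <= clock:
            out.append(f"<event:{heapq.heappop(heap)[1]}>")
        out.append(tok)
    out.extend(f"<event:{e}>" for _, e in sorted(heap))  # events after generation ends
    return out
```

For example, a tool that takes 0.5 s at 40 tokens per second becomes a 20-token delay, and `interleave(["a", "b", "c"], [(1, "tool"), (0, "user")])` places the user event before `"a"` and the tool event before `"b"`.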
Reported evaluation results compare a synchronous Reason-and-Act baseline with AsyncIO plus Speculative Tool Calling labeled “IACT” across OpenAI Realtime API, Qwen2.5-3B-Instruct, and Llama-3.2-3B-Instruct (Hooper et al., 13 May 2026).
| Setting | HotpotQA (accuracy, latency, speedup) | TinyAgent (accuracy, latency, speedup) |
|---|---|---|
| openai-realtime-1.5 baseline (N=500) | 71.6%, 4.5 s | 54.9%, 7.6 s |
| openai-realtime-1.5 IACT (N=500) | 71.0%, 3.6 s (1.25×) | 53.2%, 4.4 s (1.73×) |
| Qwen2.5-3B baseline (N=100) | 68.6%, 2.7 s | 65.6%, 4.1 s |
| Qwen2.5-3B IACT (N=100) | 67.5%, 1.2 s (2.25×) | 62.1%, 2.5 s (1.64×) |
| Llama-3.2-3B baseline (N=100) | 70.4%, 2.3 s | 66.8%, 5.0 s |
| Llama-3.2-3B IACT (N=100) | 68.7%, 1.1 s (2.09×) | 65.2%, 2.5 s (2.00×) |
Computed from the table, the IACT configurations reduce end-to-end latency by factors of roughly 1.25× to 2.25× relative to their baselines, at a cost of 0.6 to 3.5 percentage points in absolute accuracy; the lowest latency, 1.1 s, is reached on HotpotQA with Llama-3.2-3B. In this formulation, IACT becomes not only a recursive organizational model but also a systems pattern for overlapping reasoning with external delays while preserving most of the original task accuracy.
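The speedup and accuracy figures can be recomputed directly from the table's numbers. This small script copies the six baseline/IACT pairs from the evaluation table and defines speedup as baseline latency divided by IACT latency; the dictionary layout and function name are illustrative choices.

```python
# (baseline accuracy %, baseline latency s, IACT accuracy %, IACT latency s)
RESULTS = {
    ("openai-realtime-1.5", "HotpotQA"): (71.6, 4.5, 71.0, 3.6),
    ("openai-realtime-1.5", "TinyAgent"): (54.9, 7.6, 53.2, 4.4),
    ("Qwen2.5-3B", "HotpotQA"): (68.6, 2.7, 67.5, 1.2),
    ("Qwen2.5-3B", "TinyAgent"): (65.6, 4.1, 62.1, 2.5),
    ("Llama-3.2-3B", "HotpotQA"): (70.4, 2.3, 68.7, 1.1),
    ("Llama-3.2-3B", "TinyAgent"): (66.8, 5.0, 65.2, 2.5),
}


def summarize(results: dict) -> tuple:
    """Speedup = baseline / IACT latency; drop = baseline minus IACT accuracy (pp)."""
    speedup = {k: round(t0 / t1, 2) for k, (a0, t0, a1, t1) in results.items()}
    drop = {k: round(a0 - a1, 1) for k, (a0, t0, a1, t1) in results.items()}
    return speedup, drop


speedup, drop = summarize(RESULTS)
for key in RESULTS:
    print(*key, f"{speedup[key]:.2f}x faster", f"-{drop[key]:.1f} pp")
```

The largest accuracy loss (3.5 points) and one of the smaller speedups (1.64×) both occur for Qwen2.5-3B on TinyAgent, so the trade-off is not uniform across settings.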