
Interactive Agents Call Tree (IACT) Model

Updated 4 July 2026
  • IACT is a computational model that replaces rigid, static workflows with dynamic, dialogue-driven agent trees.
  • Its recursive CALL/RETURN semantics allow agents to autonomously decompose tasks, reducing error propagation and adapting to complex objectives.
  • IACT supports asynchronous execution with speculative tool calling, achieving significant speedups while maintaining high task accuracy.

Interactive Agents Call Tree (IACT) is a computational model for general AI agents that is designed to address the limitations of static, hard-coded agent workflows. In the technical white paper introducing the architecture behind kragent.ai, IACT is defined as a general-purpose autonomous system driven purely by user dialogue: given a high-level objective, it autonomously grows a dynamic, recursive agent topology incrementally tailored to the problem’s structure, allowing organizational complexity to scale with open-ended tasks. The model replaces rigid invocations with bidirectional, stateful dialogues in order to mitigate the error propagation associated with unidirectional function calls. The white paper emphasizes architecture, design principles, and practical lessons from production deployment, and presents qualitative evidence from real-world workflows rather than exhaustive benchmark results (Lu, 2 Dec 2025). A related real-time formulation casts IACT-style execution as asynchronous I/O with speculative tool calling, extending the architecture toward latency-sensitive interactive settings (Hooper et al., 13 May 2026).

1. Formal specification

The white paper defines an IACT system as the tuple

$$\mathrm{IACT} = (V, E, A, C, D),$$

where $V$ is the set of agent-instances that exist at runtime, $E \subseteq V \times V$ is the set of directed CALL edges forming a tree rooted at the user’s initial agent, $A$ is the agent micro-architecture implementing the perception–action loop, $C$ is the context-construction function that builds each agent’s working memory, and $D$ is the dialogue protocol enabling stateful, bidirectional messages (Lu, 2 Dec 2025).

At time $t$, the organizational state is a rooted tree

$$T_t = (V_t, E_t),$$

with the invariant that each $v \in V_t$ has a unique parent, except the root, and that $E_t$ grows only when new sub-agents are spawned. The macro-architecture is organized around a user-to-root-agent interface, recursive CALL/RETURN edges, downward state flow via input arguments, and upward state flow via RETURN(result) messages. The root receives the high-level objective, and each parent agent may invoke a child agent by issuing CALL(o_c); the corresponding parent-to-child edge appears in $E_t$ on first invocation.
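The tree invariant can be made concrete with a small data structure. The following is a minimal sketch, not the white paper's implementation: the class names `AgentNode` and `CallTree` and the choice to key children by objective string are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class AgentNode:
    """One runtime agent-instance (an element of V_t)."""
    objective: str
    parent: Optional["AgentNode"] = None
    children: Dict[str, "AgentNode"] = field(default_factory=dict)


class CallTree:
    """Rooted tree (V_t, E_t) that grows only through CALL."""

    def __init__(self, root_objective: str) -> None:
        self.root = AgentNode(root_objective)
        self.nodes: List[AgentNode] = [self.root]                 # V_t
        self.edges: List[Tuple[AgentNode, AgentNode]] = []        # E_t

    def call(self, parent: AgentNode, objective: str) -> AgentNode:
        # The edge is created on first invocation; a repeated CALL with
        # the same objective returns the existing child instead.
        child = parent.children.get(objective)
        if child is None:
            child = AgentNode(objective, parent=parent)
            parent.children[objective] = child
            self.nodes.append(child)
            self.edges.append((parent, child))
        return child

    def is_rooted_tree(self) -> bool:
        # Every node except the root must have exactly one incoming edge.
        incoming = {id(n): 0 for n in self.nodes}
        for _, child in self.edges:
            incoming[id(child)] += 1
        return all(
            incoming[id(n)] == (0 if n is self.root else 1) for n in self.nodes
        )
```

Because `call` is the only method that adds nodes or edges, the unique-parent property holds by construction, and `is_rooted_tree` serves only as a check.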

This formalization distinguishes IACT from static workflow systems that require pre-defined graphs or specialized programming. The operative unit is not a fixed pipeline stage but a runtime agent-instance whose place in the topology is created only when the current problem structure calls for it. The result is a call-tree semantics in which decomposition is endogenous to the agent’s own reasoning rather than imposed externally.

2. Recursive organization and execution semantics

The central organizing principle of IACT is self-organizing recursion. The tree grows on demand via two primitives, CALL and RETURN. In the execution schema given in the white paper, the system initializes a root agent with the user objective, repeatedly builds context, queries the LLM, interprets the generated output, and either spawns or reuses child agents when CALL(o) appears, or terminates a branch when RETURN(r) appears (Lu, 2 Dec 2025).

Task decomposition is delegated to each agent’s local decision process. An agent takes its current objective or intermediate result and determines whether the task is atomic; if so, it finishes and returns. Otherwise it decomposes the task into sub-objectives $\{o_i\}$ and issues CALL(o_i) for each. This yields a minimal on-demand tree: children are spawned only when the parent’s internal reasoning declares a CALL, and subsequent calls to the same objective reuse the existing node and continue the dialogue.
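The decomposition rule above can be sketched as a short recursive function. This is an illustration under stated assumptions: the `policy` callable stands in for the agent's LLM-driven decision (an empty list means "atomic"), the tree is keyed by objective string, and `tutorial_policy` is a hypothetical hard-coded plan.

```python
from typing import Callable, Dict, List

# Hypothetical policy: maps an objective to its sub-objectives.
# An empty list means the agent judges the task atomic.
Policy = Callable[[str], List[str]]


def run_agent(objective: str, policy: Policy, tree: Dict[str, List[str]]) -> str:
    """Solve `objective`, recording CALL edges in `tree` as they occur."""
    tree.setdefault(objective, [])
    sub_objectives = policy(objective)
    if not sub_objectives:
        # Atomic task: finish and RETURN a result to the parent.
        return f"done({objective})"
    results = []
    for sub in sub_objectives:
        if sub not in tree[objective]:
            tree[objective].append(sub)   # edge appears on first CALL only
        results.append(run_agent(sub, policy, tree))   # CALL(o_i)
    # Non-atomic task: combine the children's RETURN values.
    return f"{objective}[" + ", ".join(results) + "]"


def tutorial_policy(objective: str) -> List[str]:
    plan = {
        "tutorial": ["outline", "chapter 1"],
        "chapter 1": ["code sample"],
    }
    return plan.get(objective, [])


tree: Dict[str, List[str]] = {}
result = run_agent("tutorial", tutorial_policy, tree)
```

Only objectives that some parent actually calls ever appear in `tree`, which is the sense in which the tree is minimal and on demand.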

The micro-architecture of a node is specified in three steps. First, the context is assembled from four components: the static system prompt defining the role, such as “You are a Researcher”; the dialogue history with parent and child agents; the latest incoming message from parent or tool; and dynamic injections such as context warnings or file listings. Second, the LLM produces an output from that context, which may include natural-language instructions, structured code, or primitives such as CALL, RETURN, and CONTEXT-COMPRESS. Third, the interpreter executes the embedded actions, including tool calls, CALL/RETURN, and variable definitions, and produces feedback that becomes part of the next context.
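A minimal sketch of the context-assembly and interpretation steps follows. The function names, the newline-joined layout, and the `NAME(argument)` surface syntax for primitives are assumptions for illustration; the white paper's actual message format is the Extended-Markdown protocol, which is not reproduced here.

```python
import re
from typing import List, Tuple


def build_context(system_prompt: str, history: List[str],
                  incoming: str, injections: List[str]) -> str:
    """Concatenate the four context components in a fixed order."""
    parts = [system_prompt, *history, incoming, *injections]
    return "\n".join(parts)


# Assumed surface syntax: a primitive name followed by one
# parenthesised argument, e.g. CALL(OutlineGenerator).
PRIMITIVE = re.compile(r"\b(CALL|RETURN)\((.*?)\)")


def interpret(llm_output: str) -> List[Tuple[str, str]]:
    """Extract (primitive, argument) pairs in the order they appear."""
    return PRIMITIVE.findall(llm_output)
```

For example, `interpret("Delegating. CALL(OutlineGenerator)")` yields `[("CALL", "OutlineGenerator")]`, which a fuller interpreter would then dispatch and turn into feedback for the next context.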

A useful implication of this design is that topology is not merely a control-flow representation; it is a computational state that co-evolves with reasoning. The tree records decomposition choices made at runtime, and therefore functions simultaneously as execution structure and problem-specific organizational memory.

3. Stateful bidirectional dialogue and error correction

A defining feature of IACT is what the white paper calls interactional redundancy. Instead of treating CALL as a one-shot function invocation, every CALL becomes a stateful dialogue session between parent and child agents. At each turn, the parent sends a message to the child and the child sends a message back, and each side updates its internal context using the incoming message (Lu, 2 Dec 2025).

The paper formalizes this exchange both at the level of messages and at the level of agent state, as a pair of coupled update rules, one for the parent and one for the child. Each agent state incorporates both private memory and the dialogue trace, while the update functions are implemented by the BuildContext and Interpreter machinery.

The error-correction mechanism is explicitly recursive. If a child’s output is flawed, the parent inspects it and emits a corrective message, to which the child responds with a revised output. The loop continues until the parent’s condition Check(e_t)=OK holds. The paper characterizes this as a constrained minimization problem.

This formulation is intended to mitigate the error propagation inherent in unidirectional function calls. In that sense, IACT rejects the assumption that hierarchical decomposition must be strictly feed-forward. A common misunderstanding is to treat the call tree as a conventional procedure tree; in IACT, the tree is coupled to iterative dialogue, correction, and ambiguity resolution, not merely delegation.
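The correction loop can be sketched as follows. This is an illustrative reading, not the paper's algorithm: `dialogue_until_ok`, the `max_turns` cap, and the toy `child` and `check` functions are all assumptions, with `check` returning `None` to play the role of Check = OK.

```python
from typing import Callable, List, Optional, Tuple


def dialogue_until_ok(
    child: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    request: str,
    max_turns: int = 5,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Run a parent-child dialogue until the parent's check passes.

    `check` returns None when the output is acceptable, otherwise a
    corrective message that is sent back to the child.
    """
    transcript: List[Tuple[str, str]] = []
    message = request
    for _ in range(max_turns):
        output = child(message)
        transcript.append((message, output))
        correction = check(output)
        if correction is None:
            return output, transcript
        message = correction
    raise RuntimeError("child did not converge within max_turns")


# Toy participants: the child only fixes its code once told how.
def child(message: str) -> str:
    return "print('hi')" if "parentheses" in message else "print 'hi'"


def check(output: str) -> Optional[str]:
    return None if "(" in output else "Use parentheses in print."


final, transcript = dialogue_until_ok(child, check, "Write hello code")
```

A one-shot function call would have returned the flawed first output to the caller; here the parent's second message repairs it before the branch terminates.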

4. Runtime components, memory, and message infrastructure

The white paper identifies five key components in the architecture. Their roles are summarized below (Lu, 2 Dec 2025).

| Component | Function | Notes |
| --- | --- | --- |
| LLM engine (“brain”) | Generates next action or response | Produces language, code, and primitives |
| Hybrid Language Interpreter (“executor”) | Parses and runs actions | Executes tool calls and control primitives |
| Ext-Modules | Environment interaction via RPC | External tools |
| Hippocampus | Global associative memory | Vector DB |
| Unified Extended-Markdown protocol | Common message layer | Used for all internal and external messages |

These components operationalize the split between generation, execution, environment access, memory, and communication. The interpreter is especially important because the LLM output is not restricted to plain text; it may contain structured code and control primitives. The system therefore depends on a mediating layer that can parse, execute, and feed results back into context.

The production notes reported for kragent.ai focus on three recurrent systems issues. First, the “Multi-Party Dialogue Gap” arose when agents initially conflated tool outputs, parent messages, and child messages; the reported solution was to append invisible system-notes clarifying the audience of each message, after which modern LLMs were said to track three interlocutors—User/Parent, Tools, and Sub-Agents. Second, under uncertainty, current LLMs were observed to prefer hallucinating missing constraints rather than querying for clarification, motivating exploration of RL-fine-tuning for proactive QUERY_PARENT calls. Third, the Hippocampus vector store was reported to retain long-term project facts but sometimes return stale items, leading to experiments with a specialized LLM for memory consolidation (Lu, 2 Dec 2025).

The same deployment notes describe a token-efficiency strategy in which zero-copy data passing via the Symbolic Variable Mechanism ensures $O(1)$ communication overhead regardless of file size, while Dynamic Context Injection and a KV-cache–friendly prefix structure keep average agent-turn latency low. The paper states that even for 100k+-token workflows, structural overhead remains in the 1–2k-token range, fully amortized across long subtasks. Security and sandboxing are handled by running Ext-Modules in root-inside/zero-trust-outside containers, with secrets excluded from LLM context through proxy execution or ephemeral in-memory web apps.
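The idea behind passing data by symbolic reference can be illustrated with a small store that hands out short handles. This is a sketch only: the `VariableStore` class and the `$name` handle syntax are hypothetical and are not taken from the white paper.

```python
from typing import Dict


class VariableStore:
    """Holds large payloads outside the prompt; agents exchange short handles."""

    def __init__(self) -> None:
        self._values: Dict[str, str] = {}

    def define(self, name: str, value: str) -> str:
        self._values[name] = value
        return f"${name}"   # only this handle enters the LLM context

    def resolve(self, handle: str) -> str:
        return self._values[handle.lstrip("$")]


store = VariableStore()
small = store.define("notes", "x" * 10)
large = store.define("dataset", "x" * 1_000_000)

# The message size depends on the handle, not on the payload it names.
message_small = f"CALL(Summarizer) input={small}"
message_large = f"CALL(Summarizer) input={large}"
```

Both messages are a few dozen characters long even though one payload is 100,000 times larger than the other, which is the constant-overhead behaviour the deployment notes describe.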

5. Workflow dynamics and operational examples

The white paper’s canonical example is the user request: “Generate a three-chapter tutorial on IACT, including code examples.” The root agent, acting as a Planner, begins with the objective “Write 3-chapter tutorial on IACT,” produces an outline-oriented action, and issues CALL(OutlineGenerator). The OutlineGenerator returns a three-part outline, after which the parent issues CALL(ChapterWriter_1) with the first outline segment. A chapter-writing agent may then interleave drafting with CALL(CodeGenerator), engage in a short dialogue to refine a code snippet, integrate the result, and return chapter text. The root repeats this pattern for subsequent chapters and finally issues CALL(Editor) for polishing and PDF compilation via a LaTeX toolchain module (Lu, 2 Dec 2025).

This example illustrates three behaviors emphasized by the paper: recursive spawning of specialized agents, bidirectional dialogues for error checking and refinement, and final assembly by an editor-like agent that invokes tools. The state evolution of a writer and code generator is again expressed through coupled update functions, underscoring that sub-agents do not simply return values once but may participate in iterative local negotiation before their branch terminates.

The example is also significant because it demonstrates the intended granularity of decomposition. Sub-agents are not restricted to coarse task partitions such as chapter-level assignment; they may appear transiently within a chapter to handle subordinate concerns such as code generation. This suggests that the call tree is meant to represent nested epistemic and execution dependencies rather than only managerial task allocation.

6. Asynchronous and real-time variants

A related technical blueprint presents an IACT design in the style of “Building Interactive Real-Time Agents with Asynchronous I/O and Speculative Tool Calling,” extending the architecture to latency-sensitive interactive settings. The core design goal is to decouple the LLM’s think-and-act loop from blocking waits on user input and tool responses so that computation and I/O overlap. The blueprint describes a nonblocking event-driven system with a shared EventQueue for partial user updates and tool responses, a TaskQueue for speculative and committed tool calls, a core reasoner loop, and a parallel I/O worker that executes tool calls asynchronously and reinjects responses as events (Hooper et al., 13 May 2026).
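The event-driven layout described above can be sketched with Python's `asyncio`. The structure (a task queue, an event queue, a reasoner, and an I/O worker) follows the prose, but the function names, the `None` sentinel, and the use of `asyncio.sleep` in place of real tools and model inference are assumptions for illustration.

```python
import asyncio
import time


async def run_tool(call, events: asyncio.Queue) -> None:
    name, latency = call
    await asyncio.sleep(latency)          # stands in for a real tool
    await events.put((name, "ok"))        # reinject the response as an event


async def io_worker(tasks: asyncio.Queue, events: asyncio.Queue) -> None:
    """Run queued tool calls concurrently until a None sentinel arrives."""
    running = []
    while True:
        call = await tasks.get()
        if call is None:
            break
        running.append(asyncio.create_task(run_tool(call, events)))
    await asyncio.gather(*running)


async def reasoner(tasks: asyncio.Queue, events: asyncio.Queue) -> list:
    """Issue tool calls, keep 'thinking', then collect the responses."""
    for call in [("search", 0.2), ("lookup", 0.2)]:
        await tasks.put(call)
    await tasks.put(None)
    await asyncio.sleep(0.2)              # model thinking overlaps with I/O
    return sorted([await events.get(), await events.get()])


async def main():
    tasks, events = asyncio.Queue(), asyncio.Queue()
    start = time.perf_counter()
    results, _ = await asyncio.gather(
        reasoner(tasks, events), io_worker(tasks, events)
    )
    return results, time.perf_counter() - start
```

Calling `asyncio.run(main())` completes in roughly 0.2 s, whereas running the two tool calls and the thinking step back to back would take about 0.6 s.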

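The latency model is formalized in terms of the total model-thinking time $T_{\text{think}}$ and the total tool or user-wait latency $T_{\text{wait}}$. In a synchronous pipeline,

$$T_{\text{sync}} = T_{\text{think}} + T_{\text{wait}},$$

whereas under ideal overlap,

$$T_{\text{async}} = \max(T_{\text{think}}, T_{\text{wait}}).$$

The corresponding speedup is

$$S = \frac{T_{\text{sync}}}{T_{\text{async}}} = \frac{T_{\text{think}} + T_{\text{wait}}}{\max(T_{\text{think}}, T_{\text{wait}})}.$$

For partial overlap, the blueprint gives a corresponding expression that lies between these two cases.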
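A short numeric sketch of the two limiting cases may help. It assumes that synchronous latency is the sum of thinking and waiting time and that ideal overlap is bounded by the larger of the two; the function names are illustrative.

```python
def synchronous_latency(think: float, wait: float) -> float:
    """Thinking and waiting happen one after the other."""
    return think + wait


def ideal_overlap_latency(think: float, wait: float) -> float:
    """Thinking and waiting overlap fully; the longer one dominates."""
    return max(think, wait)


def ideal_speedup(think: float, wait: float) -> float:
    return synchronous_latency(think, wait) / ideal_overlap_latency(think, wait)
```

Under these assumptions the overlap-only speedup peaks at 2 when thinking and waiting times are equal, and approaches 1 as either one dominates.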

The same formulation introduces Speculative Tool Calling using an evolving DAG call-tree whose nodes carry an identifier, tool, arguments, an execution state, and dependencies. Safe read-only tools may execute immediately in speculative mode, while unsafe tools are held until commit. Speculative nodes may be cancelled when user updates contradict arguments or when the LLM issues <REMOVE ID>. Commit occurs when the final user update has arrived and the LLM either issues a new tool identifier larger than the current maximum or emits a <pause> action (Hooper et al., 13 May 2026).
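The speculative bookkeeping can be sketched as below. Several details are assumptions rather than the blueprint's design: the state names, the `SAFE_TOOLS` set, the cascade that cancels nodes depending on a removed node, and the simplification that `commit` is called explicitly instead of being triggered by the conditions described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SAFE_TOOLS = {"search", "read_file"}   # hypothetical read-only tools


@dataclass
class ToolNode:
    node_id: int
    tool: str
    args: dict
    deps: List[int] = field(default_factory=list)
    state: str = "speculative"   # illustrative state names


class SpeculativeCalls:
    def __init__(self) -> None:
        self.nodes: Dict[int, ToolNode] = {}
        self.executed: List[int] = []

    def add(self, node: ToolNode) -> None:
        self.nodes[node.node_id] = node
        if node.tool in SAFE_TOOLS:
            self._execute(node)        # safe tools may run before commit

    def remove(self, node_id: int) -> None:
        # Cancel a node and, by assumption, everything that depends on it.
        self.nodes[node_id].state = "cancelled"
        for other in self.nodes.values():
            if node_id in other.deps and other.state != "cancelled":
                self.remove(other.node_id)

    def commit(self) -> None:
        # Unsafe tools are held until commit; cancelled nodes never run.
        for node in sorted(self.nodes.values(), key=lambda n: n.node_id):
            if node.state == "cancelled":
                continue
            if node.node_id not in self.executed:
                self._execute(node)
            node.state = "committed"

    def _execute(self, node: ToolNode) -> None:
        self.executed.append(node.node_id)   # placeholder for a real tool call
```

In this sketch a cancelled safe node may already have run, in which case its result is simply discarded; that is acceptable only because safe tools are read-only.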

This real-time blueprint differs from the original white paper in one notable formal respect. The original IACT definition specifies a rooted tree of CALL edges, whereas the asynchronous design maintains an evolving DAG call-tree to support speculation, cancellation, and dependency tracking. This suggests a broader family of IACT-like execution structures in which the tree abstraction remains central but may be relaxed for latency-oriented runtime control.

The blueprint also describes a clock-based training methodology for asynchronous inference. A virtual clock is measured in tokens of generation time; each user-segment arrival or tool latency is converted into an equivalent token delay; and the model is trained to interleave generation with event injection according to a sorted event queue. The synthetic data pipeline begins from fully annotated multi-turn tool workflows such as TinyAgent and HotpotQA, splits user queries into streaming segments via TTS using Kokoro-82M and forced alignment using WhisperX, identifies the earliest segment at which a strong LLM, specified as Qwen-32B, can produce the correct tool call with no missing arguments, and then builds a step-by-step trajectory of context, reasoning, and action. The loss is standard cross-entropy, optionally augmented by a timing penalty.
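The token-clock idea can be illustrated with a small merge routine. This is a sketch under assumptions: the `tokens_per_second` conversion rate, the rounding rule, and the `<event:...>` marker format are hypothetical and not taken from the blueprint.

```python
import heapq
from typing import List, Tuple


def to_token_delay(latency_s: float, tokens_per_second: float) -> int:
    """Convert a wall-clock latency into a delay on the token clock."""
    return round(latency_s * tokens_per_second)


def interleave(generated: List[str], events: List[Tuple[float, str]],
               tokens_per_second: float) -> List[str]:
    """Merge generated tokens with events that arrive at given times."""
    # Sorted event queue keyed by arrival time in tokens; the index
    # breaks ties so that equal delays keep their input order.
    queue = [(to_token_delay(t, tokens_per_second), i, text)
             for i, (t, text) in enumerate(events)]
    heapq.heapify(queue)
    out: List[str] = []
    for clock, token in enumerate(generated):
        # Inject every event whose arrival time has been reached.
        while queue and queue[0][0] <= clock:
            out.append(f"<event:{heapq.heappop(queue)[2]}>")
        out.append(token)
    # Events arriving after generation ends are appended in time order.
    out.extend(f"<event:{text}>" for _, _, text in sorted(queue))
    return out


sequence = interleave(["a", "b", "c", "d"],
                      [(0.3, "tool"), (0.1, "user")],
                      tokens_per_second=10.0)
```

A sequence built this way places each event at the position the model would have reached when the event arrived, which is the kind of trajectory the training pipeline is described as constructing.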

Reported evaluation results compare a synchronous Reason-and-Act baseline with AsyncIO plus Speculative Tool Calling labeled “IACT” across OpenAI Realtime API, Qwen2.5-3B-Instruct, and Llama-3.2-3B-Instruct (Hooper et al., 13 May 2026).

| Model | Method | N | HotpotQA accuracy | HotpotQA latency | TinyAgent accuracy | TinyAgent latency |
| --- | --- | --- | --- | --- | --- | --- |
| openai-realtime-1.5 | baseline | 500 | 71.6% | 4.5 s | 54.9% | 7.6 s |
| openai-realtime-1.5 | IACT | 500 | 71.0% | 3.6 s | 53.2% | 4.4 s |
| Qwen2.5-3B | baseline | 100 | 68.6% | 2.7 s | 65.6% | 4.1 s |
| Qwen2.5-3B | IACT | 100 | 67.5% | 1.2 s | 62.1% | 2.5 s |
| Llama-3.2-3B | baseline | 100 | 70.4% | 2.3 s | 66.8% | 5.0 s |
| Llama-3.2-3B | IACT | 100 | 68.7% | 1.1 s | 65.2% | 2.5 s |
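The latency ratios implied by the table can be recomputed directly. The snippet below uses only the rounded latencies listed above, so its output may differ slightly from the figures reported in the paper; the dictionary layout and variable names are illustrative.

```python
# (baseline latency, IACT latency) in seconds, copied from the table.
latencies = {
    ("openai-realtime-1.5", "HotpotQA"): (4.5, 3.6),
    ("openai-realtime-1.5", "TinyAgent"): (7.6, 4.4),
    ("Qwen2.5-3B", "HotpotQA"): (2.7, 1.2),
    ("Qwen2.5-3B", "TinyAgent"): (4.1, 2.5),
    ("Llama-3.2-3B", "HotpotQA"): (2.3, 1.1),
    ("Llama-3.2-3B", "TinyAgent"): (5.0, 2.5),
}

# Speedup = baseline latency divided by IACT latency.
speedups = {
    key: round(baseline / iact, 2)
    for key, (baseline, iact) in latencies.items()
}

lowest = min(speedups.values())
highest = max(speedups.values())
```

On these rounded inputs the ratios span roughly 1.25x to 2.25x, with the smallest gain on openai-realtime-1.5 for HotpotQA and the largest on Qwen2.5-3B for HotpotQA.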

The reported aggregate trade-off is a latency speedup, between roughly 1.25× and 2.25× when recomputed from the rounded latencies in the table above, at a reported cost of at most 1.5 percentage points in absolute accuracy difference, with the specific claim that on HotpotQA with Llama-3.2-3B, end-to-end latency reaches 1.1 s. In this formulation, IACT becomes not only a recursive organizational model but also a systems pattern for overlapping reasoning with external delays while preserving most of the original task accuracy.
