DyTopo: Dynamic Multi-Agent LLM Coordination

Updated 4 July 2026

DyTopo is a dynamic multi-agent LLM framework that models iterative problem-solving as a round-based computation graph governed by manager-specified goals.
The approach uses semantic embeddings of agent queries and keys to reconstruct sparse, stage-adaptive communication networks for targeted message routing.
Empirical results show average improvements of +6.2 points in code and mathematical reasoning tasks, with reduced token usage and latency.

DyTopo is a manager-guided multi-agent framework for LLM-based reasoning in which the communication topology is rebuilt at every round rather than fixed for an entire trajectory. It formalizes multi-round collaboration as a Dynamic Computation Graph, $\mathcal{G}=\{G^{(t)}\}_{t=0}^{T-1}$, where each $G^{(t)}$ is a sparse directed graph induced from the agents’ current needs and offers under a manager-specified round goal. The system was introduced for code generation and mathematical reasoning, on the premise that iterative problem solving is stage-dependent and therefore poorly served by trajectory-wide all-to-all, hub-and-spoke, or fixed sequential communication patterns (Lu et al., 5 Feb 2026).

1. Concept and problem setting

DyTopo addresses a structural weakness in multi-agent LLM systems: most pipelines predefine who can communicate with whom and keep that pattern unchanged across all rounds. The underlying claim is that this design mismatches the temporal structure of reasoning. Early rounds often require broad exploration and decomposition, whereas later rounds require targeted verification, debugging, or final answer assembly. In that setting, dense fixed communication can inject irrelevant context, while sparse fixed communication can block task-critical exchanges (Lu et al., 5 Feb 2026).

The framework therefore treats communication topology as a round-level control variable. Rather than assuming one graph for the whole trajectory, DyTopo reconstructs a new sparse directed graph at each round, conditioned on the manager’s current goal and on each agent’s updated local state. A directed edge $a_j \to a_i$ means that agent $j$ is selected as a provider for agent $i$ in that specific round. The topology is thus stage-adaptive and private-message routing is selective rather than broadcast-based (Lu et al., 5 Feb 2026).

A frequent misconception is that the “topology” in DyTopo refers to topological invariants in the persistent-homology sense. In fact, DyTopo uses “topology” operationally, to denote an evolving communication graph over agents. This differs from persistent-homology-based uses of topology in works such as TopoTxR, which extracts 1D and 2D representative cycles from breast DCE-MRI (Wang et al., 2021), or TopoDiffusionNet, which enforces Betti-number constraints in diffusion-generated masks (Gupta et al., 2024). This suggests that DyTopo belongs to the literature on adaptive communication and graph-structured coordination, rather than TDA-driven geometric modeling.

2. Agent architecture and round-conditioned state

DyTopo contains a set of worker agents,

$\mathcal{A}=\{a_1,\dots,a_N\},$

and a manager meta-agent. Each worker agent $a_i$ has a role description $p_i$ and a local memory $H_i^{(t)}$. At round $t$, its local state is

$S_i^{(t)} = \left(p_i,\; H_i^{(t)},\; g^{(t)}\right),$

where $g^{(t)}$ is the manager’s round goal (Lu et al., 5 Feb 2026).

Each worker then performs exactly one forward pass per round, a design the paper calls the Single-Pass Inference constraint. The output is

$O_i^{(t)} = \left(m_{i,\mathrm{pub}}^{(t)},\; m_{i,\mathrm{priv}}^{(t)},\; q_i^{(t)},\; k_i^{(t)}\right).$

The four components are a public message, a private message, a query/need descriptor, and a key/offer descriptor. The query summarizes what the agent currently needs from others; the key summarizes what it can provide. The manager maintains global oversight, revises the round goal, and decides whether the collaboration should halt (Lu et al., 5 Feb 2026).

The paper instantiates different role sets for different task families. For code generation, the system uses Manager, Developer, Researcher, Tester, and Designer. For mathematical reasoning, it uses ProblemParser, Solver, Verifier, and Manager. The prompts are JSON-constrained so that public content, private content, query descriptors, key descriptors, and managerial outputs such as is_complete and next_goal can be extracted deterministically (Lu et al., 5 Feb 2026).
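As a rough illustration of why JSON-constrained prompts make extraction deterministic, the following minimal sketch parses one worker reply and one manager reply. The manager fields `is_complete` and `next_goal` are named in the text; the worker field names (`public_message`, `private_message`, `query`, `key`) and both dataclasses are hypothetical stand-ins, not the paper's actual schema.

```python
import json
from dataclasses import dataclass


@dataclass
class WorkerOutput:
    # Field names are illustrative assumptions, not the paper's schema.
    public_message: str   # visible to the manager
    private_message: str  # routed only along edges of the round graph
    query: str            # what the agent needs this round
    key: str              # what the agent can offer this round


@dataclass
class ManagerOutput:
    is_complete: bool  # halting flag named in the text
    next_goal: str     # round goal for the next round


def parse_worker(raw: str) -> WorkerOutput:
    """Parse a worker reply; raises ValueError or KeyError if malformed."""
    data = json.loads(raw)
    return WorkerOutput(
        public_message=str(data["public_message"]),
        private_message=str(data["private_message"]),
        query=str(data["query"]),
        key=str(data["key"]),
    )


def parse_manager(raw: str) -> ManagerOutput:
    """Parse a manager reply, insisting that is_complete is a real boolean."""
    data = json.loads(raw)
    if not isinstance(data["is_complete"], bool):
        raise ValueError("is_complete must be a JSON boolean")
    return ManagerOutput(is_complete=data["is_complete"],
                         next_goal=str(data["next_goal"]))


worker = parse_worker(
    '{"public_message": "draft ready", "private_message": "see edge cases",'
    ' "query": "need test cases", "key": "offer reference implementation"}'
)
manager = parse_manager('{"is_complete": false, "next_goal": "verify"}')
```

Failing loudly on a missing field or a non-boolean flag is one way to keep routing and halting decisions from silently depending on malformed model output.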

3. Semantic matching and dynamic graph induction

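The dynamic graph is induced from the natural-language query and key descriptors, written here as $q_i^{(t)}$ and $k_i^{(t)}$ for agent $a_i$. DyTopo uses a fixed semantic encoder

$\mathrm{Enc}: \text{text} \to \mathbb{R}^{d}$

to embed those descriptors, with

$\mathbf{q}_i^{(t)} = \mathrm{Enc}\big(q_i^{(t)}\big), \qquad \mathbf{k}_i^{(t)} = \mathrm{Enc}\big(k_i^{(t)}\big).$

After $\ell_2$-normalization,

$\hat{\mathbf{q}}_i^{(t)} = \frac{\mathbf{q}_i^{(t)}}{\|\mathbf{q}_i^{(t)}\|_2}, \qquad \hat{\mathbf{k}}_i^{(t)} = \frac{\mathbf{k}_i^{(t)}}{\|\mathbf{k}_i^{(t)}\|_2},$

the relevance score for a possible provider-consumer relation $a_j \to a_i$ is

$r_{i,j}^{(t)} = \big(\hat{\mathbf{q}}_i^{(t)}\big)^{\top} \hat{\mathbf{k}}_j^{(t)}.$

This score measures whether agent $j$’s offer semantically matches agent $i$’s need (Lu et al., 5 Feb 2026).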

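The adjacency matrix is obtained by hard thresholding the relevance scores $r_{i,j}^{(t)}$: $A_{j \to i}^{(t)} = \mathbb{1}\big[r_{i,j}^{(t)} > \tau\big]$, and therefore

$E^{(t)} = \big\{(a_j \to a_i) : A_{j \to i}^{(t)} = 1\big\}, \qquad G^{(t)} = \big(\mathcal{A}, E^{(t)}\big).$

The incoming-neighbor set is

$\mathcal{N}_{\mathrm{in}}^{(t)}(i) = \big\{\, j : A_{j \to i}^{(t)} = 1 \,\big\}.$

This makes the graph sparse by design, with sparsity controlled primarily by the similarity threshold $\tau$ (Lu et al., 5 Feb 2026).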
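The matching and thresholding steps above can be sketched in a few lines of dependency-free Python. This is a toy illustration rather than the paper's implementation: the vectors are hand-written stand-ins for encoder outputs, the function names are hypothetical, and both the strict inequality and the exclusion of self-edges are assumptions made here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def relevance_matrix(queries, keys):
    """r[i][j] = cosine similarity of agent i's query and agent j's key."""
    q_hat = [l2_normalize(q) for q in queries]
    k_hat = [l2_normalize(k) for k in keys]
    return [[sum(a * b for a, b in zip(qi, kj)) for kj in k_hat]
            for qi in q_hat]


def induce_incoming(queries, keys, tau):
    """Map each consumer i to its providers j, i.e. edges a_j -> a_i.

    Assumptions of this sketch: strict threshold r > tau, no self-edges.
    """
    r = relevance_matrix(queries, keys)
    n = len(r)
    return {i: [j for j in range(n) if j != i and r[i][j] > tau]
            for i in range(n)}


# Toy 2-D "embeddings" for three agents (stand-ins for encoder outputs).
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
incoming = induce_incoming(queries, keys, tau=0.8)
```

With these toy vectors, agent 0 receives from agents 1 and 2, agent 1 receives from agent 0, and agent 2 receives from no one because its best match (about 0.707) falls below the threshold.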

In the reported implementation, the semantic matching engine uses sentence-transformers/all-MiniLM-L6-v2 with embedding dimension $d = 384$. The framework also includes a maximum in-degree hyperparameter $K_{\mathrm{in}}$ in the appendix, although the main method section formalizes routing through the threshold rule above rather than an explicit top-$K$ selection formula. The LLM backbones are served via vLLM (Lu et al., 5 Feb 2026).

The central mechanism is that descriptors are regenerated every round from updated local memories and a revised round goal. Consequently, the query/key embeddings, relevance matrix, and adjacency matrix all change across rounds. The paper’s examples emphasize that a Developer may initially need algorithmic guidance, later need test cases, and finally require only formatting feedback; the active incoming edges shift accordingly (Lu et al., 5 Feb 2026).

4. Message routing, memory update, and manager control

DyTopo separates communication into a public channel and a private channel. Public messages are visible to the manager and recorded in the global trace. Private messages are routed only along edges of the current graph. Thus, communication topology constrains who receives which private content, but not who produces it: every agent still emits both public and private messages each round (Lu et al., 5 Feb 2026).

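Because the graph may be cyclic, DyTopo introduces a deterministic ordering used for prompt construction and reproducibility. If $G^{(t)}$, with edge set $E^{(t)}$, is acyclic, a standard topological order $\sigma^{(t)}$ is computed such that

$(a_j \to a_i) \in E^{(t)} \;\Longrightarrow\; \mathrm{pos}^{(t)}(j) < \mathrm{pos}^{(t)}(i),$

with

$\sigma^{(t)} = \big(\sigma_1^{(t)}, \dots, \sigma_N^{(t)}\big), \qquad \mathrm{pos}^{(t)}\big(\sigma_k^{(t)}\big) = k.$

If the graph contains cycles, DyTopo uses a greedy cycle-breaking heuristic. For unplaced nodes $U$, the restricted in-degree is

$d_{\mathrm{in}}^{U}(i) = \big|\{\, j \in U : (a_j \to a_i) \in E^{(t)} \,\}\big|,$

and the next node appended to the order is

$i^{\star} = \arg\min_{i \in U} d_{\mathrm{in}}^{U}(i).$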

The paper explicitly states that this ordering is not a within-round causal schedule; it is a deterministic linearization for memory aggregation (Lu et al., 5 Feb 2026).
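A minimal sketch of such a linearization follows, assuming the greedy rule picks the unplaced node with the fewest unplaced providers and breaks ties by the smaller agent index. The tie-breaking rule and the function name `deterministic_order` are assumptions of this sketch, not details confirmed by the text. On an acyclic graph the same loop reduces to an ordinary topological sort, since some unplaced node always has restricted in-degree zero.

```python
def deterministic_order(incoming):
    """Linearize agents so that providers tend to precede consumers.

    incoming maps each agent index to the list of its providers
    (edges provider -> agent). Ties are broken by agent index, which is
    an assumption of this sketch.
    """
    unplaced = set(incoming)
    order = []
    while unplaced:
        def restricted_in_degree(i):
            # Count only providers that have not been placed yet.
            return sum(1 for j in incoming[i] if j in unplaced)

        # min() over a sorted list returns the smallest index among ties.
        nxt = min(sorted(unplaced), key=restricted_in_degree)
        order.append(nxt)
        unplaced.remove(nxt)
    return order


# Acyclic chain 2 -> 1 -> 0: a plain topological order.
acyclic_order = deterministic_order({0: [1], 1: [2], 2: []})

# Cycle between agents 1 and 2, both feeding agent 0.
cyclic_order = deterministic_order({0: [1, 2], 1: [2], 2: [1]})
```

In the cyclic example, agents 1 and 2 tie with one unplaced provider each, so agent 1 is placed first, which frees agent 2 and finally agent 0.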

An agent’s next-round memory is updated as

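$H_i^{(t+1)} = H_i^{(t)} \oplus m_{i,\mathrm{pub}}^{(t)} \oplus \Sigma_{\sigma^{(t)}}\big(\{\, m_{j,\mathrm{priv}}^{(t)} : j \in \mathcal{N}_{\mathrm{in}}^{(t)}(i) \,\}\big),$

where $\oplus$ denotes concatenation, $m_{i,\mathrm{pub}}^{(t)}$ and $m_{j,\mathrm{priv}}^{(t)}$ are the public and private messages of agents $a_i$ and $a_j$, $\mathcal{N}_{\mathrm{in}}^{(t)}(i)$ is the set of providers with an edge into $a_i$, and $\Sigma_{\sigma^{(t)}}$ aggregates routed private messages according to the deterministic order. Each agent therefore sees only its own public output and the private outputs of semantically matched providers, rather than the full communication transcript (Lu et al., 5 Feb 2026).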
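The memory update can be illustrated with a small sketch in which memory is a list of strings and concatenation is list concatenation. The representation and the name `update_memory` are assumptions made for illustration; the point is only that routed private messages are appended in the deterministic order, not in arrival order.

```python
def update_memory(memory, own_public, private_msgs, providers, order):
    """Return one agent's next-round memory.

    memory       -- list of strings already in the agent's memory
    own_public   -- the agent's own public message this round
    private_msgs -- dict: agent index -> that agent's private message
    providers    -- indices of agents with an edge into this agent
    order        -- deterministic linearization of all agent indices
    """
    position = {agent: rank for rank, agent in enumerate(order)}
    # Append routed private messages in the deterministic order.
    routed = [private_msgs[j]
              for j in sorted(providers, key=position.__getitem__)]
    return memory + [own_public] + routed


next_memory = update_memory(
    memory=["h0"],
    own_public="pub",
    private_msgs={0: "p0", 1: "p1", 2: "p2"},
    providers=[0, 2],
    order=[2, 1, 0],
)
```

Here agent 1's private message never reaches the memory because agent 1 is not a provider for this agent in the current round.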

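The manager forms a global state from the current round goal $g^{(t)}$ and the agents’ public messages $m_{i,\mathrm{pub}}^{(t)}$,

$S_{M}^{(t)} = \big(g^{(t)},\; \{m_{i,\mathrm{pub}}^{(t)}\}_{i=1}^{N}\big),$

then applies a meta-policy

$\big(y^{(t)},\; g^{(t+1)}\big) = \Pi_{M}\big(S_{M}^{(t)}\big),$

whose outputs correspond to the is_complete flag and the next_goal field. The halting decision is

$y^{(t)} \in \{0, 1\},$

with the collaboration stopping at the first round in which $y^{(t)} = 1$.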

This yields a bi-level control loop: semantic routing among workers at the micro level, and round-goal revision plus early stopping at the macro level (Lu et al., 5 Feb 2026).
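The bi-level loop can be sketched with stubbed components. Everything below is a hypothetical scaffold: workers and the manager are plain Python callables standing in for LLM calls, and the router is a stub that returns a fixed incoming-neighbor map per round instead of performing semantic matching.

```python
from typing import Callable, Dict, List, Tuple

# (goal, memory) -> (public message, private message)
WorkerFn = Callable[[str, List[str]], Tuple[str, str]]
# (goal, public messages) -> (is_complete, next_goal)
ManagerFn = Callable[[str, List[str]], Tuple[bool, str]]
# round index -> {consumer: [providers]}; stands in for semantic routing
RouterFn = Callable[[int], Dict[int, List[int]]]


def run_rounds(workers: List[WorkerFn], manager: ManagerFn, router: RouterFn,
               first_goal: str, max_rounds: int):
    """Run worker rounds until the manager halts or the budget is spent."""
    goal = first_goal
    memories: List[List[str]] = [[] for _ in workers]
    rounds_used = 0
    for t in range(max_rounds):
        # Micro level: one call per worker, then selective private routing.
        outputs = [w(goal, mem) for w, mem in zip(workers, memories)]
        incoming = router(t)
        for i, mem in enumerate(memories):
            mem.append(outputs[i][0])
            mem.extend(outputs[j][1] for j in incoming.get(i, []))
        # Macro level: the manager sees public messages only.
        done, goal = manager(goal, [pub for pub, _ in outputs])
        rounds_used = t + 1
        if done:
            break
    return rounds_used, memories


def worker0(goal, memory):
    return f"pub0:{goal}", "priv0"


def worker1(goal, memory):
    return f"pub1:{goal}", "priv1"


def toy_manager(goal, publics):
    # Halt once the verification goal has been worked on.
    return goal == "verify", "verify"


def toy_router(t):
    return {1: [0]}  # agent 0 provides to agent 1 every round


rounds_used, memories = run_rounds(
    [worker0, worker1], toy_manager, toy_router, "explore", max_rounds=5)
```

The toy run stops after two of the five allowed rounds, which is the early-stopping behavior the macro level is meant to provide.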

5. Benchmarks, quantitative results, and ablations

DyTopo is evaluated on HumanEval and APPS-Competition for code generation, and on MATH-500 and Omni-MATH for mathematical reasoning, across four backbones: MiMo-V2-Flash, GPT-oss-120B, Llama-3-8B-Instruct, and Qwen3-8B. The paper states that DyTopo is best in all 16 reported dataset-by-backbone settings and reports both the mean improvement over the strongest non-DyTopo baseline and its per-setting range; the abstract summarizes the mean as “avg. +6.2” (Lu et al., 5 Feb 2026).

The paper reports DyTopo’s accuracy on HumanEval, APPS-Competition, MATH-500, and Omni-MATH for each of MiMo-V2-Flash, GPT-oss-120B, Llama-3-8B-Instruct, and Qwen3-8B. The largest reported gains occur on harder mathematics settings, including MATH-500 with Llama-3-8B-Instruct and Omni-MATH with Qwen3-8B (Lu et al., 5 Feb 2026).

The paper also argues that multi-round interaction alone is insufficient. A Random Topology baseline with the same sparsity level as DyTopo can help in some settings but does not improve uniformly. This suggests that the principal gain comes from goal-conditioned semantic routing rather than sparsity or additional rounds per se (Lu et al., 5 Feb 2026).

Several ablations refine that conclusion. In the communication-round study, HumanEval peaks at 5 rounds, whereas MATH-500 peaks at 9 rounds. The performance curves are non-monotonic, indicating that too few rounds underuse coordination and too many rounds introduce unnecessary edits or distraction. In the similarity-threshold study, APPS-Competition and Omni-MATH each perform best at a task-specific value of the threshold. Thresholds that are too low create overly dense graphs and irrelevant traffic; thresholds that are too high create overly sparse graphs and block useful exchanges (Lu et al., 5 Feb 2026).
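The density effect of the threshold can be made concrete with a toy relevance matrix. The scores below are invented for illustration and are not taken from the paper; `edge_density` is a hypothetical helper that counts the fraction of possible directed edges (excluding self-edges, an assumption of this sketch) whose score exceeds the threshold.

```python
def edge_density(scores, tau):
    """Fraction of off-diagonal entries with score > tau (needs n >= 2)."""
    n = len(scores)
    edges = sum(1 for i in range(n) for j in range(n)
                if i != j and scores[i][j] > tau)
    return edges / (n * (n - 1))


# Invented scores: scores[i][j] is the match of agent j's offer to i's need.
toy_scores = [
    [1.0, 0.9, 0.2],
    [0.4, 1.0, 0.6],
    [0.7, 0.1, 1.0],
]
densities = {tau: edge_density(toy_scores, tau) for tau in (0.0, 0.5, 0.95)}
```

At a threshold of 0.0 every possible edge is active, at 0.5 half of them survive, and at 0.95 the graph is empty, mirroring the dense-to-sparse trade-off described above.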

The efficiency analysis on HumanEval with MiMo-V2-Flash is notable because it combines accuracy, token count, and latency. The paper reports all three quantities for both DyTopo and AgentScope, and attributes DyTopo’s efficiency to sparse routing and manager-controlled early stopping, which reduce unnecessary communication while preserving strong performance (Lu et al., 5 Feb 2026).

6. Interpretability, limitations, and position within the literature

A distinctive feature of DyTopo is that it yields an explicit coordination trace in the form of evolving round graphs. Because each edge is activated by a textual need/offer match, the graph sequence $\{G^{(t)}\}_{t=0}^{T-1}$ can be inspected directly. The qualitative HumanEval example in the paper shows a denser exploratory graph in round 1, a verification-focused graph in round 2, and a dependency-minimal graph in round 3. In the palindrome case study, the paper reports, each with its relevance score, a Researcher $\to$ Developer edge in the exploration phase, a Developer $\to$ Tester edge during implementation verification, and a Tester $\to$ Developer edge for feedback-driven correction (Lu et al., 5 Feb 2026).

This makes DyTopo more interpretable than systems whose communication pattern is fixed or fully dense. The evolving graph provides a stage-specific rationale for information flow: who needed what, who could provide it, and why a link existed in that round. A plausible implication is that DyTopo’s contribution is as much about coordination observability as about absolute accuracy, since the graph itself becomes an analyzable artifact of the reasoning process.

The framework also has explicit limitations. Descriptor quality is critical: if query or key summaries are vague or misleading, semantic matching can misroute communication and propagate error. Performance is sensitive to the threshold $\tau$, and the optimal sparsity differs by task. The ideal number of rounds is task-dependent, so a single global communication budget is suboptimal. The routing is not learned end-to-end; it uses a fixed embedding model plus thresholding, which preserves simplicity and interpretability but constrains adaptivity. Finally, graph induction still requires $O(N^2)$ pairwise similarity evaluations each round, and the paper notes privacy or logging concerns because coordination traces and messages may be stored (Lu et al., 5 Feb 2026).

Within the broader research landscape, DyTopo is best understood as a dynamic graph-routing framework for LLM coordination rather than a topological-data-analysis method. Earlier graph-topological work such as “The Landscape of Complex Networks” defines critical nodes, basin structure, and saddles on static graphs with scalar node functions (Weinan et al., 2012). DyTopo instead uses topology as an inference-time communication graph that is reconstructed round by round from semantic descriptors. This suggests that its closest methodological lineage lies in adaptive multi-agent orchestration, sparse communication, and content-based routing, even though the term “topology” gives it a broader conceptual resonance.