
DyTopo: Dynamic Multi-Agent LLM Coordination

Updated 4 July 2026
  • DyTopo is a dynamic multi-agent LLM framework that models iterative problem-solving as a round-based computation graph governed by manager-specified goals.
  • The approach uses semantic embeddings of agent queries and keys to reconstruct sparse, stage-adaptive communication networks for targeted message routing.
  • Empirical results show average improvements of +6.2 points in code and mathematical reasoning tasks, with reduced token usage and latency.

DyTopo is a manager-guided multi-agent framework for LLM-based reasoning in which the communication topology is rebuilt at every round rather than fixed for an entire trajectory. It formalizes multi-round collaboration as a Dynamic Computation Graph, $\mathcal{G}=\{G^{(t)}\}_{t=0}^{T-1}$, where each $G^{(t)}$ is a sparse directed graph induced from the agents’ current needs and offers under a manager-specified round goal. The system was introduced for code generation and mathematical reasoning, on the premise that iterative problem solving is stage-dependent and therefore poorly served by trajectory-wide all-to-all, hub-and-spoke, or fixed sequential communication patterns (Lu et al., 5 Feb 2026).

1. Concept and problem setting

DyTopo addresses a structural weakness in multi-agent LLM systems: most pipelines predefine who can communicate with whom and keep that pattern unchanged across all rounds. The underlying claim is that this design mismatches the temporal structure of reasoning. Early rounds often require broad exploration and decomposition, whereas later rounds require targeted verification, debugging, or final answer assembly. In that setting, dense fixed communication can inject irrelevant context, while sparse fixed communication can block task-critical exchanges (Lu et al., 5 Feb 2026).

The framework therefore treats communication topology as a round-level control variable. Rather than assuming one graph for the whole trajectory, DyTopo reconstructs a new sparse directed graph at each round, conditioned on the manager’s current goal and on each agent’s updated local state. A directed edge $a_j \to a_i$ means that agent $j$ is selected as a provider for agent $i$ in that specific round. The topology is thus stage-adaptive, and private-message routing is selective rather than broadcast-based (Lu et al., 5 Feb 2026).

A frequent misconception is that the “topology” in DyTopo refers to topological invariants in the persistent-homology sense. In fact, DyTopo uses “topology” operationally, to denote an evolving communication graph over agents. This differs from persistent-homology-based uses of topology in works such as TopoTxR, which extracts 1D and 2D representative cycles from breast DCE-MRI (Wang et al., 2021), or TopoDiffusionNet, which enforces Betti-number constraints in diffusion-generated masks (Gupta et al., 2024). This suggests that DyTopo belongs to the literature on adaptive communication and graph-structured coordination, rather than TDA-driven geometric modeling.

2. Agent architecture and round-conditioned state

DyTopo contains a set of worker agents,

$$\mathcal{A}=\{a_1,\dots,a_N\},$$

and a manager meta-agent. Each worker agent $a_i$ has a role description $p_i$ and a local memory $H_i^{(t)}$. At round $t$, its local state is

$$S_i^{(t)} = \big(p_i,\; g^{(t)},\; H_i^{(t)}\big),$$

where $g^{(t)}$ is the manager’s round goal (Lu et al., 5 Feb 2026).

Each worker then performs exactly one forward pass per round, a design the paper calls the Single-Pass Inference constraint. The output is

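$$O_i^{(t)} = \big(m_{\mathrm{pub},i}^{(t)},\; m_{\mathrm{priv},i}^{(t)},\; s_{q,i}^{(t)},\; s_{k,i}^{(t)}\big).$$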

The four components are a public message, a private message, a query/need descriptor, and a key/offer descriptor. The query summarizes what the agent currently needs from others; the key summarizes what it can provide. The manager maintains global oversight, revises the round goal, and decides whether the collaboration should halt (Lu et al., 5 Feb 2026).

The paper instantiates different role sets for different task families. For code generation, the system uses Manager, Developer, Researcher, Tester, and Designer. For mathematical reasoning, it uses ProblemParser, Solver, Verifier, and Manager. The prompts are JSON-constrained so that public content, private content, query descriptors, key descriptors, and managerial outputs such as is_complete and next_goal can be extracted deterministically (Lu et al., 5 Feb 2026).
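As a rough illustration of what deterministic extraction from JSON-constrained replies could look like, the sketch below parses one worker reply and one manager reply. Only `is_complete` and `next_goal` are named in the text above; the worker field names (`public`, `private`, `query`, `key`) and both function names are assumptions made for this example, not the paper's schema.

```python
import json
from dataclasses import dataclass


@dataclass
class WorkerOutput:
    public: str   # public message, visible to the manager
    private: str  # private message, routed only along graph edges
    query: str    # need descriptor
    key: str      # offer descriptor


# Hypothetical field names; the paper's actual JSON schema may differ.
WORKER_FIELDS = ("public", "private", "query", "key")


def parse_worker_output(raw: str) -> WorkerOutput:
    """Parse one worker reply and fail loudly if a field is missing."""
    data = json.loads(raw)
    missing = [f for f in WORKER_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return WorkerOutput(**{f: str(data[f]) for f in WORKER_FIELDS})


def parse_manager_output(raw: str):
    """Return the (is_complete, next_goal) pair from a manager reply."""
    data = json.loads(raw)
    return bool(data["is_complete"]), str(data["next_goal"])
```

Rejecting malformed replies up front keeps the later routing step from silently operating on empty descriptors.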

3. Semantic matching and dynamic graph induction

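The dynamic graph is induced from the natural-language query and key descriptors. DyTopo uses a fixed semantic encoder

$$\mathrm{Enc}:\ \text{text} \to \mathbb{R}^{d}$$

to embed those descriptors, with

$$q_i^{(t)} = \mathrm{Enc}\big(s_{q,i}^{(t)}\big), \qquad k_i^{(t)} = \mathrm{Enc}\big(s_{k,i}^{(t)}\big),$$

where $s_{q,i}^{(t)}$ and $s_{k,i}^{(t)}$ are agent $i$’s query and key descriptors at round $t$. After $\ell_2$-normalization,

$$\hat{q}_i^{(t)} = \frac{q_i^{(t)}}{\|q_i^{(t)}\|_2}, \qquad \hat{k}_i^{(t)} = \frac{k_i^{(t)}}{\|k_i^{(t)}\|_2},$$

the relevance score for a possible provider-consumer relation $a_j \to a_i$ is

$$r_{i,j}^{(t)} = \big(\hat{q}_i^{(t)}\big)^{\top} \hat{k}_j^{(t)}.$$

This score measures whether agent $j$’s offer semantically matches agent $i$’s need (Lu et al., 5 Feb 2026).

The adjacency matrix is obtained by hard thresholding: $A_{j\to i}^{(t)} = \mathbb{1}\big[r_{i,j}^{(t)} > \tau\big]$, and therefore

$$E^{(t)} = \big\{(a_j \to a_i) : A_{j\to i}^{(t)} = 1\big\}.$$

The incoming-neighbor set is

$$\mathcal{N}_{\mathrm{in}}^{(t)}(i) = \big\{\, j : A_{j\to i}^{(t)} = 1 \,\big\}.$$

This makes the graph sparse by design, with sparsity controlled primarily by the similarity threshold $\tau$ (Lu et al., 5 Feb 2026).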
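The matching step described above (normalize the embeddings, score each consumer's query against each provider's key by cosine similarity, then hard-threshold) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the 2-D vectors are made up and stand in for real encoder outputs, the threshold is called `tau` here, the strict `>` comparison and the exclusion of self-loops are choices made for the example, and the function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def induce_graph(queries, keys, tau):
    """Return adj where adj[j][i] == 1 means provider j -> consumer i."""
    n = len(queries)
    q_hat = [l2_normalize(q) for q in queries]
    k_hat = [l2_normalize(k) for k in keys]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):        # consumer (its query / need)
        for j in range(n):    # provider (its key / offer)
            if i == j:
                continue      # assumption: no self-loops
            score = sum(a * b for a, b in zip(q_hat[i], k_hat[j]))
            if score > tau:   # hard threshold on cosine similarity
                adj[j][i] = 1
    return adj


def in_neighbors(adj, i):
    """Providers whose private messages are routed to agent i."""
    return [j for j in range(len(adj)) if adj[j][i] == 1]


# Toy embeddings (illustrative only, not real encoder outputs).
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
adj = induce_graph(queries, keys, tau=0.9)
```

With these toy vectors, agent 0 receives from agents 1 and 2, agent 1 receives from agent 0, and agent 2 receives from no one, because its best match scores only about 0.71.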

In the reported implementation, the semantic matching engine uses sentence-transformers/all-MiniLM-L6-v2 with embedding dimension $d = 384$. The framework also includes a maximum in-degree hyperparameter in the appendix, although the main method section formalizes routing through the threshold rule above rather than an explicit top-$k$ selection formula. The LLM backbones are served via vLLM (Lu et al., 5 Feb 2026).

The central mechanism is that descriptors are regenerated every round from updated local memories and a revised round goal. Consequently, the query/key embeddings, relevance matrix, and adjacency matrix all change across rounds. The paper’s examples emphasize that a Developer may initially need algorithmic guidance, later need test cases, and finally require only formatting feedback; the active incoming edges shift accordingly (Lu et al., 5 Feb 2026).

4. Message routing, memory update, and manager control

DyTopo separates communication into a public channel and a private channel. Public messages are visible to the manager and recorded in the global trace. Private messages are routed only along edges of the current graph. Thus, communication topology constrains who receives which private content, but not who produces it: every agent still emits both public and private messages each round (Lu et al., 5 Feb 2026).

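Because the graph may be cyclic, DyTopo introduces a deterministic ordering used for prompt construction and reproducibility. If $G^{(t)}$ is acyclic, a standard topological order $\sigma^{(t)}$ is computed such that

$$(a_j \to a_i) \in E^{(t)} \;\Longrightarrow\; \mathrm{pos}_{\sigma^{(t)}}(j) < \mathrm{pos}_{\sigma^{(t)}}(i),$$

with $\mathrm{pos}_{\sigma^{(t)}}(i)$ denoting the position of agent $i$ in the order and $E^{(t)}$ the edge set of $G^{(t)}$.

If the graph contains cycles, DyTopo uses a greedy cycle-breaking heuristic. For the set $U$ of unplaced nodes, the restricted in-degree of a node $i \in U$ is

$$d_U^{\mathrm{in}}(i) = \big|\{\, j \in U : (a_j \to a_i) \in E^{(t)} \,\}\big|,$$

and the next node appended to the order is

$$i^{\star} = \arg\min_{i \in U} d_U^{\mathrm{in}}(i).$$

The paper explicitly states that this ordering is not a within-round causal schedule; it is a deterministic linearization for memory aggregation (Lu et al., 5 Feb 2026).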
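One way to realize such an ordering is to repeatedly place the unplaced node with the fewest incoming edges from other unplaced nodes. The sketch below is an illustration of that greedy idea, not the paper's code: the function name is hypothetical, and breaking ties by lowest agent index is an assumption introduced here to keep the result deterministic.

```python
def deterministic_order(adj):
    """Linearize agents, where adj[j][i] == 1 means an edge j -> i.

    On an acyclic graph some unplaced node always has restricted in-degree 0,
    so the result is a topological order. On a cyclic graph the node with the
    smallest restricted in-degree is placed anyway, which breaks the cycle.
    """
    unplaced = set(range(len(adj)))
    order = []
    while unplaced:
        def restricted_in_degree(i):
            # Count only incoming edges from nodes that are still unplaced.
            return sum(adj[j][i] for j in unplaced if j != i)

        # Assumption: ties are broken by the lowest agent index.
        nxt = min(unplaced, key=lambda i: (restricted_in_degree(i), i))
        order.append(nxt)
        unplaced.remove(nxt)
    return order


chain = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]   # edges 2 -> 1 and 1 -> 0
cyclic = [[0, 1, 0], [1, 0, 1], [0, 0, 0]]  # edges 0 -> 1, 1 -> 0, 1 -> 2
```

For `chain` the providers come first, giving `[2, 1, 0]`. For `cyclic` all three nodes start with restricted in-degree 1, so the index tie-break places agent 0 first and the rest follow as `[0, 1, 2]`.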

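An agent’s next-round memory is updated as

$$H_i^{(t+1)} = H_i^{(t)} \oplus m_{\mathrm{pub},i}^{(t)} \oplus \Sigma_{\sigma^{(t)}}\Big(\big\{ m_{\mathrm{priv},j}^{(t)} : j \in \mathcal{N}_{\mathrm{in}}^{(t)}(i) \big\}\Big),$$

where $\oplus$ denotes concatenation and $\Sigma_{\sigma^{(t)}}$ aggregates the private messages routed from agent $i$’s incoming neighbors $\mathcal{N}_{\mathrm{in}}^{(t)}(i)$ according to the deterministic order $\sigma^{(t)}$. Each agent therefore sees only its own public output and the private outputs of semantically matched providers, rather than the full communication transcript (Lu et al., 5 Feb 2026).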
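The update can be pictured as list concatenation: keep the old memory, append the agent's own public message, then append the private messages of its matched providers in the round's deterministic order. The sketch below assumes memory is a list of text entries and that the adjacency matrix and order are already available; the function and argument names are hypothetical.

```python
def update_memory(memory, public_msgs, private_msgs, adj, order, i):
    """Return agent i's next-round memory as a new list of text entries.

    memory:       agent i's current entries
    public_msgs:  one public message per agent for this round
    private_msgs: one private message per agent for this round
    adj:          adj[j][i] == 1 means provider j -> consumer i
    order:        deterministic linearization of agent indices
    """
    # Only private messages from incoming neighbors are routed to agent i.
    routed = [private_msgs[j] for j in order if j != i and adj[j][i] == 1]
    return memory + [public_msgs[i]] + routed


adj = [[0, 1, 0], [1, 0, 0], [1, 0, 0]]  # edges 0 -> 1, 1 -> 0, 2 -> 0
new_memory = update_memory(
    ["old"], ["p0", "p1", "p2"], ["s0", "s1", "s2"], adj, [2, 0, 1], 0
)
```

Here agent 0 ends up with `["old", "p0", "s2", "s1"]`: its own public message followed by the private messages of providers 2 and 1, in the order given.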

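The manager forms a global state $S_{\mathrm{global}}^{(t)}$ from the global trace, which records the agents’ public messages, then applies a meta-policy

$$\big(y^{(t)},\; g^{(t+1)}\big) = \Pi_{\mathrm{meta}}\big(S_{\mathrm{global}}^{(t)}\big),$$

where $y^{(t)} \in \{0,1\}$ is the completion flag (is_complete) and $g^{(t+1)}$ is the next round goal (next_goal).

The halting decision is

$$\text{halt at round } t \iff y^{(t)} = 1.$$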

This yields a bi-level control loop: semantic routing among workers at the micro level, and round-goal revision plus early stopping at the macro level (Lu et al., 5 Feb 2026).
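The two levels of control can be sketched as a single loop: each round runs every worker once, routes private messages along the induced graph, and then lets the manager decide whether to stop and what the next goal is. Everything below is a simplified illustration with stub callables. The function names, the dictionary fields, the `max_rounds` budget, and the use of index order instead of the deterministic linearization are all assumptions made for the example.

```python
def run_rounds(workers, manager, route, first_goal, max_rounds):
    """Run until the manager reports completion or the budget is spent.

    workers: callables (goal, memory) -> dict with public/private/query/key
    manager: callable (goal, public_msgs) -> (is_complete, next_goal)
    route:   callable (queries, keys) -> adj, adj[j][i] == 1 for j -> i
    """
    n = len(workers)
    memories = [[] for _ in range(n)]
    goal, trace = first_goal, []
    for t in range(max_rounds):
        # Micro level: one call per worker, then selective private routing.
        outs = [w(goal, memories[i]) for i, w in enumerate(workers)]
        adj = route([o["query"] for o in outs], [o["key"] for o in outs])
        for i in range(n):
            routed = [outs[j]["private"] for j in range(n) if j != i and adj[j][i]]
            memories[i] = memories[i] + [outs[i]["public"]] + routed
        trace.append((t, goal, adj))
        # Macro level: the manager revises the goal and may halt early.
        done, goal = manager(goal, [o["public"] for o in outs])
        if done:
            break
    return trace, memories


def make_worker(name):
    def worker(goal, memory):
        return {"public": f"{name}:{goal}", "private": f"{name}-private",
                "query": name, "key": name}
    return worker


def stub_route(queries, keys):
    n = len(queries)  # stub: only agent 0 provides to agent 1
    return [[1 if (j, i) == (0, 1) else 0 for i in range(n)] for j in range(n)]


def stub_manager(goal, public_msgs):
    return goal == "g1", "g1"  # stub: stop once the goal "g1" has been run


trace, memories = run_rounds(
    [make_worker("a"), make_worker("b")], stub_manager, stub_route, "g0", 5
)
```

With these stubs the loop stops after two rounds, and only agent `b` accumulates agent `a`'s private messages.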

5. Benchmarks, quantitative results, and ablations

DyTopo is evaluated on HumanEval and APPS-Competition for code generation, and on MATH-500 plus Omni-MATH for mathematical reasoning, across four backbones: MiMo-V2-Flash, GPT-oss-120B, Llama-3-8B-Instruct, and Qwen3-8B. The paper states that DyTopo is best in all 16 reported dataset-by-backbone settings and reports both a mean improvement over the strongest non-DyTopo baseline and the range of per-setting gains; the abstract summarizes the mean as “avg. +6.2” (Lu et al., 5 Feb 2026).

The paper reports DyTopo’s accuracy for each of the four backbones on HumanEval, APPS-Competition, MATH-500, and Omni-MATH. The largest reported gains occur on harder mathematics settings, including MATH-500 with Llama-3-8B-Instruct and Omni-MATH with Qwen3-8B (Lu et al., 5 Feb 2026).

The paper also argues that multi-round interaction alone is insufficient. A Random Topology baseline with the same sparsity level as DyTopo can help in some settings but does not improve uniformly. This suggests that the principal gain comes from goal-conditioned semantic routing rather than sparsity or additional rounds per se (Lu et al., 5 Feb 2026).

Several ablations refine that conclusion. In the communication-round study, HumanEval peaks at 5 rounds, whereas MATH-500 peaks at 9 rounds. The performance curves are non-monotonic, indicating that too few rounds underuse coordination and too many rounds introduce unnecessary edits or distraction. In the similarity-threshold study, APPS-Competition and Omni-MATH each attain their best accuracy at a specific intermediate value of the similarity threshold. Thresholds that are too low create overly dense graphs and irrelevant traffic; thresholds that are too high create overly sparse graphs and block useful exchanges (Lu et al., 5 Feb 2026).
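The effect of the threshold on graph density is easy to see on a small score matrix. The sketch below counts the fraction of possible directed edges that survive a given threshold; the scores are invented for illustration and are not taken from the paper, and `edge_density` is a hypothetical helper name.

```python
def edge_density(scores, tau):
    """Fraction of off-diagonal pairs whose score exceeds tau.

    scores[i][j] is the relevance of provider j's offer to consumer i's need.
    """
    n = len(scores)
    kept = sum(
        1 for i in range(n) for j in range(n)
        if i != j and scores[i][j] > tau
    )
    return kept / (n * (n - 1))


# Made-up relevance scores for three agents (diagonal unused).
scores = [
    [0.0, 0.8, 0.3],
    [0.6, 0.0, 0.2],
    [0.9, 0.4, 0.0],
]
densities = {tau: edge_density(scores, tau) for tau in (0.1, 0.5, 0.95)}
```

On this toy matrix a threshold of 0.1 keeps every edge, 0.5 keeps half of them, and 0.95 keeps none, which mirrors the dense-versus-sparse trade-off described above.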

The efficiency analysis on HumanEval with MiMo-V2-Flash is notable because it combines accuracy, token count, and latency. The paper reports all three quantities for DyTopo and for AgentScope, and attributes DyTopo’s efficiency to sparse routing and manager-controlled early stopping, which reduce unnecessary communication while preserving strong performance (Lu et al., 5 Feb 2026).

6. Interpretability, limitations, and position within the literature

A distinctive feature of DyTopo is that it yields an explicit coordination trace in the form of evolving round graphs. Because each edge is activated by a textual need/offer match, the graph sequence $\{G^{(t)}\}$ can be inspected directly. The qualitative HumanEval example in the paper shows a denser exploratory graph in round 1, a verification-focused graph in round 2, and a dependency-minimal graph in round 3. In the palindrome case study, the paper reports a Researcher $\to$ Developer edge in the exploration phase, a Developer $\to$ Tester edge during implementation verification, and a Tester $\to$ Developer edge for feedback-driven correction, each with its own relevance score (Lu et al., 5 Feb 2026).
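Inspecting such a trace amounts to listing the active edges of each round's graph. The sketch below does this for a deliberately simplified version of the palindrome example, keeping only the three edges named above, one per phase. The adjacency matrices and the helper name are illustrative assumptions, and relevance scores are left out.

```python
def edges_by_round(graphs, names):
    """List 'provider -> consumer' strings for each round's adjacency matrix.

    graphs[t][j][i] == 1 means names[j] sends private content to names[i].
    """
    report = []
    for adj in graphs:
        n = len(adj)
        report.append([
            f"{names[j]} -> {names[i]}"
            for j in range(n) for i in range(n) if adj[j][i]
        ])
    return report


names = ["Researcher", "Developer", "Tester"]
graphs = [
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],  # exploration: Researcher -> Developer
    [[0, 0, 0], [0, 0, 1], [0, 0, 0]],  # verification: Developer -> Tester
    [[0, 0, 0], [0, 0, 0], [0, 1, 0]],  # correction: Tester -> Developer
]
for phase, edges in enumerate(edges_by_round(graphs, names), start=1):
    print(f"phase {phase}: {edges}")
```

A listing like this is what makes the coordination pattern auditable: each line states who supplied private content to whom at that stage.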

This makes DyTopo more interpretable than systems whose communication pattern is fixed or fully dense. The evolving graph provides a stage-specific rationale for information flow: who needed what, who could provide it, and why a link existed in that round. A plausible implication is that DyTopo’s contribution is as much about coordination observability as about absolute accuracy, since the graph itself becomes an analyzable artifact of the reasoning process.

The framework also has explicit limitations. Descriptor quality is critical: if query or key summaries are vague or misleading, semantic matching can misroute communication and propagate error. Performance is sensitive to the similarity threshold, and the optimal sparsity differs by task. The ideal number of rounds is task-dependent, so a single global communication budget is suboptimal. The routing is not learned end-to-end; it uses a fixed embedding model plus thresholding, which preserves simplicity and interpretability but constrains adaptivity. Finally, graph induction still requires $O(N^2)$ pairwise similarity evaluations each round, and the paper notes privacy or logging concerns because coordination traces and messages may be stored (Lu et al., 5 Feb 2026).

Within the broader research landscape, DyTopo is best understood as a dynamic graph-routing framework for LLM coordination rather than a topological-data-analysis method. Earlier graph-topological work such as “The Landscape of Complex Networks” defines critical nodes, basin structure, and saddles on static graphs with scalar node functions (Weinan et al., 2012). DyTopo instead uses topology as an inference-time communication graph that is reconstructed round by round from semantic descriptors. This suggests that its closest methodological lineage lies in adaptive multi-agent orchestration, sparse communication, and content-based routing, even though the term “topology” gives it a broader conceptual resonance.
