Dynamic Forest of Agents (FoA)

Updated 4 July 2026

Dynamic Forest of Agents (FoA) is a design pattern in multi-agent systems characterized by dynamic branching, localized decision-making, and structured evidence aggregation.
It integrates varied coordination mechanisms such as judge contexts, task-decomposition trees, and stochastic decision forests to iteratively refine outputs.
FoA implementations enhance efficiency and scalability by balancing independent agent responses with calibrated inter-agent communication and aggregation.

Dynamic Forest of Agents (FoA) denotes a family of dynamic multi-agent execution schemes in which tree- or forest-structured coordination is materialized online rather than fixed in advance. In the surveyed literature, the construct appears in several non-equivalent but related forms: overlapping judge-context trees over a cohort in JAF, source-rooted task trees in feedback-driven binary analysis, task-DAG forests with ephemeral collaboration clusters in semantics-aware orchestration, spawn-induced parent–child structures in fluid-agent reinforcement learning, trajectory populations in LLM search, and calibrated canopies over independent agent outputs in controlled-communication ensembles (Garg et al., 29 Jan 2026, Zhang et al., 16 Apr 2026, Giusti et al., 24 Sep 2025, Sharma et al., 16 Feb 2026, Klein et al., 2024, Li et al., 24 May 2026). A more formal antecedent is the stochastic decision forest, where a single lottery draw selects a tree and agents act under exogenous information structures rather than a dynamic “nature” player (Rapsch, 2024). The term therefore does not name one canonical architecture; it names a recurrent design pattern centered on dynamic branching, localized reasoning, and structured aggregation.

1. Core definitions and representational forms

In JAF, Dynamic Forest of Agents refers to an agentic system that “evaluates and refines a cohort of related inputs through multiple, overlapping, relation-aware contexts and repeated, randomized passes.” A single primary agent produces reasoned responses $R_i$ for queries $Q_i$, and a judge agent evaluates each focal pair jointly with a small set of related peer pairs. Each judge invocation is a small tree rooted at a focal instance, and repeated randomized passes induce a forest of overlapping judge contexts (Garg et al., 29 Jan 2026).

In FORGE, the notion is given a direct execution semantics. The forest is written as

$FoA = \{\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_n\},$

where each tree is rooted at a specific source, and each node is an agent $A_i(T_i)$ executing a localized task $T_i$. The forest is only partially materialized at runtime, expanding where evidence and tool observations justify further analysis (Zhang et al., 16 Apr 2026).

A different formalization appears in stochastic decision forests. There, a stochastic decision forest (sdf) on an exogenous scenario space $(\Omega,E)$ is a triple $(F,\pi,X)$, with the fibers of $\pi$ giving the connected components $T_\omega$. A single lottery draw selects the relevant tree, and agents receive exogenous information structures at random moves. This formulation is explicitly used to connect refined partitions, filtrations, and time-indexed action paths (Rapsch, 2024).

The surveyed instantiations can be summarized as follows.

Instantiation	Structural object	Dynamic mechanism
JAF (Garg et al., 29 Jan 2026)	Overlapping judge-context trees over a cohort	Randomized neighborhoods, iterative self-refinement
FORGE (Zhang et al., 16 Apr 2026)	Source-rooted task trees	Delegation, tool feedback, semantic pruning
Fleet of Agents (Klein et al., 2024)	Population of search trajectories	Mutation–selection resampling
Fluid-Agent RL (Sharma et al., 16 Feb 2026)	Spawn-induced parent–child forest	Time-varying alive set and spawn actions
Federation of Agents (Giusti et al., 24 Sep 2025)	Task-DAG forest plus collaboration clusters	Semantic routing, DAG merge, $k$-round refinement
DarkForest (Li et al., 24 May 2026)	Independent candidate branches with coordinator canopy	Calibrated belief aggregation, controlled disclosure

This variety suggests that “forest” is best understood as a structural and operational abstraction: multiple agent-local branches evolve concurrently, interact through constrained interfaces, and are aggregated by a higher-level mechanism rather than by unrestricted global dialogue.

2. Dynamic structure, partial materialization, and execution semantics

The most explicit structural account is given by JAF. The cohort is the set of query–response pairs $(Q_i, R_i)$. Each judge call takes one focal pair together with a sampled neighborhood of related peer pairs and returns a label and a natural-language critique. Each pass thus forms a tree rooted at the focal instance, with branches to the peers in its neighborhood. Overlapping neighborhoods induce a cohort-level knowledge graph whose nodes are the instances and whose edges connect any two instances that co-occur in a judge prompt. This graph is sparse and dynamic, since edges accumulate as neighborhoods are re-sampled across iterations (Garg et al., 29 Jan 2026).
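The accumulation of co-occurrence edges across randomized judge passes can be illustrated with a minimal sketch. It assumes uniform random peer sampling, whereas JAF uses relation-aware neighborhoods; the function names and parameters (`sample_neighborhood`, `run_passes`, `k`, `passes`) are hypothetical and do not come from the paper.

```python
import random
from itertools import combinations


def sample_neighborhood(focal, cohort_size, k, rng):
    """Pick k distinct peers for a focal instance (uniform sampling is an assumption)."""
    peers = [j for j in range(cohort_size) if j != focal]
    return rng.sample(peers, k)


def run_passes(cohort_size, k, passes, seed=0):
    """Return the set of undirected co-occurrence edges after the given number of passes."""
    rng = random.Random(seed)
    edges = set()
    for _ in range(passes):
        for focal in range(cohort_size):
            # One judge context: the focal instance plus its sampled peers.
            context = [focal] + sample_neighborhood(focal, cohort_size, k, rng)
            # Every pair of instances sharing this context becomes an edge.
            for a, b in combinations(sorted(context), 2):
                edges.add((a, b))
    return edges


edges_after_1 = run_passes(cohort_size=20, k=3, passes=1)
edges_after_5 = run_passes(cohort_size=20, k=3, passes=5)
print(len(edges_after_1), len(edges_after_5))
```

Because both runs share a seed, the first pass is identical, so the edge set can only grow as passes are added. This mirrors the claim that the graph starts sparse and becomes denser over iterations.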

FORGE adopts a stricter parent–child hierarchy. A task is represented as a tuple of four components: the current function, the taint entry, the taint source, and the objective. Recursive decomposition splits a task into subtasks, and delegation instantiates one child agent per delegated subtask.

Execution has two coupled directions: forward materialization, in which agents expand the explored structure, and backward evidence propagation, in which child results are aggregated upward into complete evidence chains (Zhang et al., 16 Apr 2026).
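The two directions can be sketched on a toy call graph. This is an illustrative sketch under stated assumptions, not FORGE's implementation: `Node`, `materialize`, `propagate`, the `CALLS` table, and the `SINKS` set are all hypothetical stand-ins for agent delegation, tool feedback, and taint-sink detection.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    task: str
    children: List["Node"] = field(default_factory=list)


def materialize(node: Node, expand: Callable[[str], List[str]],
                depth: int = 0, max_depth: int = 4) -> Node:
    """Forward pass: create child nodes only where expand() reports subtasks."""
    if depth < max_depth:
        for sub in expand(node.task):
            child = Node(sub)
            node.children.append(child)
            materialize(child, expand, depth + 1, max_depth)
    return node


def propagate(node: Node, is_sink: Callable[[str], bool]) -> List[List[str]]:
    """Backward pass: assemble root-to-sink chains from the children's results."""
    if not node.children:
        return [[node.task]] if is_sink(node.task) else []
    return [[node.task] + chain
            for child in node.children
            for chain in propagate(child, is_sink)]


# Hypothetical call graph and sink set, used only for illustration.
CALLS = {"recv": ["parse", "log"], "parse": ["strcpy"]}
SINKS = {"strcpy"}

root = materialize(Node("recv"), lambda t: CALLS.get(t, []))
chains = propagate(root, lambda t: t in SINKS)
print(chains)
```

Branches that never reach a sink (here `log`) contribute no chain, so only complete evidence paths survive the upward aggregation.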

In semantics-aware orchestration, each incoming task spawns a tree whose nodes are subtasks, while multiple tasks form a forest. Compatible agents propose decomposition DAGs; consensus-based merging yields a single executable DAG; and small clusters attached to subtask nodes perform $k$-round refinement before synthesis. The forest is dynamic because DAGs are continuously grown, pruned, and merged as capability advertisements, budgets, and partial results change (Giusti et al., 24 Sep 2025).

Fluid-agent reinforcement learning defines a different dynamic semantics. In the FoA view, the set of live agents is indexed by time, and a directed parent-to-child edge is created whenever an agent spawns a child. Initial agents are roots, and spawn actions induce a forest over the alive set. This is an endogenous structural process: the agent population is neither fixed nor known a priori (Sharma et al., 16 Feb 2026).
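A spawn-induced forest can be represented with a parent map, as in the following minimal sketch. The class and method names (`SpawnForest`, `spawn`, `root_of`) are hypothetical, and the sketch models spawn-only dynamics with no agent removal.

```python
class SpawnForest:
    """Parent-pointer forest over the alive set; initial agents are roots."""

    def __init__(self, initial_agents):
        # A parent of None marks a root.
        self.parent = {agent: None for agent in initial_agents}

    def spawn(self, parent, child):
        """Add a directed edge parent -> child; the child joins the alive set."""
        if parent not in self.parent:
            raise KeyError(f"unknown parent: {parent}")
        if child in self.parent:
            raise ValueError(f"agent already alive: {child}")
        self.parent[child] = parent

    def alive(self):
        """Return the current alive set."""
        return set(self.parent)

    def root_of(self, agent):
        """Follow parent pointers up to the initial agent of this tree."""
        while self.parent[agent] is not None:
            agent = self.parent[agent]
        return agent


forest = SpawnForest(["a0", "a1"])
forest.spawn("a0", "a2")
forest.spawn("a2", "a3")
print(sorted(forest.alive()), forest.root_of("a3"))
```

Since every agent has at most one parent and children are always new, the structure stays acyclic, which is what makes it a forest rather than a general graph.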

The stochastic decision-forest formalism supplies a broader decision-theoretic interpretation. A single lottery draw selects the connected component $T_\omega$, while exogenous information structures govern what agents know at random moves. This replaces the classical “nature as an agent” construction and supports continuous-time stochastic processes, including Brownian-motion-driven information flows (Rapsch, 2024).

3. Coordination, inference, and aggregation

Dynamic FoA systems differ most sharply in how branches exchange information and how branch-level evidence is fused.

JAF is judge-centric and cohort-aware. The judge never evaluates a focal instance in complete isolation; it compares the focal response against peer exemplars, detects inconsistencies, missing evidence, and policy misalignments, and returns a label–critique pair. Iterative self-refinement freezes instances only after repeated accepted rounds, while post hoc robustness is estimated by aggregating repeated randomized judge evaluations of the same instance.

The paper describes the overall process as a language-mediated analogue of belief or label propagation, but explicitly notes that it does not define closed-form message updates or node beliefs; propagation is realized algorithmically through re-sampling neighborhoods and re-invoking the judge with updated responses (Garg et al., 29 Jan 2026).

Fleet of Agents treats coordination as population-based search. A fleet of agents explores autonomously for a fixed number of steps, then a heuristic value function scores the current states and the population is resampled with replacement. With exponential weighting, each state's resampling weight grows exponentially with its heuristic value, and new particles are sampled i.i.d. from the resulting normalized distribution. This makes the effective branching factor emergent rather than fixed: concentrated value mass yields cloning and focused search, whereas diffuse values preserve diversity (Klein et al., 2024).
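The selection step can be sketched as softmax-weighted resampling with replacement. This is a minimal illustration, assuming a scalar value function and an inverse-temperature parameter `beta`; both the parameter name and the toy values are assumptions, not details taken from the paper.

```python
import math
import random


def resample(states, value, beta=1.0, rng=None):
    """Resample the population with replacement, weighting states by exp(beta * value)."""
    rng = rng or random.Random(0)
    scores = [beta * value(s) for s in states]
    top = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(x - top) for x in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    # random.choices draws i.i.d. samples with replacement.
    new_states = rng.choices(states, weights=probs, k=len(states))
    return new_states, probs


states = ["a", "b", "c"]
toy_values = {"a": 0.0, "b": 0.0, "c": 10.0}

focused, focused_probs = resample(states, toy_values.get, beta=5.0)
diffuse, diffuse_probs = resample(states, lambda s: 1.0, beta=5.0)
print(focused_probs, diffuse_probs)
```

When one state dominates the value mass, nearly all samples are clones of it. When values are equal, the distribution is uniform and diversity is preserved, which is the emergent-branching behavior described above.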

DarkForest takes the opposite approach to inter-agent interaction. Initial agents remain independent and do not see one another’s outputs. Their responses are parsed into structured records, clustered by canonical candidate, scored by a calibrated belief function, and normalized into a posterior-like distribution over candidates.

Only policy-permitted evidence from this belief state is disclosed to a coordinator, and a deterministic guardrail can override the coordinator if the top-belief candidate is strongly supported. The framework is therefore aggregation-first rather than dialogue-first (Li et al., 24 May 2026).
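An aggregation-first pipeline of this kind can be sketched as follows. The sketch assumes that each record carries a self-reported confidence and that cluster support is the sum of confidences; this is a stand-in for the paper's calibrated belief function, which is not reproduced here. The names `aggregate`, `decide`, and `threshold` are hypothetical.

```python
from collections import defaultdict


def aggregate(records):
    """Cluster (agent_id, candidate, confidence) records and normalize their support."""
    support = defaultdict(float)
    for _agent_id, candidate, confidence in records:
        # Canonicalize the candidate (assumption: case and whitespace are irrelevant).
        support[candidate.strip().lower()] += confidence
    total = sum(support.values())
    if total == 0:
        return {}
    return {cand: s / total for cand, s in support.items()}


def decide(belief, coordinator_choice, threshold=0.8):
    """Deterministic guardrail: override the coordinator when the top belief is strong."""
    top_candidate, top_prob = max(belief.items(), key=lambda kv: kv[1])
    return top_candidate if top_prob >= threshold else coordinator_choice


records = [("a1", "B", 0.9), ("a2", " b ", 0.8), ("a3", "C", 0.3)]
belief = aggregate(records)
print(belief, decide(belief, coordinator_choice="c"))
```

The coordinator only ever sees the normalized belief state, never the raw responses, and its choice stands only when no candidate clears the guardrail threshold.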

Fluid-agent reinforcement learning provides game-theoretic solution concepts for dynamic forests generated by spawn actions. The underlying Partially Observable Fluid Stochastic Game allows a time-varying alive set, spawn-induced action-set changes, and endogenous population resizing. The paper proves that every POFSG possesses a stationary mixed-strategy Nash equilibrium, and that every finite-horizon POFSG with publicly observed joint actions and perfect recall possesses a subgame-perfect Nash equilibrium (Sharma et al., 16 Feb 2026).

4. Locality, retrieval, and communication boundaries

A central design question in Dynamic FoA is how each agent acquires local context without collapsing into global, unstructured communication.

JAF addresses locality through relation-aware neighborhood construction. Its learned locality-sensitive hashing builds a feature vector from both the query and the response and applies a set of binary hash functions whose outputs are concatenated into a hash code. The hash functions are chosen to maximize informativeness and label consistency, while LLM-guided predicates and side information supply semantic structure that embedding-space $k$NN alone misses. In the empirical vanilla instantiation, the study used 8 exemplars per neighborhood, specifically 4 “positive” and 4 “negative” peers selected by overlap in claimed software components (Garg et al., 29 Jan 2026).
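The general shape of hash-based neighborhood construction can be illustrated with classic random-hyperplane LSH. This is a swapped-in technique, not JAF's learned hashing or its LLM-guided predicates: the hyperplanes here are random rather than optimized, and the feature vectors, names, and dimensions are hypothetical.

```python
import random


def make_hyperplanes(dim, bits, seed=0):
    """Draw one random Gaussian hyperplane per hash bit (not learned, unlike JAF)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def hash_code(features, planes):
    """Each bit records which side of a hyperplane the feature vector falls on."""
    return tuple(
        int(sum(w * x for w, x in zip(plane, features)) >= 0.0)
        for plane in planes
    )


def bucketize(items, planes):
    """Group item keys by hash code; items in one bucket are candidate neighbors."""
    buckets = {}
    for key, features in items.items():
        buckets.setdefault(hash_code(features, planes), []).append(key)
    return buckets


planes = make_hyperplanes(dim=4, bits=8)
# Hypothetical features: query features concatenated with response features.
items = {
    "q1": [0.9, 0.1, 0.4, 0.2],
    "q2": [1.8, 0.2, 0.8, 0.4],   # same direction as q1, twice the magnitude
    "q3": [-0.9, -0.1, -0.4, -0.2],
}
buckets = bucketize(items, planes)
print(buckets)
```

Vectors pointing in the same direction share a code and land in the same bucket, so neighborhoods can be drawn from buckets instead of scanning the whole cohort.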

Federation of Agents generalizes locality to large-scale orchestration. Agents advertise Versioned Capability Vectors (VCVs), capability embeddings are indexed in sharded HNSW structures, and routing minimizes a cost-biased objective. With policy gating at the shard level and HNSW retrieval inside shards, the reported routing complexity is sub-linear in the total agent count. The same framework bounds collaboration overhead by capping cluster sizes and using MQTT pub/sub topics for decomposition proposals, routing decisions, and refinement exchanges (Giusti et al., 24 Sep 2025).
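Cost-biased routing with shard-level policy gating can be sketched as below. The sketch uses a brute-force scan in place of HNSW retrieval and assumes an objective of the form (1 - cosine similarity) + `cost_weight` × cost; the objective's exact form, the field names, and the toy agents are assumptions rather than details from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def route(task_vec, agents, allowed_shards, cost_weight=0.1):
    """Return the name of the permitted agent minimizing the cost-biased objective."""
    best_name, best_objective = None, math.inf
    for agent in agents:
        if agent["shard"] not in allowed_shards:
            continue  # policy gate: this shard is not permitted for the task
        objective = (1.0 - cosine(task_vec, agent["vec"])) + cost_weight * agent["cost"]
        if objective < best_objective:
            best_name, best_objective = agent["name"], objective
    return best_name


agents = [
    {"name": "A", "vec": [1.0, 0.0], "cost": 5.0, "shard": "eu"},
    {"name": "B", "vec": [0.9, 0.1], "cost": 1.0, "shard": "eu"},
    {"name": "C", "vec": [1.0, 0.0], "cost": 0.0, "shard": "us"},
]
task = [1.0, 0.0]
print(route(task, agents, allowed_shards={"eu"}))
```

With the cost term active, the slightly less similar but much cheaper agent wins inside the permitted shard. Setting `cost_weight=0` reduces the rule to pure similarity routing.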

DarkForest imposes the strictest communication boundary. By default, its disclosure operator exposes compact calibrated evidence rather than raw reasoning traces. On GPQA, the reported disclosure comparison was 40.00% accuracy with a belief summary at 3435.5 tokens/sample, 40.00% with a reasoning summary at 4136.0 tokens/sample, and 36.67% with full raw traces at 5004.5 tokens/sample. The result is used to argue that unrestricted trace sharing can contaminate downstream reasoning while increasing token cost (Li et al., 24 May 2026).

A distinct but relevant locality principle appears in distributed spanning-forest maintenance. Barjon, Casteigts, Chaumette, Johnen, and Neggaz study a synchronous message-passing algorithm in which every decision is purely local, with no wave mechanisms or global knowledge. By the end of every round, the network is guaranteed to be covered by a forest of spanning trees with no cycles, exactly one tree membership per node, and exactly one root or token per tree, despite arbitrary link churn. This is not an LLM-agent framework, but it gives a precise graph-theoretic notion of dynamic forest maintenance under purely local rules (Barjon et al., 2014).

5. Empirical instantiations and reported performance

JAF was evaluated on cloud misconfiguration triage in large-scale cloud environments. The dataset contained 315 assets, each with 1–54 misconfigurations. A single reasoning model, Llama3.3-Nemotron-Super-49B-v1.5, served as both primary agent and judge under different prompts. Evaluation used a probabilistic correctness estimate computed over repeated randomized neighborhood trials per asset–method pair. After 5 iterations, the reported distribution of this estimate shifted upward under JAF relative to the isolated-judge baseline, with increased mean, reduced variance, and more assets near 1.0; after 10 iterations, JAF appeared only modestly sharper than at 5 iterations, suggesting fast convergence and stability. Precision, recall, and $F_1$ were not reported; the study used cohort-aware probabilistic correctness as the primary metric (Garg et al., 29 Jan 2026).

FORGE provided the most explicit large-scale Dynamic FoA deployment. On 3,457 real-world firmware binaries, it identified 1,274 vulnerabilities across 591 unique binaries, with 72.3% precision. Average resource usage per binary was 33.8 agents, 464 reasoning steps, 1.61M tokens, and 43.8 minutes. The reported cost per verified vulnerability was 140.2 minutes and 4.71M tokens for FoA, versus 357.1 minutes and 12.5M tokens for a single-agent baseline, which the paper summarized as a multi-fold efficiency improvement (about $2.5\times$ in time and $2.7\times$ in tokens on these figures). In the 500-binary ablation, Full FoA yielded 172.0 underlying and 136.0 verified findings, compared with 45.8 and 22.3 for sequential-only generation, and 8.4 and 1.3 for a single agent (Zhang et al., 16 Apr 2026).

Fleet of Agents evaluated dynamic tree search on Game of 24 and Mini Crosswords. On Game of 24 with GPT-3.5-turbo, the balanced FoA configuration reported a success rate of 0.251 at a cost of \$1.711. On Mini Crosswords with GPT-4, FoA reported 0.460 letter overlap at a cost of \$48.988. The paper characterized the advantage as arising from fewer calls to the expensive value function and from dynamic branching via resampling rather than fixed branching-factor expansion (Klein et al., 2024).

DarkForest reported results on six reasoning benchmarks. It achieved 76.80 exact match on MATH, 84.00 Pass@1 on HumanEval, 58.38 accuracy on MMLU-Pro, 39.90 accuracy on GPQA, 15.67 execution accuracy and 11.33 program accuracy on FinQA, and 68.00 exact match on a LegalBench subset. The paper states that it improved on the strongest baseline by up to 30.7% on benchmark metrics and reduced token consumption relative to communication-heavy baselines (Li et al., 24 May 2026).

Federation of Agents evaluated task-DAG forests and clustered refinement on HealthBench Hard. With sharded HNSW over 256-D VCVs, up to four proposers, 2–4 subtasks, and $k$-round refinement, the reported overall score was 0.13, described as a relative improvement over both the best single-model baseline and the uncoordinated ensemble. Disabling clustering degraded performance on complex, context-heavy items, while removing cost-biased routing increased spend and tail latency with negligible quality gains (Giusti et al., 24 Sep 2025).

Fluid-agent reinforcement learning evaluated fluid variants of Predator–Prey, Level-Based Foraging, and PuddleBridge. In Predator–Prey, a fluid team trained with VDN adapted population size to resource level and matched or approached the best fixed team across prey densities. In Level-Based Foraging, PPO, MAPPO, and VDN learned the optimal composition of one additional level-2 spawned agent. In PuddleBridge, VDN agents learned to switch between a non-fluid single-agent policy and a fluid two-agent bridging policy depending on whether a gate was open or closed (Sharma et al., 16 Feb 2026).

6. Misconceptions, limitations, and open problems

A common misconception is that Dynamic FoA is synonymous with unrestricted multi-agent discussion. The surveyed systems contradict that view. DarkForest explicitly argues that exposing raw responses or reasoning traces can amplify incorrect intermediate reasoning and increase token overhead, whereas belief-state disclosure preserves independence and improves the token–quality trade-off (Li et al., 24 May 2026). JAF likewise does not rely on unconstrained peer-to-peer deliberation; it uses small, relation-aware judge contexts and ensemble-style aggregation (Garg et al., 29 Jan 2026).

A second misconception is that “forest” implies a static tree topology. In JAF, neighborhoods are re-sampled across passes and the induced cohort-level knowledge graph gains edges over time; in Federation, the orchestrator grows, prunes, and merges task-specific DAGs while forming ephemeral collaboration clusters; in fluid-agent RL, the alive set itself changes because agents can spawn new agents during execution (Garg et al., 29 Jan 2026, Giusti et al., 24 Sep 2025, Sharma et al., 16 Feb 2026). The forest metaphor therefore describes an evolving execution substrate, not a fixed compile-time plan.

The surveyed systems also impose explicit caveats on their own uncertainty measures. JAF states that its ensemble correctness estimate reflects stability under randomized neighborhoods and is not a calibrated probability of factual correctness; it should be used as a triage consistency signal rather than as ground truth (Garg et al., 29 Jan 2026). Similar caution applies to DarkForest’s posterior-like belief distribution, which is calibrated empirically rather than derived from a full generative likelihood model (Li et al., 24 May 2026).

Several limitations recur across instantiations. FORGE depends on tool fidelity, especially that of radare2 and Ghidra; disassembly or CFG errors can propagate into reasoning, and obfuscation or path explosion still bound coverage despite semantic pruning (Zhang et al., 16 Apr 2026). Federation is sensitive to embedding quality, schema drift, cost calibration errors, and adversarial capability misrepresentation (Giusti et al., 24 Sep 2025). Fluid-agent RL currently studies spawn-only fluidity and does not model agent death or merge/split operations (Sharma et al., 16 Feb 2026). JAF notes that poorly chosen hash predicates can produce redundant or uninformative bits, and that small or noisy neighborhoods can mislead the judge (Garg et al., 29 Jan 2026).

Open problems are correspondingly diverse. JAF highlights systematic study of LSH variants, RL-guided test-time compute allocation over hash buckets and CoT styles, and occasional SFT using refined outputs while preserving modularity and privacy (Garg et al., 29 Jan 2026). Federation emphasizes online learning of routing weights, dynamic shard rebalancing, Byzantine-resilient aggregation, and human-in-the-loop DAG nodes (Giusti et al., 24 Sep 2025). Fleet of Agents points to stronger value functions, alternative resampling policies, and hierarchical fleets in which agents spawn sub-fleets (Klein et al., 2024). DarkForest identifies multi-modal parsing, tool-use settings, and dynamic addition or removal of heterogeneous agents as natural extensions (Li et al., 24 May 2026). Taken together, these directions suggest that Dynamic FoA is evolving from a useful systems metaphor into a technically differentiated class of architectures for long-horizon reasoning, adaptive decomposition, and uncertainty-aware aggregation.