Dichotomy Multi-Expert Agent Inference
- The paper introduces a dichotomy-based multi-expert inference framework that leverages binary decision processes to prune candidate experts and enhance prediction accuracy, as shown by systems like TreeAgent and SurvAgent.
- It employs explicit binary trees and progressive interval refinement to structure expert consultation, thereby promoting decision transparency and operational efficiency.
- The approach challenges all-expert fusion methods by demonstrating that selective, staged consultation through discrete branch choices can yield superior and more robust outcomes.
Dichotomy-Based Multi-Expert Agent Inference, in the literature considered here, denotes multi-expert inference schemes that organize consultation, prediction, or diagnosis through repeated binary or near-binary decisions—such as $0/1$ rule evaluation, progressive interval refinement, public/private evidence partitioning, local-versus-upstream cause tracing, or self-versus-expert information arbitration—rather than indiscriminate all-expert fusion. Native forms include explicit binary decision trees and binary interval-refinement pipelines, while adjacent forms include generalized branching controllers whose search spaces are wider than two but whose control logic is still branch-selective and decomposition-oriented (Chen et al., 30 Jun 2026, Huang et al., 20 Nov 2025, Ornia et al., 9 Oct 2025, Ye et al., 2024).
1. Scope and lineage
An early precursor to selective multi-expert inference appears in a multi-agent object-classification framework with one CenterAgent and a pool of class experts. There, an incoming object is converted into a tag collection, the CenterAgent consults only a likely subset of experts using “degree of confidence,” and the selected experts return class-specific assessments that are fused into a final output vector. The architecture is explicitly divide-and-conquer in the sense of candidate pruning and staged consultation, but it is not dichotomous in the strict binary-tree sense, and its routing and scoring equations are under-specified; the paper also lacks concrete benchmark results (0902.2751).
Across later work, the topic splits into three recurring forms. First are explicitly dichotomous executors, where every internal decision is binary. Second are hierarchical interval or branch refiners, where outcome space is recursively narrowed. Third are generalized branching orchestrators, where the control problem is multiway rather than binary but remains structurally close to branch selection. This suggests that “dichotomy-based” in current usage is often best understood as a structural principle—progressive narrowing by discrete branch choices—rather than only as a literal two-expert debate.
| System | Expert unit | Dichotomy mechanism |
|---|---|---|
| "TreeAgent" (Chen et al., 30 Jun 2026) | Expert-defined decision nodes plus VLM node votes | Every non-exit node returns a binary outcome |
| "SurvAgent" (Huang et al., 20 Nov 2025) | Pathology/genomics reports, retrieved cases, expert survival models | Progressive interval refinement over survival-time strata |
| "InfoDelphi" (Li et al., 2 Jul 2026) | Agents with shared public and disjoint private evidence | Public/private evidence dichotomy |
| "AstroVLM" (Han et al., 17 Apr 2026) | Process-specialized imaging agents | Local-cause vs upstream-cause backtracking |
| "TOA" (Ye et al., 2024) | Heterogeneous pretrained LLM experts | General branching over expert and response choices |
2. Formal abstractions for branch-selective expert inference
A general abstraction for adaptive multi-expert inference is given by multi-agent sampling. In that formulation, a new sample is generated by choosing both an expert and an optional context made from previous outputs. The orchestration problem is then cast as an MDP with a state, an action, a transition, and a local reward. This formalism is not dichotomous by construction, since the model-layer width is not fixed at two, but it supplies a reusable representation of online expert selection, dependency-structure choice, and compute-budgeted exploration (Ye et al., 2024).
A second abstraction treats routing as subset selection rather than sequential generation. In KABB, a task is represented by a concept requirement vector, each expert by a capability vector, and the objective is to choose an expert subset. The central scoring signal is a knowledge distance that combines semantic mismatch, dependency complexity, historical effectiveness, and team complementarity. It is used together with a Beta-posterior success model and a knowledge-aware Thompson-style score.
The method remains subset-based rather than binary, but its two-level structure—concept narrowing followed by expert selection—makes it directly compatible with recursive dichotomy over concept groups or expert pools (Zhang et al., 11 Feb 2025).
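To make the combination of a Beta-posterior success model with a distance-aware Thompson-style score concrete, here is a minimal Python sketch. The class `Expert`, the functions `thompson_score`, `select_experts`, and `update`, and in particular the exponential form of the distance penalty are illustrative assumptions; the source does not give KABB's scoring formula, so this should not be read as the paper's method.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Expert:
    name: str
    alpha: float = 1.0     # Beta posterior parameter: 1 + observed successes
    beta: float = 1.0      # Beta posterior parameter: 1 + observed failures
    distance: float = 0.0  # hypothetical knowledge distance to the task (>= 0)


def thompson_score(expert, rng, penalty=1.0):
    """Sample a success rate from the Beta posterior and discount it by distance.

    The exp(-penalty * distance) discount is an assumed form chosen for
    illustration only.
    """
    theta = rng.betavariate(expert.alpha, expert.beta)
    return theta * math.exp(-penalty * expert.distance)


def select_experts(experts, k, rng, penalty=1.0):
    """Return the k experts with the highest sampled scores."""
    ranked = sorted(experts, key=lambda e: thompson_score(e, rng, penalty), reverse=True)
    return ranked[:k]


def update(expert, success):
    """Update the Beta posterior after observing one outcome."""
    if success:
        expert.alpha += 1
    else:
        expert.beta += 1


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [
        Expert("math", alpha=8, beta=2, distance=0.1),
        Expert("code", alpha=5, beta=5, distance=0.4),
        Expert("law", alpha=9, beta=1, distance=2.0),
    ]
    print([e.name for e in select_experts(pool, k=2, rng=rng)])
```

Because scores are sampled rather than computed from posterior means, rarely used experts still get occasional chances to be selected, which is the usual motivation for Thompson-style routing.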
A third abstraction is explicitly binary at the information-source level. In Bayesian bandits around experts, the learner must decide whether to update from its own action–outcome pair or from an expert outcome stream. The optimal one-step rule is to choose the source with the larger expected information gain about the optimal action. The same work also models trust in the expert through a latent expert policy, so the binary choice is not only self-versus-expert but also, effectively, trust-versus-distrust under posterior uncertainty (Ornia et al., 9 Oct 2025).
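The self-versus-expert rule can be illustrated with a small discrete example in Python. This is a simplified sketch, not the paper's bandit formulation: it assumes a finite set of hypotheses about which action is optimal, a known likelihood table for each source, and expected information gain measured as mutual information in bits. The function names and the likelihood tables are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior, likelihood):
    """Mutual information between hypothesis and observation.

    prior[h] is P(h); likelihood[h][o] is P(o | h).
    """
    n_hyp = len(prior)
    n_obs = len(likelihood[0])
    gain = entropy(prior)
    for o in range(n_obs):
        p_o = sum(prior[h] * likelihood[h][o] for h in range(n_hyp))
        if p_o == 0:
            continue
        posterior = [prior[h] * likelihood[h][o] / p_o for h in range(n_hyp)]
        gain -= p_o * entropy(posterior)
    return gain


def choose_source(prior, own_likelihood, expert_likelihood):
    """Pick the source with the larger expected gain; ties go to 'self'."""
    own = expected_information_gain(prior, own_likelihood)
    expert = expected_information_gain(prior, expert_likelihood)
    return ("expert" if expert > own else "self"), own, expert


if __name__ == "__main__":
    prior = [0.5, 0.5]
    own = [[0.6, 0.4], [0.4, 0.6]]     # weakly informative own outcomes
    expert = [[0.9, 0.1], [0.1, 0.9]]  # strongly informative expert outcomes
    print(choose_source(prior, own, expert))
```

An expert whose outcomes do not depend on the hypothesis has zero expected gain in this toy model, so the rule falls back to the learner's own data.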
3. Explicitly dichotomy-based architectures
"TreeAgent" is the clearest strict implementation. Its Decoupled Declarative Decision (D3) Framework compiles an expert-written rule into an executable tree over a fixed Logic Primitive Inventory. Each node class carries an execution type, and every non-exit node returns a binary outcome that determines which child is visited next. Deterministic nodes evaluate arithmetic predicates, and VLM nodes answer localized perceptual yes/no questions. To reduce stochasticity, VLM nodes draw several samples at a fixed temperature and take a majority vote. The framework is explicitly binary, zero-modification across expert-authored decision structures that fit its vocabulary, and reported a higher Macro-F1 on the 147-tree WREF+SRER test set than the LightGBM baseline, with per-tree runtime reported alongside human annotation time (Chen et al., 30 Jun 2026).
"SurvAgent" instantiates dichotomy in an ordered-outcome setting. Its second stage, Dichotomy-Based Multi-Expert Agent Inference, takes a summarized WSI report, a summarized gene report, retrieved similar cases
and predictions from external survival experts. The inference agent then performs progressive interval refinement over a fixed number of dichotomy levels, followed by exact survival-time prediction. The implementation groups outcomes into four practical survival-time strata, measured in months. In ablation, the “Inference” setting raised overall C-index, and the full system was evaluated across five TCGA cohorts (Huang et al., 20 Nov 2025).
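Progressive interval refinement of this kind can be sketched in Python as repeated halving of a survival-time range. This is only an illustration under stated assumptions: in SurvAgent the binary decisions are made by an inference agent using reports, retrieved cases, and expert predictions, whereas the hypothetical `upper_half_vote` below stands in for that agent with a simple majority vote over expert point predictions. The range, the number of levels, and the final averaging step are also assumptions.

```python
from statistics import mean


def upper_half_vote(expert_predictions, midpoint):
    """Stand-in for the agent's binary decision: do most experts predict >= midpoint?"""
    upper = sum(1 for p in expert_predictions if p >= midpoint)
    return upper * 2 > len(expert_predictions)


def refine_interval(low, high, expert_predictions, levels):
    """Halve [low, high] `levels` times, recording each binary choice."""
    path = []
    for _ in range(levels):
        midpoint = (low + high) / 2
        if upper_half_vote(expert_predictions, midpoint):
            low = midpoint
            path.append(1)  # descended into the upper half
        else:
            high = midpoint
            path.append(0)  # descended into the lower half
    return (low, high), path


def point_estimate(interval, expert_predictions):
    """Average the expert predictions inside the final interval.

    Falls back to the interval midpoint when no prediction lies inside it.
    """
    low, high = interval
    inside = [p for p in expert_predictions if low <= p <= high]
    return mean(inside) if inside else (low + high) / 2


if __name__ == "__main__":
    experts = [30.0, 34.0, 50.0]  # hypothetical survival predictions in months
    interval, path = refine_interval(0.0, 120.0, experts, levels=3)
    print(interval, path, point_estimate(interval, experts))
```

The recorded `path` is the inspectable part: each entry is one binary decision that can be audited separately from the final number.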
These two systems operationalize dichotomy in different ways. TreeAgent uses binary procedural predicates over a fixed expert rulebook. SurvAgent uses binary refinement over an ordered prognostic space. Both expose intermediate decisions as inspectable structure rather than collapsing inference into a single opaque score.
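The binary traversal attributed to TreeAgent above, with deterministic predicates, majority-voted perceptual questions, and an inspectable trace, can be sketched in Python as follows. The classes `Node` and `Exit`, the function names, and the stubbed perceptual question are hypothetical; the sketch does not reproduce the paper's Logic Primitive Inventory or its actual VLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Exit:
    label: str  # terminal decision


@dataclass
class Node:
    name: str
    kind: str                          # "deterministic" or "vlm"
    predicate: Callable[[dict], bool]  # returns the node's yes/no answer
    if_true: Union["Node", Exit]
    if_false: Union["Node", Exit]


def majority_vote(ask, case, samples):
    """Query a stochastic yes/no predicate several times and return the majority."""
    yes = sum(1 for _ in range(samples) if ask(case))
    return yes * 2 > samples


def run_tree(root, case, samples=5):
    """Traverse until an Exit is reached; return its label and the decision trace."""
    trace = []
    node = root
    while isinstance(node, Node):
        if node.kind == "vlm":
            outcome = majority_vote(node.predicate, case, samples)
        else:
            outcome = bool(node.predicate(case))
        trace.append((node.name, int(outcome)))
        node = node.if_true if outcome else node.if_false
    return node.label, trace


if __name__ == "__main__":
    # The "vlm" predicate is a deterministic stub standing in for a model call.
    tree = Node(
        "wide_enough", "deterministic", lambda c: c["width"] > 10,
        if_true=Node("looks_cracked", "vlm", lambda c: c["cracked"],
                     if_true=Exit("reject"), if_false=Exit("accept")),
        if_false=Exit("too_small"),
    )
    print(run_tree(tree, {"width": 12, "cracked": False}))
```

Using an odd number of samples avoids ties in the vote; with an even number, this sketch treats a tie as a "no".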
4. Generalized branching beyond strict binary trees
"Tree Search-based Orchestrated Agents" (TOA) shows the closest non-binary analogue. Its search tree alternates between model nodes and response nodes, and MCTS proceeds through selection, expansion, simulation, and backpropagation with UCB-based child choice. A key heuristic is asymmetric pruning: at model nodes, TOA prunes child response nodes and retains only those with the highest reward scores, while at response nodes it does not prune model children. In short, it prunes hypotheses, not experts. The method is explicitly not dichotomous: expert-choice width and response branching are not limited to two, and there is no recursive bisection of the task. Yet it is a generalized branching controller whose components (online expert selection, refinement-branch choice, depth-versus-breadth tradeoff, and reward-shaped search) transfer directly to dichotomy-oriented designs. Empirically, TOA was evaluated by length-controlled win rate on AlpacaEval 2.0 with five large models and achieved the best average KIWI-XXL score on WMT’21/’22 under its reported FLOPs budget; the same paper also reports reward hacking, where internal reward can keep increasing while external evaluation declines (Ye et al., 2024).
"InfoDelphi" moves the dichotomy from action space to evidence space. It partitions the corpus as
a shared public subset and a set of disjoint private subsets, so each agent sees the public subset plus its own private subset. Under its error model, inter-agent error correlation grows with the share of evidence that is public, so decreasing that share decorrelates errors, while too little public evidence harms communicability. InfoDelphi reported its Brier score and accuracy on PolyGym, and removing the information asymmetry or removing rationale sharing removed most of the gain. The dichotomy here is public-versus-private knowledge, not binary expert voting (Li et al., 2 Jul 2026).
"AstroVLM" uses a process-specialized architecture with Agent-Specific Knowledge RAG and Reasoning with Backtracking. ASK-RAG makes a binary choice between partitioning and aggregating subgraphs by comparing a correlation factor with a threshold. RwB builds a Collaborative Reasoning Tree whose edges carry causal weights. Backtracking proceeds to relevant previous agents only when one pair of threshold conditions holds, and a current node is treated as a likely direct cause when a second pair holds. This is not a strict binary decision tree, but it repeatedly poses a dichotomy-like question: is the cause local to the current process, or upstream in a prior process? AstroVLM reported a higher average score than the best non-AstroVLM baseline, with ASK-RAG and RwB both strongly supported by ablation (Han et al., 17 Apr 2026).
At lower granularity, "MeltRTL" shows that expert partitioning can also occur inside a shared model rather than among explicit agents. Its design partitions RTL tasks into combinational, sequential/datapath, and FSM/controller modules, selects correctness-critical attention heads with probe ensembles, and uses binary head indicators to gate the intervention. This is a partial match to the topic: it is partition-based and selectively gated, but not an agent-level dichotomy (Mashnoor et al., 19 Jan 2026).
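Returning to the TOA controller described at the start of this section, its two most reusable ingredients, UCB-based child choice and asymmetric pruning, can be sketched in Python. The class `TreeNode`, the exploration constant, and the `keep` parameter are illustrative assumptions, and the sketch uses the standard UCB1 formula rather than any variant specific to the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    label: str
    kind: str                  # "model" or "response"
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def mean_reward(node):
    return node.total_reward / node.visits if node.visits else 0.0


def ucb(child, parent_visits, c=1.4):
    """Standard UCB1 score; unvisited children are explored first."""
    if child.visits == 0:
        return math.inf
    bonus = c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)
    return mean_reward(child) + bonus


def select_child(node, c=1.4):
    """Selection step: pick the child with the highest UCB score."""
    return max(node.children, key=lambda ch: ucb(ch, node.visits, c))


def prune(node, keep):
    """Asymmetric pruning: only model nodes drop low-reward response children.

    Response nodes keep all of their model children.
    """
    if node.kind == "model":
        node.children.sort(key=mean_reward, reverse=True)
        del node.children[keep:]


if __name__ == "__main__":
    model = TreeNode("model-A", "model", visits=6)
    model.children = [
        TreeNode("r1", "response", visits=3, total_reward=1.5),
        TreeNode("r2", "response", visits=2, total_reward=1.8),
        TreeNode("r3", "response", visits=1, total_reward=0.2),
    ]
    prune(model, keep=2)
    print([c.label for c in model.children], select_child(model).label)
```

A dichotomy-oriented variant would fix `keep=2` at every level, turning the same search loop into a binary tournament over hypotheses.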
5. Expert choice, coherent aggregation, and influencer formation
The most rigorous asymptotic expert-selection theory in the set is the multihypothesis social-learning result. There, an agent chooses one expert, then combines that expert’s decision with its own private observations to produce its own decision. The expert is not chosen by standalone accuracy; it is chosen by the loss exponent it induces for the choosing agent. The final criterion is stated in terms of that induced exponent, which is defined through worst-case pairwise hypothesis discrimination. The same paper proves that, up to asymptotic equivalence, the worst canonical loss exponent for the main agent is achieved under 0-1 loss; introduces hypothesis-loss neutrality; shows that, under neutrality, experts with smaller decision spaces are asymptotically ignored; and shows that an expert with the same loss function as the principal is not necessarily optimal (Tay, 2014).
A different foundation concerns coherent aggregation once experts have already been selected. In an integrating decision support system with multiple panels, if the conditional expected utility is algebraic in panel-specific quantities and the required factorization conditions hold, then each panel need transmit only the moments of its local algebraic features, not its full model. Adequacy is guaranteed under score separability or quasi independence. For dichotomy-based multi-expert systems, this gives a precise aggregation layer: route or prune experts first, then combine only the summaries needed by the downstream score (Leonelli et al., 2017).
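The idea that panels pass only moments can be illustrated with a toy polynomial utility in Python. This sketch assumes one scalar quantity per panel and full independence across panels, which is a stronger simplification than the separability and quasi-independence conditions named above. The dictionary encoding of the polynomial and the function names are hypothetical.

```python
def required_moments(utility, n_panels):
    """List which powers E[X_i^p] each panel must transmit.

    `utility` maps a tuple of per-panel powers to a coefficient, so
    {(2, 1): 2.0} encodes the monomial 2 * X_0**2 * X_1.
    """
    needed = [set() for _ in range(n_panels)]
    for powers in utility:
        for panel, power in enumerate(powers):
            if power > 0:
                needed[panel].add(power)
    return needed


def expected_utility(utility, panel_moments):
    """Expected utility from per-panel moments, assuming independent panels.

    panel_moments[i][p] is E[X_i^p] as reported by panel i.
    """
    total = 0.0
    for powers, coefficient in utility.items():
        term = coefficient
        for panel, power in enumerate(powers):
            if power > 0:
                term *= panel_moments[panel][power]
        total += term
    return total


if __name__ == "__main__":
    # U = 1 + 2 * X_0^2 * X_1 + 3 * X_1
    utility = {(0, 0): 1.0, (2, 1): 2.0, (0, 1): 3.0}
    print(required_moments(utility, 2))
    # Panel 0 sends only E[X_0^2]; panel 1 sends only E[X_1].
    print(expected_utility(utility, [{2: 2.0}, {1: 2.0}]))
```

The point of the design is that `required_moments` is computed from the utility alone, so each panel learns what to send without exposing its full model.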
Influence formation in deliberative systems can itself be analyzed as expert routing. Under Friedkin–Johnsen dynamics, the equilibrium belief is a convex combination of innate beliefs, and when the combination weights depend on the input, the system becomes a mixture of experts.
The paper’s empirical result is that true competence is latent; observable proxies such as self-assessed confidence, relative confidence, perceived confidence, initial alignment, and especially stubbornness govern who becomes influential. This is important for dichotomy-based inference because branch choice can be competence-driven or merely confidence-driven, and those are not equivalent (Bause et al., 25 May 2026).
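The role of stubbornness in Friedkin–Johnsen dynamics can be seen in a small Python simulation. The sketch uses one common parameterization, in which each agent mixes a weighted average of current beliefs with its own innate belief according to a susceptibility in [0, 1]; the source's exact notation is not available, so the variable names and this parameterization are assumptions.

```python
def fj_step(beliefs, innate, weights, susceptibility):
    """One Friedkin-Johnsen update.

    x_i <- s_i * sum_j W_ij * x_j + (1 - s_i) * innate_i,
    where s_i is susceptibility and (1 - s_i) is stubbornness.
    Rows of `weights` are assumed to sum to 1.
    """
    n = len(beliefs)
    return [
        susceptibility[i] * sum(weights[i][j] * beliefs[j] for j in range(n))
        + (1 - susceptibility[i]) * innate[i]
        for i in range(n)
    ]


def fj_equilibrium(innate, weights, susceptibility, tol=1e-12, max_iter=10000):
    """Iterate from the innate beliefs until successive updates differ by < tol."""
    beliefs = list(innate)
    for _ in range(max_iter):
        updated = fj_step(beliefs, innate, weights, susceptibility)
        if max(abs(a - b) for a, b in zip(updated, beliefs)) < tol:
            return updated
        beliefs = updated
    return beliefs


if __name__ == "__main__":
    innate = [1.0, 0.0]
    weights = [[0.5, 0.5], [0.5, 0.5]]
    # Agent 0 is fully stubborn, agent 1 fully susceptible.
    print(fj_equilibrium(innate, weights, susceptibility=[0.0, 1.0]))
    # Equal susceptibility: both agents keep part of their innate belief.
    print(fj_equilibrium(innate, weights, susceptibility=[0.5, 0.5]))
```

In the first configuration the stubborn agent's innate belief ends up determining both equilibrium beliefs regardless of whether it is correct, which is the sense in which influence can track stubbornness rather than competence.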
6. Empirical pattern, misconceptions, and open directions
Across domains, structured expert differentiation repeatedly outperforms homogeneous or all-expert baselines. Beyond TreeAgent and SurvAgent, KABB reported a higher LC win rate on AlpacaEval 2.0 than MoA while selecting only two experts in that setting, and the paper states that KABB can achieve similar LC win rate to MoA at a fraction of the cost; MeltRTL improved both synthesizability and functional correctness on VerilogEval at a reported computational overhead; AstroVLM, as noted above, improved its average diagnostic score over the best baseline; and InfoDelphi showed that removing information asymmetry eliminates most deliberation gains (Zhang et al., 11 Feb 2025, Mashnoor et al., 19 Jan 2026, Han et al., 17 Apr 2026, Li et al., 2 Jul 2026).
Several common misconceptions are contradicted by the literature. First, “dichotomy-based” is not synonymous with a two-agent debate. It can mean binary rule nodes, progressive interval refinement, public/private evidence decomposition, or local-versus-upstream causal tracing. Second, more experts are not automatically better. TOA reports that, on MATH, combining all four small models does not outperform combining the top two; InfoDelphi reports a setting in which a larger configuration performs worse than a smaller one; ForecastAgentSearch argues for a compact top-ranked expert set rather than all-expert consultation; and the trading framework "Toward Expert Investment Teams" shows that removing several agents can improve Sharpe, indicating that unaligned experts can add noise rather than signal (Ye et al., 2024, Cai et al., 30 Jun 2026, Miyazaki et al., 26 Feb 2026). Third, reward, confidence, or judge scores are not identical to competence. TOA reports reward hacking; the FJ-based analysis shows influence is strongly driven by confidence and stubbornness; the Bayesian source-selection work warns that compromised experts can induce confidently wrong posteriors; and MeltRTL reports that too large an intervention strength destabilizes generation (Ye et al., 2024, Bause et al., 25 May 2026, Ornia et al., 9 Oct 2025, Mashnoor et al., 19 Jan 2026).
A further distinction separates inference from diagnostic governance. The diagnostics framework for recruiter-assistant systems builds gold versus silver datasets, judges extraction as TP/FN/FP, scores behavioral alignment, and embeds prescriptions into a reusable recommendation map. That machinery is strongly dichotomous in evaluation space—correct versus incorrect extraction, aligned versus drifted behavior—but it is not itself a multi-expert inference algorithm; it is a refinement layer around one (Sorstkins et al., 18 Sep 2025).
The main open direction is to combine the strongest pieces of these lines. This suggests binary or tournament-style expert routing over retrieval-ranked pools, branch-local information-gain selection, and algebraically coherent summary passing after expert pruning, rather than treating dichotomy, expert search, and aggregation as separate problems (Zhang et al., 11 Feb 2025, Ornia et al., 9 Oct 2025, Leonelli et al., 2017). A plausible implication is that future systems will make binary decisions at multiple levels simultaneously: which evidence partition to consult, which expert branch to descend, whether a disagreement is informative or spurious, and whether a local explanation should be accepted or recursively refined.