AgentCDM: Multi-Agent Decision Framework

Updated 22 December 2025
  • AgentCDM is a multi-agent decision-making framework characterized by structured workflows, ACH-inspired hypothesis evaluation, and integration of language models with deterministic logic.
  • It employs a two-stage training approach combining explicit analytical scaffolding with autonomous generalization, achieving significant accuracy improvements across benchmarks.
  • The framework mitigates bias and enhances interpretability through systematic hypothesis enumeration, evidence-based reasoning, and meta-cognitive review, with applications in clinical support and CAD design.

AgentCDM comprises a family of multi-agent frameworks and system architectures that enhance collaborative decision-making across domains including general problem-solving, conceptual CAD design, and clinical decision support. The AgentCDM paradigm is characterized by structured agent workflows in which candidate hypotheses or actions are generated, systematically evaluated, and selected through a combination of formal reasoning, knowledge retrieval, and, in many cases, iterative feedback and verification. The core advances of AgentCDM systems include: (1) moving beyond traditional voting or dictatorial agent selection for multi-agent systems, (2) incorporating structured analytic or imperative paradigms that mitigate cognitive or procedural errors, and (3) achieving high efficiency and interpretability by integrating LLMs, symbolic processing, and deterministic logic pipelines (Zhao et al., 16 Aug 2025, 2505.23055, Ni et al., 1 Aug 2025).

1. Multi-Agent Collaborative Decision-Making: The ACH-Inspired AgentCDM Framework

The flagship AgentCDM architecture addresses two limitations of classical collaborative decision-making (CDM): the vulnerability of dictatorial aggregation to cognitive bias, and the lack of evidence synthesis in voting-based methods. AgentCDM draws directly on the Analysis of Competing Hypotheses (ACH) scaffold, a structured analytic technique from intelligence analysis that is grounded in cognitive psychology, to structure LLM-driven multi-agent reasoning (Zhao et al., 16 Aug 2025). Its key process flow involves:

  1. Execution Phase: Each of $n$ execution agents $\pi_i$ receives the user query $s$ and independently outputs a candidate answer $a_i \sim \pi_i(\cdot \mid s)$.
  2. Decision Phase: A dedicated decision agent $\pi_D$ receives the aggregated history $H = \{s, a_1, \ldots, a_n\}$ and initiates structured reasoning to produce the final answer $a_D \sim \pi_D(\cdot \mid H)$.
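
The two phases can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the type aliases, `run_agentcdm`, and the toy agents are hypothetical names, and the toy decision agent is only a placeholder for the structured ACH reasoning described below.

```python
from typing import Callable, List

# Hypothetical type aliases: an agent is any callable that returns text.
ExecutionAgent = Callable[[str], str]
DecisionAgent = Callable[[str, List[str]], str]


def run_agentcdm(query: str,
                 execution_agents: List[ExecutionAgent],
                 decision_agent: DecisionAgent) -> str:
    # Execution phase: every agent answers independently, seeing only the query s.
    candidates = [agent(query) for agent in execution_agents]
    # Decision phase: the decision agent receives the history H = {s, a_1, ..., a_n}.
    return decision_agent(query, candidates)


# Toy stand-ins for LLM-backed execution agents (illustration only).
toy_agents = [lambda s: "B", lambda s: "B", lambda s: "C"]


def toy_decision_agent(query: str, candidates: List[str]) -> str:
    # Placeholder logic only: returns the most frequent candidate.
    # A real decision agent would run the ACH-style analysis instead.
    return max(set(candidates), key=candidates.count)


final_answer = run_agentcdm("Which option is correct?", toy_agents, toy_decision_agent)
print(final_answer)
```

Keeping the execution agents independent of one another means the decision agent is the only component that sees all candidate answers together.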

The reasoning follows explicit ACH steps:

  • Hypothesis Formulation: Extract and unify assertions from all candidate answers into a mutually exclusive set $H = \{h_1, \ldots, h_m\}$.
  • Evidence Collection: Aggregate all facts and arguments $E = \{e_1, \ldots, e_k\}$ pertinent to the hypotheses.
  • Hypothesis–Evidence Matrix: Construct $m_{j\ell} \in \{+1, 0, -1\}$ for each ($h_j$, $e_\ell$) pair, designating evidence as consistent, irrelevant, or inconsistent with each hypothesis.
  • Meta-Cognitive Review: Require review of bias and adversarial tests on the leading hypothesis, prompting the agent to expose hidden assumptions and reevaluate evidence matrices.

A disconfirmation-focused score for each hypothesis is computed as:

$$\text{score}(h_j) = \sum_{\ell=1}^{k} \mathbb{I}[m_{j\ell} = -1] - \alpha \sum_{\ell=1}^{k} \mathbb{I}[m_{j\ell} = +1], \quad \alpha \in [0, 1].$$

The conclusion is selected by minimizing this score, with an analytic report documenting the logic within structured tags.
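
As a worked illustration of the scoring rule, the sketch below computes the score for a small hypothesis–evidence matrix and picks the minimizer. The matrix entries, the hypothesis labels, and the choice $\alpha = 0.5$ are made-up values for the example, and the function names are hypothetical.

```python
from typing import Dict, List


def disconfirmation_score(row: List[int], alpha: float = 0.5) -> float:
    # row holds m_{j,l} in {+1, 0, -1} for one hypothesis h_j.
    inconsistent = sum(1 for m in row if m == -1)
    consistent = sum(1 for m in row if m == +1)
    # Inconsistent evidence raises the score; consistent evidence lowers it by alpha.
    return inconsistent - alpha * consistent


def select_hypothesis(matrix: Dict[str, List[int]], alpha: float = 0.5) -> str:
    scores = {h: disconfirmation_score(row, alpha) for h, row in matrix.items()}
    # The conclusion is the hypothesis with the lowest score.
    return min(scores, key=scores.get)


# Illustrative matrix: three hypotheses, four pieces of evidence.
matrix = {
    "h1": [+1, -1, 0, -1],
    "h2": [+1, +1, 0, -1],
    "h3": [0, -1, -1, -1],
}
scores = {h: disconfirmation_score(row) for h, row in matrix.items()}
best = select_hypothesis(matrix)
print(scores, best)
```

With these values, `h2` has one inconsistent and two consistent items, giving $1 - 0.5 \cdot 2 = 0$, the lowest of the three scores.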

2. AgentCDM Training: Two-Stage Scaffolding and Generalization Paradigm

AgentCDM leverages a two-stage training approach (Zhao et al., 16 Aug 2025):

  • Stage 1—Explicit ACH Scaffolding: Training employs detailed prompts and rewards for adherence to ACH protocol, format correctness, and accuracy.
  • Stage 2—Scaffolding Removal and Curriculum Annealing: Structural ACH guidance is gradually removed (via cosine-annealed prompt sampling), and rewards are based on semantic similarity to the original template, fostering autonomous generalization.
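
One way to picture cosine-annealed prompt sampling is as a probability of including the ACH scaffold that decays from 1 to 0 over Stage 2. The sketch below assumes a half-cosine schedule; the exact schedule, the function names, and the prompt layout are assumptions for illustration, not details confirmed by the source.

```python
import math
import random


def scaffold_probability(step: int, total_steps: int) -> float:
    # Assumed half-cosine schedule: 1.0 at step 0, decaying to 0.0 at total_steps.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def sample_prompt(query: str, scaffold: str, step: int,
                  total_steps: int, rng: random.Random) -> str:
    # Include the explicit ACH scaffold with the annealed probability.
    if rng.random() < scaffold_probability(step, total_steps):
        return scaffold + "\n\n" + query
    return query


rng = random.Random(0)
early = sample_prompt("Q?", "Follow the ACH steps.", 0, 1000, rng)
late = sample_prompt("Q?", "Follow the ACH steps.", 1000, 1000, rng)
print(repr(early), repr(late))
```

Early in the schedule the scaffold is always present, and by the final step the agent sees only the bare query.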

Key hyperparameters include $T \approx 10^4$ to $10^5$ RL updates, batch size $P = 256$, $N = 5$ rollouts, GRPO with Adam ($\mathrm{lr} \approx 10^{-5}$), sampling temperature $0.6$, and top-$p$ of $0.95$.
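
For orientation, the sketch below collects these settings in a plain dictionary and shows the group-relative advantage that GRPO is commonly described as using: each rollout's reward is normalized against the other rollouts sampled for the same prompt. The dictionary keys, the function name, the example rewards, and the use of the population standard deviation are assumptions, not details taken from the source.

```python
from statistics import mean, pstdev
from typing import List

# Hyperparameters as stated in the text; the key names are hypothetical.
config = {
    "batch_size": 256,
    "rollouts_per_prompt": 5,
    "learning_rate": 1e-5,
    "temperature": 0.6,
    "top_p": 0.95,
}


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    # Normalize each reward by the mean and spread of its own rollout group.
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for one prompt with N = 5 rollouts.
rewards = [1.0, 0.0, 1.0, 0.0, 0.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Rollouts that beat their group's average receive positive advantages and the rest receive negative ones, so no separate value network is needed to form a baseline.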

3. Empirical Demonstration Across Benchmarks

AgentCDM's efficacy is validated on MMLU (broad-knowledge MCQA), MMLU-PRO (10-way MCQA), and ARC-Challenge (multi-step science QA) (Zhao et al., 16 Aug 2025). Across all benchmarks and model backbones:

  • All-average accuracy lift over Single-Agent approaches: $+11.6$ percentage points (from $65.9\%$ to $77.5\%$).
  • Substantial gains on challenging tasks: e.g., $+17.3$ pp on MMLU-PRO.
  • Superior cross-dataset generalization: Training on MMLU-PRO yields $94.0\%$ on ARC-Challenge, measured against in-domain specialists.
  • Robustness and scalability: Benefits increase with agent quality and heterogeneity; however, weak agents introduce noise amplification.
  • Ablation confirms necessity of both training stages: Single-stage approaches yield inferior generalization or undertrained decision agents.

4. Generalizations to Domain-Specific Architectures

AgentCDM's strategy underlies several verticalized systems:

a. Clinical Decision Support: CDR-Agent

For clinical environments, CDR-Agent operationalizes AgentCDM as a modular end-to-end LLM-driven suite that replaces monolithic LLM calls with a structured pipeline (2505.23055):

  • Note-Parsing and Embedding: Clinical notes and CDRs embedded in a shared vector space.
  • Rule Retrieval (Anomaly Detection): Gaussian-based statistical filtering of embedding similarities selects only statistically anomalous, relevant CDRs.
  • Reasoning Engine and Variable Extraction: Prompted variable extraction for each candidate rule, defaulting to negative imputation on missing data (cautious imaging paradigm).
  • Deterministic Rule Execution: Extracted clinical variables are mapped to decision logic implemented as deterministic code.
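
The retrieval, imputation, and execution steps can be sketched as follows. This is a minimal illustration under stated assumptions: the z-score threshold of 2.0, the similarity values, the rule and variable names, and `toy_rule` are all made up for the example, and `toy_rule` is not a real clinical decision rule.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional


def select_anomalous_rules(similarities: Dict[str, float],
                           z_threshold: float = 2.0) -> List[str]:
    # Treat the similarity scores as roughly Gaussian and keep only rules
    # whose similarity is unusually high relative to the rest.
    values = list(similarities.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0.0:
        return []
    return [name for name, s in similarities.items()
            if (s - mu) / sigma > z_threshold]


def impute_negative(extracted: Dict[str, Optional[bool]],
                    required: List[str]) -> Dict[str, bool]:
    # Missing or unknown variables default to False (negative imputation).
    return {name: bool(extracted.get(name)) for name in required}


def toy_rule(variables: Dict[str, bool]) -> str:
    # Deterministic logic: purely illustrative, not a clinical rule.
    return "positive" if any(variables.values()) else "negative"


# Made-up note-to-rule similarity scores.
similarities = {"rule_a": 0.82, "rule_b": 0.31, "rule_c": 0.29,
                "rule_d": 0.33, "rule_e": 0.30, "rule_f": 0.28}
selected = select_anomalous_rules(similarities)
variables = impute_negative({"criterion_1": True, "criterion_2": None},
                            ["criterion_1", "criterion_2", "criterion_3"])
print(selected, variables, toy_rule(variables))
```

Because the final step is ordinary code rather than a model call, the same extracted variables always produce the same recommendation.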

Evaluation demonstrates $56.3\%$ and $8.7\%$ absolute accuracy gains over standalone LLMs for CDR selection on simulated and real datasets, respectively, as well as significant reductions in total decision latency. Incorporation of synonym expansion and truncation-robust embeddings further bolsters generalizability and extensibility.

b. Automated CAD Conceptual Design: CADDesigner

The AgentCDM architecture, referred to as CADDesigner in this context, applies LLM agency to end-to-end conceptual design (Ni et al., 1 Aug 2025):

  • ReAct-Style Agent Loop with modules for input processing, requirement analysis, code generation (using the Context-Independent Imperative Paradigm, CIP), geometric execution and rendering, structured visual feedback, and knowledge base integration.
  • CIP: Enforces stateless, function-calling code steps, with explicit type and error annotations, enabling robust error correction and compatibility across CAD systems.
  • Visual Feedback Loop: Binary and structured feedback guides iterative improvement; final models, scripts, and cases populate a continuously expanding retrieval-augmented knowledge base.
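
The idea of stateless, explicitly typed steps inside a feedback loop can be sketched without any CAD library. Everything below is hypothetical: `Box`, `GeometryError`, `make_box`, `scale`, `design_loop`, and the toy proposer and checker are illustrative stand-ins, not the CADDesigner API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Box:
    width: float
    depth: float
    height: float


class GeometryError(ValueError):
    """Raised when a step receives invalid dimensions."""


def make_box(width: float, depth: float, height: float) -> Box:
    # Stateless step: all inputs are explicit and no session state is touched.
    if min(width, depth, height) <= 0:
        raise GeometryError("all dimensions must be positive")
    return Box(width, depth, height)


def scale(box: Box, factor: float) -> Box:
    # Returns a new shape instead of mutating the input.
    if factor <= 0:
        raise GeometryError("scale factor must be positive")
    return Box(box.width * factor, box.depth * factor, box.height * factor)


def design_loop(propose: Callable[[Optional[str]], Box],
                check: Callable[[Box], Optional[str]],
                max_iters: int = 3) -> Optional[Box]:
    feedback: Optional[str] = None
    for _ in range(max_iters):
        try:
            model = propose(feedback)
        except GeometryError as err:
            feedback = f"error: {err}"  # execution errors become feedback
            continue
        feedback = check(model)  # None means the model passed the check
        if feedback is None:
            return model
    return None


def toy_propose(feedback: Optional[str]) -> Box:
    # Stand-in for LLM code generation that reacts to the latest feedback.
    if feedback is None:
        return make_box(2.0, 1.0, 0.0)  # invalid first attempt
    if feedback.startswith("error"):
        return make_box(2.0, 1.0, 0.5)  # valid but too short
    return scale(make_box(2.0, 1.0, 0.5), 2.0)


def toy_check(model: Box) -> Optional[str]:
    return None if model.height >= 1.0 else "height must be at least 1.0"


result = design_loop(toy_propose, toy_check)
print(result)
```

Because each step depends only on its arguments, a failed step can be regenerated and rerun in isolation, which is what makes error correction within the loop straightforward.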

This yields state-of-the-art results on standard CAD benchmarks, with CADDesigner attaining a $100\%$ success rate and best-in-class IoU, Chamfer Distance, and Hausdorff Distance metrics. Ablations demonstrate performance degradation without CIP semantic structure.

5. Mitigation of Bias, Scalability, and Interpretability

The AgentCDM methodology systematizes bias mitigation at architectural and protocol levels (Zhao et al., 16 Aug 2025):

  • Hypothesis enumeration precludes early anchoring.
  • Evidence matrices with focus on falsification counter confirmation bias.
  • Meta-review and adversarial steps expose hidden or implicit assumptions.
  • Structured reports and code outputs ensure process transparency and interpretability.

Scalability is positively correlated with agent diversity and strength, but the presence of weak or adversarial agents can degrade system robustness, suggesting a need for trust modeling and outlier detection in future extensions.

6. Limitations, Extensions, and Broader Impact

AgentCDM systems are dependent on the quality and diversity of candidate hypotheses or rule candidates generated by their constituent agents (Zhao et al., 16 Aug 2025, 2505.23055). Present designs assume cooperative agent pools and do not explicitly address adversarial or malicious inputs. Extension directions include adaptive scaffolding, population-statistical or interactive imputation of missing variables, mixture-model anomaly detection, and explicit trust and outlier modeling. Applications are foreseen in high-stakes areas such as medical diagnosis, intelligence analysis, and real-time industrial decision support. Broader impacts potentially include reduction of human bias in collaborative reasoning and new ethical considerations in delegating collective decision-making to autonomous agent frameworks.

7. Summary Table: AgentCDM Instantiations

| Domain | Core AgentCDM Modules | Key Innovations |
| --- | --- | --- |
| General CDM (Zhao et al., 16 Aug 2025) | ACH-inspired hypothesis/evidence loop | Two-stage protocol, bias mitigation |
| Clinical Decision (2505.23055) | Embedding rule retrieval, prompted extraction | Anomaly-filtering, cautious imputation |
| CAD Conceptual Design (Ni et al., 1 Aug 2025) | Requirement analysis, CIP code gen, visual feedback | Stateless imperative code, error correction |

The AgentCDM paradigm establishes a formal, structured, and empirically validated approach for collaborative decision-making in multi-agent LLM settings, with demonstrated advantages in accuracy, generalization, interpretability, and bias mitigation.
