AgentDAM: Multi-Domain Agentic Systems
- AgentDAM is a collection of modular, agentic systems using tool-centric orchestration and closed-loop reasoning to automate specialized tasks.
- It applies to privacy benchmarking for web agents, additive manufacturing alloy discovery, and molecular dynamics automation in protein-ligand simulations.
- The frameworks achieve robust performance through explicit tool APIs, iterative error correction, and adaptive optimization strategies across diverse scientific domains.
AgentDAM designates three distinct but thematically related frameworks published between 2024 and 2026, each leveraging LLMs and agentic systems for automation and benchmarking in specialized technical domains. The term spans: (1) an agentic privacy-leakage benchmark for web agents (Zharmagambetov et al., 12 Mar 2025), (2) a multi-agent additive manufacturing alloy discovery system (Pak et al., 2 Oct 2025), and, in earlier drafts, (3) a modular agentic molecular dynamics automation toolkit that appears under the alternate name DynaMate (Guilbert et al., 10 Dec 2025). Across these applications, AgentDAM frameworks share foundational design patterns (modular agent hierarchies, tool-centric orchestration, and closed-loop reasoning) but are tailored to distinct scientific and practical goals.
1. AgentDAM for Privacy Leakage in Web Agents
Formal Definition of Data Minimization
AgentDAM introduces a task-driven formalism for data minimization in web navigation agents: an agent should only process or reveal those components of a user’s private input that are strictly necessary to complete a task. Given the minimal necessary subset of that input, any agent output trajectory is non-compliant if it references private data outside that subset. Necessity is determined by expert annotation; sensitive data is marked as unnecessary, and any action that reveals it constitutes a privacy breach (Zharmagambetov et al., 12 Mar 2025).
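The compliance rule above can be sketched in a few lines. This is a minimal illustration, not the benchmark's implementation: the `PrivateInput` structure, the function names, and the case-insensitive substring match are all assumptions, and substring matching cannot catch paraphrased disclosures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivateInput:
    """User data split by annotation (hypothetical structure)."""
    necessary: frozenset    # items the task genuinely requires
    unnecessary: frozenset  # sensitive items the task does not require


def leaked_items(actions, data):
    """Return the unnecessary items that appear verbatim in any action text.

    Assumption: verbatim, case-insensitive matching stands in for the
    annotation-driven check; paraphrases are NOT detected.
    """
    lowered = [a.lower() for a in actions]
    return {item for item in data.unnecessary
            if any(item.lower() in a for a in lowered)}


def is_compliant(actions, data):
    """A trajectory is compliant only if no action reveals unnecessary data."""
    return not leaked_items(actions, data)


# Toy example (all strings are made up for illustration).
data = PrivateInput(
    necessary=frozenset({"order #1042"}),
    unnecessary=frozenset({"diagnosed with asthma"}),
)
clean = ['type [comment] "Refund for order #1042 please"', "click [submit]"]
leaky = ['type [comment] "I was Diagnosed with Asthma, refund order #1042"']
print(is_compliant(clean, data), is_compliant(leaky, data))
```

Mentioning a necessary item never counts against the agent; only the unnecessary set is checked.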
Benchmark and Simulator Design
The benchmark operates over a partially observable Markov decision process (POMDP), simulating three instrumented web-app environments: GitLab, Shopping, and Reddit. At each timestep, the agent receives:
- A task description and user context (including both relevant and irrelevant/sensitive elements),
- A page observation, either a structured accessibility tree (AXTREE) or a screenshot annotated with bounding boxes (Set-of-Marks, SOM).
Agents interact via action primitives (click, type, scroll, navigation, stop), and the simulator transitions deterministically. Across 246 human-annotated test cases spanning eight discrete task types, privacy leakage is scored via a judge LLM (gpt-4o), which parses all agent actions for direct or paraphrased references to the data marked as unnecessary (Zharmagambetov et al., 12 Mar 2025).
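The interaction loop described above might look roughly like the following sketch. The `run_episode` function, the `goto` token standing in for navigation, and the keyword-based `judge` are hypothetical; in the benchmark the judge is an LLM and the environment is a real web application.

```python
# Action primitives; "goto" is an assumed token standing in for navigation.
ACTIONS = {"click", "type", "scroll", "goto", "stop"}


def run_episode(policy, step, judge, first_obs, max_steps=10):
    """Roll out one episode and return (trajectory, leaked).

    policy: observation -> action string
    step:   (observation, action) -> next observation (deterministic)
    judge:  action string -> True if the action leaks sensitive data
    """
    obs, trajectory = first_obs, []
    for _ in range(max_steps):
        action = policy(obs)
        kind = action.split()[0] if action.strip() else ""
        if kind not in ACTIONS:
            raise ValueError(f"unknown action primitive: {action!r}")
        trajectory.append(action)
        if kind == "stop":
            break
        obs = step(obs, action)
    # The trajectory leaks if any single action leaks.
    leaked = any(judge(a) for a in trajectory)
    return trajectory, leaked


# Toy rollout: a scripted policy, a page counter as the "observation",
# and a keyword judge as a stand-in for the LLM judge.
script = iter(['type [title] "Need help with CI"', "click [submit]", "stop"])
traj, leaked = run_episode(
    policy=lambda obs: next(script),
    step=lambda obs, action: obs + 1,
    judge=lambda action: "home address" in action.lower(),
    first_obs=0,
)
print(len(traj), leaked)
```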
Utility and Privacy Metrics
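Task utility is binary: $u \in \{0, 1\}$, determined from the final simulator state $s_T$. The privacy leakage rate is the fraction of the $N$ test cases whose trajectory leaks, $\text{LeakRate} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[\text{trajectory } i \text{ leaks}]$, with overall privacy performance $1 - \text{LeakRate}$. An entire trajectory is marked leaking if any action leaks.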
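As a small worked example of these metrics, the sketch below aggregates per-episode outcomes. It assumes that privacy performance is one minus the leakage rate and that utility is averaged over test cases; the `EpisodeResult` type and the sample data are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    task_succeeded: bool   # binary utility for this test case
    action_leaks: list     # one judge verdict (bool) per action

    @property
    def leaked(self):
        # A trajectory counts as leaking if any single action leaks.
        return any(self.action_leaks)


def summarize(results):
    """Average utility, leakage rate, and privacy performance (1 - leak rate)."""
    n = len(results)
    if n == 0:
        raise ValueError("no results to summarize")
    utility = sum(r.task_succeeded for r in results) / n
    leak_rate = sum(r.leaked for r in results) / n
    return {"utility": utility, "leak_rate": leak_rate,
            "privacy": 1.0 - leak_rate}


results = [
    EpisodeResult(True, [False, False]),
    EpisodeResult(True, [False, True]),   # one leaking action taints the episode
    EpisodeResult(False, []),
    EpisodeResult(True, [False]),
]
summary = summarize(results)
print(summary)
```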
Empirical Findings
Key results include:
- Baseline (unmitigated) privacy leakage rates are reported for leading LLM agents: gpt-4o, gpt-4o-mini, gpt-4-turbo, llama-3.2/3.3, and claude-cua.
- Privacy-aware chain-of-thought (CoT) prompting increases privacy performance to $0.82$–$0.94$ with a $3$–$6$-point drop in utility. Pre-filtering (via gpt-4o) yields smaller gains (Zharmagambetov et al., 12 Mar 2025).
- Highest leakage rates occur during open-form text generation (e.g., Reddit post creation); shortest tasks (e.g., wishlist addition) rarely leak.
- Cross-model leak correlations are low (at most about $0.5$) across model families, indicating divergent error modes.
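One way to quantify cross-model agreement is a Pearson correlation between per-test-case binary leak indicators. Whether the source uses exactly this statistic is an assumption, and the two leak vectors below are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


# 1 = the model leaked on that test case, 0 = it did not (toy data).
model_a = [1, 0, 0, 1, 0, 0, 1, 0]
model_b = [1, 0, 1, 0, 0, 0, 0, 1]
r = pearson(model_a, model_b)
print(round(r, 3))
```

A value near zero means the two models tend to leak on different test cases, which is the pattern the finding above describes.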
Limitations and Recommendations
The benchmark focuses on three web applications. Sensitive data annotation is binary and grounded in hand-constructed necessity definitions. The approach does not autonomously infer which data is necessary. Future improvements suggested include contextually aware reasoning modules capable of deriving necessity dynamically, richer modalities (APIs, file systems), and robust, dynamic end-to-end safeguards beyond static pre-filtering (Zharmagambetov et al., 12 Mar 2025).
2. AgentDAM for Additive Manufacturing Alloy Discovery
Framework Architecture and Orchestration
AgentDAM (Agentic Additive Manufacturing Alloy Discovery) is a multi-agent system integrating LLMs (Claude Sonnet 4), the Model Context Protocol (MCP), and specialized scientific tools for end-to-end automation of new alloy discovery in laser powder-bed fusion (LPBF). The LLM orchestrator parses user prompts and decomposes objectives into tool calls managed by subagents:
- workspace-agent: manages experiment state and sharing of serialized data.
- thermo-calc agent: interfaces with the Thermo-Calc TC-Python API for equilibrium phase diagram and property extraction.
- additive-manufacturing agent: simulates melt-pool dynamics, generates process maps, and classifies defect regimes.
Coordination is achieved through structured JSON-RPC tool calls (Pak et al., 2 Oct 2025).
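For orientation, a tool call over MCP is a JSON-RPC 2.0 request with method `tools/call`. The sketch below builds such a request and unpacks a response. The tool name `generate_process_map` and its arguments are hypothetical and are not taken from the paper.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def make_tool_call(tool_name, arguments):
    """Serialize a JSON-RPC 2.0 request that invokes an MCP tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


def parse_response(raw):
    """Return the result payload, or raise if the server reported an error."""
    msg = json.loads(raw)
    if "error" in msg:
        raise RuntimeError(f"tool call failed: {msg['error'].get('message')}")
    return msg["result"]


# Hypothetical tool name and arguments, for illustration only.
wire = make_tool_call("generate_process_map", {"alloy": "SS316L"})
reply = '{"jsonrpc": "2.0", "id": 1, "result": {"status": "ok"}}'
print(wire)
print(parse_response(reply))
```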
Process and Workflow Coordination
The LLM decomposes user instructions (e.g., “find a corrosion-resistant alloy with minimal lack-of-fusion”) into a sequence: workspace selection, property prediction via CALPHAD, process map construction, and detailed result analysis. Each agent exposes explicit tool APIs (name, argument schema, description), returning structured outputs (e.g., tuples of unfavorable laser powers and velocities or phase transition temperatures) that inform subsequent steps.
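A tool that exposes a name, an argument schema, and a description can be modeled as below. This is a sketch under stated assumptions: the `Tool` class and the `flag_low_energy` tool are hypothetical, and the linear energy threshold is a placeholder rather than the paper's defect criterion.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    arg_types: dict          # argument name -> expected Python type
    run: Callable[..., Any]  # the underlying implementation

    def call(self, **kwargs):
        """Validate arguments against the schema, then run the tool."""
        missing = set(self.arg_types) - set(kwargs)
        extra = set(kwargs) - set(self.arg_types)
        if missing or extra:
            raise TypeError(f"{self.name}: missing={sorted(missing)}, "
                            f"unexpected={sorted(extra)}")
        for key, expected in self.arg_types.items():
            if not isinstance(kwargs[key], expected):
                raise TypeError(f"{self.name}: {key} must be {expected.__name__}")
        return self.run(**kwargs)


def flag_low_energy(powers_w, velocities_mm_s, min_j_per_mm):
    """Placeholder rule: flag (power, velocity) pairs with low P/v."""
    return [(p, v) for p in powers_w for v in velocities_mm_s
            if p / v < min_j_per_mm]


tool = Tool(
    name="flag_low_energy",
    description="Return (power, velocity) pairs below an energy threshold.",
    arg_types={"powers_w": list, "velocities_mm_s": list, "min_j_per_mm": float},
    run=flag_low_energy,
)
flagged = tool.call(powers_w=[100, 300], velocities_mm_s=[500, 1500],
                    min_j_per_mm=0.25)
print(flagged)
```

Returning plain tuples keeps the output easy for a downstream step to consume, which mirrors the structured outputs described above.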
Tool Integration and Simulation Physics
- Thermo-Calc provides CALPHAD-driven equilibrium diagrams and properties.
- Melt-pool regimes are modeled using analytic solutions (Eagar–Tsai, Rosenthal) and defect criteria. The formulation covers a lack-of-fusion criterion, an emissivity-based absorptivity estimate, and the Rosenthal solution for the temperature field.
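Two of these ingredients have widely used textbook forms, sketched here. The Rosenthal point-source solution is $T = T_0 + \frac{A P}{2 \pi k R} \exp\left(-\frac{v (R + \xi)}{2 \alpha}\right)$, where $\xi$ is the coordinate along the scan direction relative to the source and $R$ is the distance from it. A common geometric lack-of-fusion check flags a setting when $(h/W)^2 + (L/D)^2 > 1$ for hatch spacing $h$, layer thickness $L$, and melt-pool width $W$ and depth $D$. That these match the exact expressions in the source is an assumption, and the numbers in the example are illustrative.

```python
from math import exp, pi, sqrt


def rosenthal_temperature(xi, y, z, *, power, velocity, absorptivity,
                          conductivity, diffusivity, t0):
    """Steady-state temperature of a moving point source (SI units).

    xi is measured along the scan direction relative to the source.
    """
    r = sqrt(xi * xi + y * y + z * z)
    if r == 0:
        raise ValueError("the solution is singular at the source position")
    prefactor = absorptivity * power / (2 * pi * conductivity * r)
    return t0 + prefactor * exp(-velocity * (r + xi) / (2 * diffusivity))


def lack_of_fusion(hatch, width, layer, depth):
    """Geometric check (assumed form): True if adjacent pools fail to overlap."""
    return (hatch / width) ** 2 + (layer / depth) ** 2 > 1.0


params = dict(power=200.0, velocity=1.0, absorptivity=0.4,
              conductivity=20.0, diffusivity=5e-6, t0=300.0)
behind = rosenthal_temperature(-1e-4, 0.0, 0.0, **params)  # trailing the source
ahead = rosenthal_temperature(1e-4, 0.0, 0.0, **params)    # leading the source
print(behind > ahead)
print(lack_of_fusion(100, 120, 30, 60), lack_of_fusion(100, 120, 50, 60))
```

Directly behind the source the exponential factor equals one, so the temperature there is set only by the prefactor; ahead of the source it decays quickly.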
Autonomous Reasoning and Adaptivity
After process-map generation, the system evaluates metrics such as the fraction of unfavorable build settings (“lack-of-fusion” points). If the result is undesirable, the LLM adapts by modifying geometric parameters, expanding process windows, or recommending composition changes. This constitutes a closed-loop, adaptive optimization process, with explicit pseudocode presented in the source paper (Pak et al., 2 Oct 2025).
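The closed loop can be illustrated with a toy version that adjusts a single geometric parameter. Everything here is an assumption made for the sketch: the `closed_loop` function, the choice to reduce layer thickness, the target fraction, and the placeholder melt-pool depths. It is not the paper's pseudocode.

```python
def closed_loop(evaluate, layer_um, *, target=0.2, step_um=10.0,
                min_layer_um=20.0, max_iters=10):
    """Reduce layer thickness until the unfavorable fraction meets the target.

    evaluate: layer thickness (um) -> fraction of unfavorable settings.
    Returns (accepted_layer_or_None, history of (layer, fraction) pairs).
    """
    history = []
    for _ in range(max_iters):
        fraction = evaluate(layer_um)
        history.append((layer_um, fraction))
        if fraction <= target:
            return layer_um, history
        if layer_um - step_um < min_layer_um:
            break  # cannot adapt further within the allowed range
        layer_um -= step_um
    return None, history


# Placeholder melt-pool depths (um) for five process settings.
DEPTHS_UM = [30, 45, 60, 75, 90]


def toy_evaluate(layer_um):
    """Toy rule: a setting is unfavorable if the pool is shallower than the layer."""
    return sum(d < layer_um for d in DEPTHS_UM) / len(DEPTHS_UM)


accepted, history = closed_loop(toy_evaluate, 80.0)
print(accepted, history)
```

Returning the full history alongside the accepted value lets an orchestrator explain why it settled on a setting, or report that no setting in range met the target.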
Experimental Benchmarks
AgentDAM was validated on:
- 12 well-known AM alloys, correctly predicting lack-of-fusion regimes and matching published windows.
- Novel, user-generated compositions, demonstrating robust generalization.
- Automated property searches (e.g., for corrosion resistance) with recommendations confirmed against external literature.
Success rates averaged 90% on known alloys and 80% on arbitrary compositions, with turnaround times of seconds to minutes, a more than 10× reduction in manual effort (Pak et al., 2 Oct 2025).
Constraints and Prospective Development
Current defect modeling is restricted to conduction-mode lack-of-fusion; advanced CFD solvers for keyholing or balling are noted as needed extensions. Structured output misinterpretations and cross-model generalization are key points for future work. Proposed directions include high-fidelity simulation integration, automated experimental feedback ingestion, and extension to multi-objective optimization tasks (Pak et al., 2 Oct 2025).
3. AgentDAM (DynaMate) in Molecular Dynamics Automation
Core System: Modular Agent Collaboration
Earlier drafts of DynaMate (formerly referenced as AgentDAM) delineate a three-agent system for automating protein–ligand MD:
- Planner Agent: interprets goals, retrieves molecular structures, determines parameters via PaperQA/web, outputs an explicit plan.
- MD Worker Agent: executes the plan iteratively (sense–reflect–act), detects and diagnoses tool failures at each stage, and makes targeted corrections (e.g., atom renaming, parameter regeneration).
- Analyzer Agent: performs trajectory analysis and computes MM/PB(GB)SA binding free energies.
Dynamic Tool Use and Reflexive Reasoning
Agents formally register domain-specific tools with schemas and exert workflow control via JSON communication. MD tasks progress through intermediate states (e.g., PDB cleaning, ligand parameterization, system solvation, energy minimization, equilibration, production). Upon encountering errors, agents use a short chain-of-thought to identify and address the cause, iterating up to a preset cycle cap (Guilbert et al., 10 Dec 2025).
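The error-recovery cycle with a preset cap can be sketched as a bounded retry loop. The `run_stage` function, the toy parameterization tool, and the atom-renaming fix are hypothetical stand-ins; in the described system the diagnosis comes from LLM reasoning over the tool's error output.

```python
def run_stage(tool, state, diagnose, max_cycles=3):
    """Run one workflow stage, repairing the state after each failure.

    tool:     state -> new state, raising RuntimeError on failure
    diagnose: error message -> fix function (state -> state), or None
    Returns (new_state, number_of_attempts).
    """
    for attempt in range(1, max_cycles + 1):
        try:
            return tool(state), attempt
        except RuntimeError as err:
            fix = diagnose(str(err))
            if fix is None:
                raise  # no known remedy: surface the original error
            state = fix(state)
    raise RuntimeError(f"gave up after {max_cycles} cycles")


def parameterize(state):
    """Toy tool that rejects an atom name it does not recognize."""
    if "HN" in state["atoms"]:
        raise RuntimeError("unknown atom name HN")
    return {**state, "parameterized": True}


def diagnose(message):
    """Toy diagnosis: map a known error pattern to a targeted correction."""
    if "unknown atom name" in message:
        bad = message.split()[-1]
        return lambda s: {**s, "atoms": ["H" if a == bad else a
                                         for a in s["atoms"]]}
    return None


result, attempts = run_stage(parameterize, {"atoms": ["N", "HN", "CA"]}, diagnose)
print(result, attempts)
```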
Free Energy Protocol
Binding free energies are calculated via gmx_MMPBSA using the Poisson–Boltzmann model, with molecular mechanics terms from ff14SB/GAFF2, explicit TIP3P water, and continuum PB/SA.
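In its standard form, MM/PBSA estimates $\Delta G_{\text{bind}} = \langle G_{\text{complex}} \rangle - \langle G_{\text{receptor}} \rangle - \langle G_{\text{ligand}} \rangle$, where each $G$ sums a molecular mechanics energy, a polar (PB) solvation term, and a nonpolar (SA) term, optionally with an entropy correction. The sketch below shows only this arithmetic on made-up per-frame energies. It omits the entropy term, does not call gmx_MMPBSA, and the field names are assumptions.

```python
from statistics import mean


def total(frame):
    """G for one species in one frame (kcal/mol), entropy term omitted."""
    return frame["e_mm"] + frame["g_polar"] + frame["g_nonpolar"]


def binding_free_energy(complex_frames, receptor_frames, ligand_frames):
    """Single-trajectory estimate: frame-averaged G_complex - G_receptor - G_ligand."""
    if not complex_frames or not (
        len(complex_frames) == len(receptor_frames) == len(ligand_frames)
    ):
        raise ValueError("need the same non-zero number of frames per species")
    return mean(
        total(c) - total(r) - total(l)
        for c, r, l in zip(complex_frames, receptor_frames, ligand_frames)
    )


# Made-up energies for two frames, for illustration only.
complex_f = [{"e_mm": -500, "g_polar": -200, "g_nonpolar": -20},
             {"e_mm": -510, "g_polar": -195, "g_nonpolar": -21}]
receptor_f = [{"e_mm": -400, "g_polar": -150, "g_nonpolar": -15},
              {"e_mm": -402, "g_polar": -151, "g_nonpolar": -15}]
ligand_f = [{"e_mm": -60, "g_polar": -30, "g_nonpolar": -3},
            {"e_mm": -61, "g_polar": -29, "g_nonpolar": -3}]
dg = binding_free_energy(complex_f, receptor_f, ligand_f)
print(dg)
```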
Evaluation and Performance
DynaMate was evaluated across 12 benchmark protein(-ligand) systems with five LLM backends. Canonical protein–ligand systems saw success; malformed inputs (e.g., misnamed atoms) were resolved by some models but not all. Tool-call overheads were close to theoretical minima in optimal runs, and predicted binding free energies correlated well with experimental rankings for BRD4 inhibitors (Guilbert et al., 10 Dec 2025).
Known Limitations
Toolchains do not address systems with multiple ligands or complex cofactors. There is no persistent memory across experiments. Limitations also include reliance on external API stability and the known approximation errors in MM/PB(GB)SA free energy protocols. The absence of autonomous correction for pH-dependent states or ambiguous protonation is also noted (Guilbert et al., 10 Dec 2025).
4. Comparative Table of AgentDAM Frameworks
| Domain | Primary Agent Types | Central Problem |
|---|---|---|
| Privacy for Web Agents (Zharmagambetov et al., 12 Mar 2025) | Evaluator, Simulator | Data minimization compliance |
| Additive Manufacturing (Pak et al., 2 Oct 2025) | Planner, Analysis, ToolExec | Alloy/process-property optimization |
| Molecular Dynamics (Guilbert et al., 10 Dec 2025) | Planner, Worker, Analyzer | Automated, robust MD workflow execution |
This table synthesizes the high-level characteristics of AgentDAM across reported research (Zharmagambetov et al., 12 Mar 2025, Pak et al., 2 Oct 2025, Guilbert et al., 10 Dec 2025).
5. Significance and Future Research Directions
AgentDAM frameworks collectively advance the state of the art in multidomain agentic systems, illustrating: (1) realistic, end-to-end benchmarks for privacy in web agents; (2) automated, scalable scientific discovery pipelines in materials design and molecular simulation; (3) robust, adaptive agent architectures grounded in explicit tool-call protocols and recovery heuristics.
Future work, as advocated by the respective authors, includes (a) expansion of privacy benchmarks to encompass richer modalities and autonomous necessity inference, (b) integration of high-fidelity physics and experimental feedback into alloy discovery pipelines, (c) extension of agentic scientific systems to challenging molecular modeling regimes, and (d) benchmarking agent adaptability and cross-domain generalization through richer, community-curated datasets (Zharmagambetov et al., 12 Mar 2025, Pak et al., 2 Oct 2025, Guilbert et al., 10 Dec 2025).