CatMaster: Autonomous Catalysis Workflow Manager
- CatMaster is an autonomous system for computational heterogeneous catalysis research that integrates LLM-driven planning with a structured, multi-fidelity tool library.
- It employs a hierarchical agent framework—Planner, Executor, and Summarizer—to automatically decompose natural language requests into reproducible, file-centric calculation workspaces.
- The system optimizes DFT pipelines and surrogate modeling, reducing manual interventions and enhancing reproducibility through robust error handling and persistent project records.
CatMaster is an agentic, autonomous system designed to automate and manage end-to-end computational heterogeneous catalysis research workflows. By integrating LLM-driven planning and orchestration with a structured multi-fidelity tool library, CatMaster operationalizes natural language requests into reproducible calculation workspaces and maintains a detailed, persistent record of project state and results. The architecture is optimized for density functional theory (DFT)–centered pipelines, rapid surrogate models, and domain-specific error handling, significantly reducing manual intervention in high-throughput and complex catalytic screening studies (Chen et al., 20 Jan 2026).
1. Agentic System Architecture and File-Centric Workflow
CatMaster employs a hierarchical agent framework involving three core LLM-driven components:
- Planner: Decomposes user prompts into an ordered sequence of "milestone" tasks, each defined by a minimal evidence-package contract—inputs to generate, outputs to produce, and key scalars to extract.
- Executor: Invokes domain-specific tool APIs to manipulate atomistic structures, generate calculation inputs, run simulations, and parse outputs. Micro-decisions (e.g., k-point mesh selection, convergence thresholds) are made automatically by internal schemas specified for each task type.
- Summarizer: Emits explicit whiteboard updates in the form of UPSERT/DEPRECATE operations that record facts, constraints, file pointers, and open questions after each task.
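The contract and whiteboard operations described above can be pictured with a minimal sketch. The class names, field names, entry keys, and numeric values below are hypothetical illustrations and are not taken from the CatMaster code base; only the UPSERT/DEPRECATE vocabulary and the three-part contract come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Milestone:
    """Hypothetical evidence-package contract for one planned task."""
    name: str
    inputs_to_generate: list[str]
    outputs_to_produce: list[str]
    scalars_to_extract: list[str]


@dataclass
class Whiteboard:
    """Hypothetical project record keyed by entry name."""
    entries: dict[str, dict] = field(default_factory=dict)

    def apply(self, op: str, key: str, value=None) -> None:
        if op == "UPSERT":
            # Insert a new entry or overwrite an existing one, and mark it active.
            self.entries[key] = {"value": value, "deprecated": False}
        elif op == "DEPRECATE":
            # Keep the entry for provenance but flag it as superseded.
            self.entries[key]["deprecated"] = True
        else:
            raise ValueError(f"unknown whiteboard operation: {op}")

    def active(self) -> dict:
        """Return only the entries that have not been deprecated."""
        return {k: e["value"] for k, e in self.entries.items() if not e["deprecated"]}


plan = [
    Milestone(
        name="relax_bulk_fe",  # hypothetical task name
        inputs_to_generate=["POSCAR", "INCAR", "KPOINTS"],
        outputs_to_produce=["CONTCAR", "OUTCAR"],
        scalars_to_extract=["total_energy_eV"],
    )
]

wb = Whiteboard()
wb.apply("UPSERT", "fact/bulk_fe_energy", -8.0)  # placeholder number, not a result
wb.apply("UPSERT", "constraint/freeze_mask", [True, True, False])
wb.apply("DEPRECATE", "constraint/freeze_mask")
```

Keeping deprecated entries in the record rather than deleting them is one way to preserve provenance while still exposing only current facts to later tasks.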
A file-centric execution contract ensures that each milestone generates a self-contained directory (including relevant input/output files and a JSON summary of key scalars). The persistent "project record"—referred to as the whiteboard—tracks:
- Facts (e.g., total energies)
- Constraints (e.g., dynamic freeze masks)
- Absolute file paths
- Task statuses and dependency graphs
This supports reliable inspection, pausing, checkpointing, and resuming on remote high-performance computing (HPC) resources. Automatic job re-submission, file-lock recovery, and input/output retries provide baseline robustness. If repeated failures occur (e.g., three consecutive SCF non-convergences), CatMaster escalates to a human-in-the-loop checkpoint, providing evidence packages for user-driven intervention. All tool calls are schema-validated, with misparameterizations triggering format-repair or plan-repair LLM prompts (Chen et al., 20 Jan 2026).
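The escalation rule can be sketched as a bounded retry loop. This is an illustrative assumption about control flow only: the function name, the use of `RuntimeError` to signal a failed run, and the returned dictionary layout are hypothetical, and the threshold of three mirrors the example given in the text.

```python
def run_with_escalation(run_task, max_failures=3):
    """Retry run_task; hand over to a human after max_failures consecutive failures."""
    failures = []
    while len(failures) < max_failures:
        try:
            result = run_task()
            return {"status": "done", "result": result, "evidence": failures}
        except RuntimeError as exc:
            # Record the failure message so it can be shown at the checkpoint.
            failures.append(str(exc))
    return {"status": "needs_human", "result": None, "evidence": failures}


attempts = {"n": 0}


def flaky_scf():
    """Hypothetical task that fails twice before succeeding."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError(f"SCF not converged (attempt {attempts['n']})")
    return {"total_energy_eV": -1.0}  # placeholder value


def always_fails():
    """Hypothetical task that never converges."""
    raise RuntimeError("SCF not converged")


recovered = run_with_escalation(flaky_scf)
stuck = run_with_escalation(always_fails)
```

In this sketch the accumulated failure messages play the role of the evidence package that accompanies the human-in-the-loop checkpoint.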
2. Multi-Fidelity Tool Library
CatMaster exposes a tool layer organized into six validated, schema-bound categories:
| Category | Example Tools | Primary Functions |
|---|---|---|
| Retrieval | materials_project_get_bulk | Bulk structure queries via pymatgen |
| Construction | relax_prepare, slab_build | VASP input generation, slab modeling |
| Adsorption | site_enumeration, place_adsorbate | Adsorption site enumeration, docking |
| Simulation (DFT) | vasp_execute_batch | SLURM/PBS/LSF submission, PBE/GGA |
| Screening (Surrogate) | mace_relax | MACE-MPA-0 GNN for fast relaxation |
| Custom Logic | python_execute | Custom scripts, EOS fitting, parsing |
All tools accept and return structured JSON documents. Key DFT simulation parameters follow preset VASP settings (GGA-PBE, ENCUT=520 eV, EDIFF=10⁻⁶ eV, EDIFFG=–0.02 eV/Å), with support for optional DFT-D3 dispersion and collinear spin polarization. Surrogate relaxations use pretrained MACE-MPA-0 networks. Tool outputs include both traditional files (POSCAR, INCAR, OUTCAR, etc.) and standardized summaries for whiteboard integration (Chen et al., 20 Jan 2026).
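As a rough illustration of how such presets and optional switches might be merged into an INCAR, consider the sketch below. The function names and dictionary layout are hypothetical. The INCAR tags themselves are standard VASP tags (`GGA = PE` selects PBE, `ISPIN = 2` enables collinear spin polarization, `IVDW = 11` selects zero-damping DFT-D3), but the choice of D3 variant is an assumption, since the text does not say which one CatMaster uses.

```python
# Preset values quoted in the text, expressed as VASP INCAR tags.
DEFAULT_INCAR = {"GGA": "PE", "ENCUT": 520, "EDIFF": 1e-6, "EDIFFG": -0.02}


def build_incar(overrides=None, dispersion=False, spin_polarized=False):
    """Merge presets, optional switches, and task-specific overrides (hypothetical helper)."""
    tags = dict(DEFAULT_INCAR)
    if dispersion:
        tags["IVDW"] = 11  # assumption: zero-damping DFT-D3
    if spin_polarized:
        tags["ISPIN"] = 2
    # Explicit overrides win over presets and switches.
    tags.update(overrides or {})
    return tags


def render_incar(tags):
    """Render the tag dictionary as INCAR text, one 'TAG = value' pair per line."""
    return "\n".join(f"{key} = {value}" for key, value in tags.items())


incar = build_incar(overrides={"ENCUT": 600}, dispersion=True, spin_polarized=True)
incar_text = render_incar(incar)
```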
3. Demonstrations and Case Studies
CatMaster's capabilities have been demonstrated in several prototypical workflows:
3.1 O₂ Spin-State Check: Constructs O₂ from SMILES, relaxes both singlet and triplet species with appropriate INCAR overrides, parses total energies and bond lengths, and calculates ΔE and d_O–O.
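The post-processing in this check reduces to simple arithmetic, sketched below. The specific overrides are an assumption: fixing the spin state through the standard VASP tags `ISPIN` and `NUPDOWN` is one common approach, but the text does not state which tags CatMaster sets. The sign convention for ΔE (singlet minus triplet) and all numbers are likewise illustrative placeholders, and the bond-length helper ignores periodic images.

```python
import math

# Assumed INCAR overrides: NUPDOWN fixes the number of unpaired electrons.
SPIN_OVERRIDES = {
    "singlet": {"ISPIN": 2, "NUPDOWN": 0},
    "triplet": {"ISPIN": 2, "NUPDOWN": 2},
}


def singlet_triplet_gap(energies):
    """Return E(singlet) - E(triplet) in eV; positive means the triplet is lower."""
    return energies["singlet"] - energies["triplet"]


def bond_length(atom_a, atom_b):
    """Cartesian distance in angstrom between two atoms (no periodic wrapping)."""
    return math.dist(atom_a, atom_b)


# Placeholder inputs, not computed results.
energies = {"singlet": -9.0, "triplet": -10.0}
delta_e = singlet_triplet_gap(energies)
d_oo = bond_length((0.0, 0.0, 0.0), (0.0, 0.0, 1.25))
```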
3.2 BCC Fe Surface Energies and CO Adsorption:
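- Computes surface energies for Fe(110), Fe(100), and Fe(111) via symmetric slabs.
- Equation (conventional symmetric-slab form): $\gamma = \dfrac{E_{\mathrm{slab}} - N\,E_{\mathrm{bulk}}}{2A}$, where $E_{\mathrm{slab}}$ is the slab total energy, $N$ the number of bulk units in the slab, $E_{\mathrm{bulk}}$ the bulk energy per unit, and $A$ the area of one slab face.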
- Validates protocol sensitivity (center-fixed slabs, DFT-D3 corrections) and demonstrates whiteboard-based deferred resolution to propagate stable facet references across the workspace.
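- Calculates CO adsorption energies, conventionally defined as $E_{\mathrm{ads}} = E_{\mathrm{CO/slab}} - E_{\mathrm{slab}} - E_{\mathrm{CO}}$, so that negative values indicate exothermic adsorption.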
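The two scalar quantities in this case study can be computed with a few lines of arithmetic. The sketch assumes the conventional symmetric-slab surface energy and the conventional adsorption-energy definition (both stated in the docstrings); the function names and the input numbers are hypothetical placeholders, not values from the study.

```python
# Exact conversion factor: 1 eV/A^2 = 16.02176634 J/m^2.
EV_PER_A2_TO_J_PER_M2 = 16.02176634


def surface_energy(e_slab, n_bulk_units, e_bulk_per_unit, area):
    """gamma = (E_slab - N * E_bulk) / (2 * A), in eV/A^2.

    The factor of 2 accounts for the two equivalent faces of a symmetric slab.
    """
    return (e_slab - n_bulk_units * e_bulk_per_unit) / (2.0 * area)


def adsorption_energy(e_slab_with_adsorbate, e_clean_slab, e_gas_molecule):
    """E_ads = E(adsorbate/slab) - E(slab) - E(molecule), in eV."""
    return e_slab_with_adsorbate - e_clean_slab - e_gas_molecule


# Placeholder inputs chosen for easy arithmetic.
gamma = surface_energy(e_slab=-100.0, n_bulk_units=12, e_bulk_per_unit=-8.5, area=10.0)
gamma_si = gamma * EV_PER_A2_TO_J_PER_M2
e_ads = adsorption_energy(-116.5, -100.0, -15.0)
```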
3.3 Multi-Fidelity Pt–Ni–Cu Alloy Screening for HER:
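- Database pipeline: query stable compositions (Eₕᵤₗₗ < 0.05 eV), enumerate slabs and adsorption sites, and rank candidates by the surrogate-predicted hydrogen adsorption free energy, conventionally defined as $\Delta G_{\mathrm{H}^*} = \Delta E_{\mathrm{H}} + \Delta E_{\mathrm{ZPE}} - T\Delta S$.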
- Top candidates validated by high-fidelity DFT, achieving surrogate errors between +0.033 and –0.149 eV.
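The screening-then-validation pattern above can be sketched as follows. Ranking by the absolute value of ΔG_H* (closest to zero first) is the usual HER descriptor criterion and is assumed here; the function names, candidate labels, numbers, and the error sign convention (surrogate minus DFT) are hypothetical.

```python
def select_for_dft(surrogate_dg, top_k):
    """Return the top_k candidates whose surrogate dG_H* is closest to zero."""
    ranked = sorted(surrogate_dg.items(), key=lambda item: abs(item[1]))
    return [name for name, _ in ranked[:top_k]]


def surrogate_errors(surrogate_dg, dft_dg):
    """Signed error (surrogate - DFT, in eV) for every DFT-validated candidate."""
    return {name: surrogate_dg[name] - dft_dg[name] for name in dft_dg}


# Placeholder surrogate predictions in eV, not results from the study.
surrogate = {"cand_a": -0.40, "cand_b": 0.05, "cand_c": -0.10, "cand_d": 0.75}
shortlist = select_for_dft(surrogate, top_k=2)

# Placeholder DFT values for the shortlisted candidates only.
dft = {"cand_b": 0.10, "cand_c": -0.20}
errors = surrogate_errors(surrogate, dft)
```

Only the shortlisted candidates incur DFT cost, which is where the reduction in high-fidelity calls comes from.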
3.4 Long-Tail Tasks: Supports complex, compositional workflows including equation-of-state (EOS) fitting by batch-running static VASP jobs and fitting to Birch–Murnaghan form, and single-atom catalyst preparation on functionalized graphene using both MACE surrogate relaxation and site-specific adsorption modeling (Chen et al., 20 Jan 2026).
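For the EOS task, the model being fitted can be written down directly. The sketch below implements the third-order Birch–Murnaghan energy-volume relation, assuming that this is the variant meant by "Birch–Murnaghan form"; the function name and parameter values are placeholders.

```python
def birch_murnaghan_energy(volume, e0, v0, b0, b0_prime):
    """Third-order Birch-Murnaghan E(V).

    e0: equilibrium energy, v0: equilibrium volume,
    b0: bulk modulus (energy per volume), b0_prime: its pressure derivative.
    """
    eta = (v0 / volume) ** (2.0 / 3.0)
    strain = eta - 1.0
    return e0 + (9.0 * v0 * b0 / 16.0) * (
        strain ** 3 * b0_prime + strain ** 2 * (6.0 - 4.0 * eta)
    )


# Placeholder parameters (eV, A^3, eV/A^3), not fitted values.
params = {"e0": -8.0, "v0": 11.5, "b0": 1.0, "b0_prime": 4.5}
volumes = [10.5, 11.0, 11.5, 12.0, 12.5]
curve = [birch_murnaghan_energy(v, **params) for v in volumes]
```

In practice the four parameters would be obtained by least-squares fitting of this function to the energies from the static runs, for example with a routine such as `scipy.optimize.curve_fit`.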
4. Reproducibility, Data Management, and Metadata
Each milestone produces an evidence package encompassing:
- All input files (e.g., POSCAR, INCAR, KPOINTS, POTCAR)
- Outputs (CONTCAR, OUTCAR, vasp.log, etc.)
- Scalar JSON summaries (energies, forces, ΔG values)
- Schema-validated whiteboard JSON entries tracking:
  - Task identity and tool
  - Inputs and outputs (with file paths)
  - Status and timestamps
  - Key results
The comprehensive project record allows local or remote resumption, archiving, and reproducibility inspection. Metadata schemas are strictly enforced for both tool-level inputs/outputs and overarching workspace state, with automated validation and repair mechanisms. Final reporting collates benchmark metrics and evidence package pointers for each milestone (Chen et al., 20 Jan 2026).
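A minimal sketch of writing and validating one such entry is shown below. The required key names, the `summary.json` file name, the directory layout, and all values are hypothetical; they mirror the fields listed above rather than CatMaster's actual schema.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical required fields, modeled on the list of tracked items.
REQUIRED_KEYS = {"task", "tool", "inputs", "outputs", "status", "timestamp", "results"}


def write_summary(milestone_dir, entry):
    """Validate an entry against REQUIRED_KEYS and write it as summary.json."""
    missing = REQUIRED_KEYS - set(entry)
    if missing:
        raise ValueError(f"summary is missing keys: {sorted(missing)}")
    path = Path(milestone_dir) / "summary.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(entry, indent=2))
    return path.resolve()  # absolute path, suitable for a whiteboard pointer


entry = {
    "task": "relax_bulk_fe",            # hypothetical task name
    "tool": "vasp_execute_batch",
    "inputs": ["POSCAR", "INCAR", "KPOINTS", "POTCAR"],
    "outputs": ["CONTCAR", "OUTCAR"],
    "status": "done",
    "timestamp": "2026-01-01T00:00:00Z",  # placeholder
    "results": {"total_energy_eV": -8.0},  # placeholder
}

with tempfile.TemporaryDirectory() as workdir:
    summary_path = write_summary(Path(workdir) / "milestone_01", entry)
    reloaded = json.loads(summary_path.read_text())
```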
5. Performance Characteristics and Limitations
Performance benchmarks indicate:
- Surface energies within 3% of reference literature values.
- HER descriptor surrogate errors as low as +0.033 eV, with surrogate pre-screening reducing the required DFT calls by ∼90× relative to exhaustive high-fidelity runs.
- Elimination of more than 500 lines of workflow and bookkeeping code per project, and of manual tracking of more than 200 intermediate files.
- A ∼3× reduction in wall-clock project time, accounting for HPC queueing delays.
Limitations include the absence of literature-grounded method selection ("scientific reasoning gap"), linear planning that precludes mid-run workflow restructuring (DAG-based orchestration is future work), the risk of generative errors in geometry/protocols (subject to post-hoc graph-connectivity checks), and restricted depth in error handling (e.g., only routine SCF/mixing errors are handled natively; persistent VASP failures require specialized debugging agents). Future improvements target the integration of retrieval-augmented generation (RAG) and richer, non-linear orchestration for adaptive workflows (Chen et al., 20 Jan 2026).
6. Context and Significance for Catalysis Research
CatMaster addresses core reproducibility, bookkeeping, and human error factors intrinsic to computational heterogeneous catalysis workflows. By enforcing a file-centric, evidence-based project record and automating both routine input generation and high-throughput screening, the system enables domain scientists to concentrate on modeling choices and mechanistic interpretation rather than workflow logistics. This approach establishes a blueprint for LLM-agent orchestration in computational chemistry and related data-driven materials discovery (Chen et al., 20 Jan 2026).