HEPTAPOD: Agentic HEP Automation Toolkit
- HEPTAPOD is an orchestration framework that automates high-energy physics simulation and analysis by leveraging LLM-driven agentic planning.
- It employs schema-validated tool registries, structured run-card management, and sandboxed execution to enhance reproducibility and transparency.
- The system integrates multi-step processes from symbolic model generation to jet clustering, ensuring auditable, human-in-the-loop workflow control.
The HEP Toolkit for Agentic Planning, Orchestration, and Deployment (HEPTAPOD) is an orchestration framework designed to enable agentic, reproducible, and auditable automation of high-energy physics (HEP) simulation and analysis workflows through LLM–driven planning and tool use. HEPTAPOD provides a structured interface for LLMs to coordinate complex, multi-step HEP pipelines—spanning symbolic model generation, Monte Carlo simulation, and downstream data analysis—by integrating schema-validated tool invocation, structured context management, and run-card–driven configuration. It establishes a transparent, human-in-the-loop foundation for modernizing HEP workflow execution and tracking (Menzo et al., 17 Dec 2025).
1. Design Motivation and Objectives
Contemporary HEP pipelines, despite relying on a mature suite of upstream scientific packages (FeynRules, MadGraph, Pythia, FastJet, etc.), require significant manual orchestration. Researchers typically write and maintain extensive scripts or shell pipelines to propagate model parameters, edit and manage run cards, interpret heterogeneous output, debug opaque failures, and synchronize multi-stage data flows. Traditional approaches suffer from fragile string manipulation and ad hoc bookkeeping, impeding reproducibility and scalability.
Recent advancements in transformer-based LLMs have produced models capable of “agentic” reasoning: planning multi-step tasks, calling tools via structured API protocols, maintaining dialogue and workflow state, and iteratively retrying on error. HEPTAPOD leverages these agentic capabilities to automate and coordinate HEP pipelines while enforcing:
- Reproducibility, via versioned run cards, machine-readable manifests, and conversational logs;
- Transparency, through schema-validated APIs and structured tool outputs;
- Human-in-the-loop control, with explicit checkpoints for researcher approval.
The principal objective is to allow an LLM to orchestrate a sequence of HEP simulation and analysis steps—from Lagrangian definition through event-level data analysis—while ensuring all workflow logic remains auditable and robust to upstream or downstream changes (Menzo et al., 17 Dec 2025).
2. System Architecture
HEPTAPOD’s architecture comprises three interconnected layers, all managed by the Orchestral AI orchestration engine:
- Schema-Validated Tool Registry
A Python library wraps individual HEP codes (e.g., FeynRules, MG5_aMC@NLO, Pythia, FastJet) as “tools.” Each tool is defined by a Python class, with explicit type annotations that are automatically translated into JSON schemas for allowed inputs and outputs. Tool docstrings encode domain semantics. This registry ensures consistent, schema-constrained interaction between agent and computational resources without exposing internal scripts.
- Agent-Orchestration Layer
The orchestration layer presents the LLM (e.g., GPT-OSS-120B) with:
- A full conversational log,
- The JSON schema plus docstring (“semantic prompt”) for each available tool,
- A system prompt enforcing domain-specific constraints (e.g., “never fabricate physics,” “use run cards as canonical configuration”).
The LLM reasons over its current state, optionally emits a tool call (as structured JSON), which is then validated and dispatched by the orchestration engine. All returned outputs (filenames, cross sections, event metadata) are injected as structured JSON into the ongoing conversation, enabling memory-efficient iterative planning.
- Sandboxed Execution Engine
All tool invocations occur in isolated workspaces (“sandboxes”). The engine handles runtime file management, placeholder resolution in run cards (e.g., [[LHEF_PATH]]), stdout/stderr capture, and conversion of exit outcomes to structured JSON success or error objects. Every event and output is appended to a persistent provenance log.
The closed-loop workflow operates as:
- LLM reasons over context;
- Emits tool call (validated);
- Engine executes tool in sandbox;
- Returns structured JSON output;
- Appends output to session context;
- LLM reasons again, repeating as needed.
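The closed loop above can be sketched as a driver function. This is a minimal illustration, not HEPTAPOD's actual engine (which is part of Orchestral AI); the registry layout, `llm_step` callback, and message fields are assumptions for the sketch:

```python
import json

def run_agent_loop(llm_step, registry, context, max_iters=10):
    """Closed-loop driver: the LLM either emits a tool call or a final answer.

    llm_step: callable taking the session context, returning an action dict
    registry: hypothetical mapping of tool name -> (schema, callable)
    """
    for _ in range(max_iters):
        action = llm_step(context)            # LLM reasons over full context
        if action.get("type") != "tool_call":
            return action                     # plan complete
        name, args = action["name"], action["arguments"]
        schema, fn = registry[name]
        # (schema validation of args would happen here before dispatch)
        try:
            result = {"ok": True, **fn(**args)}   # sandboxed execution in practice
        except Exception as exc:
            result = {"ok": False, "error": type(exc).__name__, "message": str(exc)}
        # Structured JSON output is appended to the session context
        context.append({"role": "tool", "name": name, "content": json.dumps(result)})
    return {"type": "final", "content": "max iterations reached"}
```

The key property this illustrates is that tool results re-enter the conversation as structured JSON rather than free text, so the LLM's next planning step can reference exact filenames, run IDs, and numbers.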
3. LLM–Tool Integration
HEPTAPOD exposes tools to the LLM using an OpenAI-style function-calling API, where each tool is described by its JSON schema (auto-generated from type annotations) along with a semantic docstring. When an agent determines, for example, that a UFO model should be generated, it issues:
{
  "name": "FeynRulesToUFOTool",
  "arguments": {
    "model_path": "feynrules/models/S1_LQ_RR.fr",
    "output_dir": "feynrules/models/S1_LQ_RR_UFO"
  }
}
The orchestrator validates this JSON against the corresponding schema, executes the underlying tool, and returns results as structured objects:
{
  "ok": true,
  "output_dir": "feynrules/models/S1_LQ_RR_UFO",
  "files_created": [ "...couplings.py", "...vertices.py" ]
}
The agent iteratively builds context: after each tool call, new JSON fields (such as run IDs, cross sections, event counts) are injected into the session, allowing further planning (e.g., dispatching a Pythia job).
Run cards provide domain-tailored configuration templates, frequently containing placeholders (e.g., [[UFO_PATH]], [[NEVENTS]]). When the agent provides necessary runtime values, the relevant tool substitutes in the concrete paths/parameters prior to execution. This system eliminates reliance on brittle string mutation routines typical of legacy scripts (Menzo et al., 17 Dec 2025).
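Placeholder resolution of this kind can be sketched as a small helper. This is an illustrative function, not HEPTAPOD's API; the fail-loudly behavior on unresolved placeholders is an assumption about what "eliminating brittle string mutation" implies:

```python
import re

def fill_run_card(template: str, values: dict) -> str:
    """Substitute [[PLACEHOLDER]] tokens with concrete runtime values.

    Raises if any placeholder is left unresolved, instead of silently
    passing a broken card to the executable (illustrative helper only).
    """
    def sub(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"unresolved run-card placeholder: {key}")
        return str(values[key])
    return re.sub(r"\[\[([A-Z_]+)\]\]", sub, template)

card = "import model [[UFO_PATH]]\nset nevents [[NEVENTS]]"
filled = fill_run_card(card, {"UFO_PATH": "models/S1_UFO", "NEVENTS": 10000})
```

Because substitution happens in one audited place, the canonical template stays under version control and only the resolved copy reaches the sandbox.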
4. Schema-Validated Operations and Run-Card Management
All tool classes in HEPTAPOD inherit from a common BaseTool and distinguish between “RuntimeField” (agent-settable at call time) and “StateField” (injected by the orchestrator, not visible to the agent). For illustration:
class PythiaFromRunCardTool(BaseTool):
    command_card: str = RuntimeField(description="Path to Pythia8 .cmnd")
    nevents: int = RuntimeField(description="Number of events")
    seed: Optional[int] = RuntimeField(...)
    # State fields
    pythia_exe: str = StateField(...)
The orchestration engine auto-generates a corresponding JSON schema:
{
  "name": "PythiaFromRunCardTool",
  "parameters": {
    "type": "object",
    "properties": {
      "command_card": { "type": "string" },
      "nevents": { "type": "integer" },
      "seed": { "type": ["integer", "null"] }
    },
    "required": ["command_card", "nevents"]
  }
}
Canonical configuration files (e.g., MadGraph’s “.mg5”, Pythia’s “.cmnd”) serve as templates. The agent need only specify physics-relevant fields (masses, couplings, seed values), and the orchestrator ensures correct substitution in the appropriate slots:
import model [[UFO_PATH]]
generate p p > S1 S1~, (S1 > e- u), (S1~ > e+ u~)
output S1_LQ_scan
launch
set nevents [[NEVENTS]]
set seed [[SEED]]
Tool invocations are schema-validated before runtime via jsonschema.validate(agent_call, tool_schema), guaranteeing that only well-formed, properly-typed, and fully-specified configurations ever reach the corresponding executables.
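The pre-runtime gate described above can be demonstrated directly with the `jsonschema` package (a third-party dependency, `pip install jsonschema`), using the auto-generated schema shown earlier; the example calls are hypothetical:

```python
# Requires the third-party `jsonschema` package.
from jsonschema import validate, ValidationError

tool_schema = {  # mirrors the auto-generated PythiaFromRunCardTool schema
    "type": "object",
    "properties": {
        "command_card": {"type": "string"},
        "nevents": {"type": "integer"},
        "seed": {"type": ["integer", "null"]},
    },
    "required": ["command_card", "nevents"],
}

good_call = {"command_card": "cards/pythia8.cmnd", "nevents": 10000, "seed": None}
validate(good_call, tool_schema)  # passes silently

bad_call = {"command_card": "cards/pythia8.cmnd", "nevents": "10000"}  # wrong type
try:
    validate(bad_call, tool_schema)
except ValidationError as err:
    pass  # malformed call rejected before any executable runs
```

A mistyped or incomplete call is therefore caught at the orchestration layer, where the agent can repair it, rather than surfacing as an opaque failure inside Pythia.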
5. Agentic BSM Monte Carlo Pipeline Example
A detailed benchmark is provided in the form of a BSM leptoquark production and analysis workflow. The process, as orchestrated by a HEPTAPOD agent, includes:
- Model Definition: The scalar leptoquark S1 is specified via its Lagrangian, with diagonal Yukawa couplings.
- End-to-End Workflow:
  - UFO model generation with FeynRules (FeynRulesToUFOTool).
  - Parameter scan over leptoquark masses at the TeV scale via MadGraph, with tool returns such as:

{
  "scan_detected": true,
  "n_runs": 3,
  "runs": [
    {
      "run_id": "run_01",
      "scan_params": {"mass#9000005": 1000.0},
      "lhe_file": ".../run_01.lhe.gz",
      "cross_section": 0.1017
    }
    // ...
  ]
}

  - Hadronization with Pythia (PythiaFromRunCardTool) for each scan point.
  - LHE→JSONL event conversion (evtjsonl-1.0).
  - Jet clustering (anti-k_T, R = 0.4) via JetClusterSlowJetTool.
  - Selection of leading leptons/jets with FilterByPDGIDTool and GetHardestN[Jets]Tool.
  - Resonance reconstruction from lepton–jet pairs via ResonanceReconstructionTool, outputting m^{min}_{LQ} and m^{max}_{LQ} arrays and histograms.
  - Plotting via a Python/Matplotlib tool.
The agent maintains and updates a structured “to-do” task list, marking items complete as tool invocations succeed:
## Monte Carlo Signal-Validation for S1 Leptoquark
- [*] Generate UFO model from FeynRules
- [ ] Generate parton-level events (MG5) with nevents=10000
- [ ] Shower & hadronize (Pythia) for each run
- [ ] Cluster jets (R=0.4)
- [ ] Select hardest 2 leptons (PDG ±11, ±13) & hardest 2 jets
- [ ] Compute m^{min}_{LQ} and m^{max}_{LQ}
- [ ] Plot histograms of m^{min}_{LQ} for all mass points
- [ ] Summarize cross sections & event counts
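The m^{min}_{LQ}/m^{max}_{LQ} step in the task list can be sketched in a few lines. The pairing criterion below (choose the lepton–jet assignment that minimizes the mass difference between the two candidates) is a common convention assumed for illustration, not necessarily the paper's exact prescription:

```python
import itertools
import math

def inv_mass(*vecs):
    """Invariant mass of summed four-vectors given as (E, px, py, pz) tuples."""
    E, px, py, pz = (sum(c) for c in zip(*vecs))
    return math.sqrt(max(E**2 - px**2 - py**2 - pz**2, 0.0))

def lq_masses(leptons, jets):
    """Pair 2 leptons with 2 jets; pick the pairing minimizing |m1 - m2|
    (an assumed convention) and return (m_min, m_max) for the two candidates."""
    best = None
    for jets_perm in itertools.permutations(jets):
        masses = tuple(inv_mass(l, j) for l, j in zip(leptons, jets_perm))
        if best is None or abs(masses[0] - masses[1]) < abs(best[0] - best[1]):
            best = masses
    return min(best), max(best)
```

In the actual pipeline this computation runs over the fixed-shape arrays produced by the selection tools, filling the m^{min}_{LQ} histogram per mass point.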
6. Mechanisms for Reproducibility and Provenance
All tool calls and their outcomes are serialized into a persistent provenance log, typically as a JSONL session trace. This ensures every decision and outcome (including errors) can be reconstructed post hoc.
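An append-only JSONL trace of this kind is simple to sketch; the class and field names below are illustrative, not HEPTAPOD's exact log format:

```python
import json
import os
import tempfile
import time

class ProvenanceLog:
    """Append-only JSONL session trace (illustrative sketch)."""
    def __init__(self, path):
        self.path = path

    def record(self, event_type, payload):
        entry = {"ts": time.time(), "event": event_type, **payload}
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")  # one JSON object per line
        return entry

log_path = os.path.join(tempfile.mkdtemp(), "session.jsonl")
log = ProvenanceLog(log_path)
log.record("tool_call", {"tool": "PythiaFromRunCardTool", "nevents": 10000})
log.record("tool_result", {"tool": "PythiaFromRunCardTool", "ok": True})

# Post-hoc reconstruction: replay every decision and outcome in order
with open(log_path) as f:
    trace = [json.loads(line) for line in f]
```

Because each line is an independent JSON object, the trace can be grepped, streamed, or diffed without loading the whole session.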
Explicit “human approval” breakpoints are supported: for example, after a new run card is generated, the agent pauses for researcher review. All error information is structured (never a raw stack trace):
{
  "status": "error",
  "code": "FileNotFound",
  "message": "...",
  "tool": "JetClusterSlowJetTool",
  "missing_path": ".../jets.jsonl"
}
This structure permits the agent to inspect, reason about, and repair failed steps rather than terminating. Final outputs, such as cross section tables, event counts, and data file paths, are consistently exported in machine-readable formats alongside visual products.
7. Performance and Scalability Considerations
HEPTAPOD’s event and object serialization in JSONL format supports streaming and memory-efficient processing of large Monte Carlo datasets (>10^6 events), allowing transparent pipelining and parallelization. Conversion tools produce fixed-shape NumPy arrays compatible with downstream vectorized ML models (JAX, PyTorch). The orchestration layer is stateless between tool calls, so batch or parallel execution across many parameter points is natively supported.
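The streaming path from JSONL events to fixed-shape arrays can be sketched as follows; the event field names (`particles`, `E`, `px`, `py`, `pz`) and padding scheme are assumptions for illustration, not HEPTAPOD's actual event schema:

```python
import io
import json
import numpy as np

def stream_events(jsonl_file, max_particles=4):
    """Read JSONL events one line at a time (never the whole file) and yield
    fixed-shape (max_particles, 4) momentum arrays, zero-padded or truncated
    so every event has identical shape for vectorized ML consumers."""
    for line in jsonl_file:
        event = json.loads(line)
        arr = np.zeros((max_particles, 4), dtype=np.float32)
        for i, p in enumerate(event["particles"][:max_particles]):
            arr[i] = (p["E"], p["px"], p["py"], p["pz"])
        yield arr

data = io.StringIO(
    '{"particles": [{"E": 50.0, "px": 0.0, "py": 0.0, "pz": 50.0}]}\n'
    '{"particles": []}\n'
)
batch = np.stack(list(stream_events(data)))  # fixed shape, ML-framework ready
```

Since only one line is materialized at a time, memory use stays flat regardless of dataset size, and the resulting batches feed directly into JAX or PyTorch pipelines.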
JSON schema validation introduces negligible computational overhead compared to external tool runtimes. Opportunities for future scaling include multi-agent parallelization (e.g., assigning one planning agent per parameter scan point) and retrieval-augmented prompt engineering for caching and reuse of frequent run-card templates.
HEPTAPOD refactors fragile, manually-scripted HEP workflows, replacing them with robust, agent-driven, schema-grounded, and fully auditable pipelines that tightly couple LLM planning with canonical HEP simulation and analysis software (Menzo et al., 17 Dec 2025).