AutoEFT: Automated EFT Operator Generation

Updated 17 December 2025

AutoEFT is a fully automated framework that constructs complete, minimal on-shell operator bases for effective field theories by integrating group-theory and tensor-algebra algorithms.
It systematically removes redundancies using equations of motion, integration-by-parts, and Fierz/Schouten identities to ensure only unique Lorentz and gauge-invariant interactions are included.
The framework integrates with transformer-based models to expedite the generation and verification of candidate operators for applications in SMEFT, BSM, and gravitational theories.

AutoEFT is a fully automated framework for constructing complete, minimal on-shell operator bases for effective field theories (EFTs), given arbitrary field content and symmetry groups. It augments classical group- and tensor-theory algorithms with codified removal of all operator redundancies from equations of motion (EoM), integration-by-parts (IBP) identities, Fierz and Schouten-type relations, and field relabeling symmetries. AutoEFT operates on the on-shell Hilbert space, systematically enumerating all permitted Lorentz and gauge-invariant interactions at a specified operator dimension, and outputs the result in human- and machine-readable form, streamlining EFT basis construction for the Standard Model, its extensions, and beyond (Schaaf, 2023, Harlander et al., 2023).

1. Mathematical Foundations and Redundancy Removal

AutoEFT constructs operator bases directly, rather than over-generating candidate operators and eliminating redundancies afterwards. From the outset, only Lorentz- and gauge-invariant tensor structures not reducible by canonical identities are included.

Equations of Motion (EoM): Operators proportional to the classical EoM, $\delta S/\delta\Phi=0$, are physically redundant on shell and are removed by construction. For gauge fields, for example, any structure containing $\partial^\mu F_{\mu\nu}$ can be traded for the current $J_\nu$ via the EoM $\partial^\mu F_{\mu\nu}-J_\nu=0$ and is therefore excluded.
Integration by Parts (IBP): Total derivatives, $\int d^4x\,\partial_\mu\mathcal{O}^\mu(x) = 0$, do not affect $S$-matrix elements and are pruned by restricting the placement of derivatives during operator generation.
Fierz and Schouten Identities: Bilinear and multilinear spinor products related by Fierz transformations are collapsed to canonical representatives. For example:

$(\bar\psi_1\gamma^\mu\psi_2) (\bar\psi_3\gamma_\mu\psi_4) = -2(\bar\psi_1\psi_4)(\bar\psi_3\psi_2) + \dots$

Similarly, SU($N$) group-theoretical redundancies (e.g., the Schouten identity for SU(2): $\epsilon^{ij}\epsilon^{kl}+\epsilon^{ik}\epsilon^{lj}+\epsilon^{il}\epsilon^{jk}=0$) are enforced to avoid duplicate invariants (Schaaf, 2023).
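The SU(2) Schouten identity quoted above can be checked by brute force over all index assignments. The following is a minimal sketch in plain Python, assuming the convention $\epsilon^{12}=+1$ (indices 0 and 1 in the code); the names `eps` and `schouten` are illustrative and not part of AutoEFT.

```python
from itertools import product

# Two-dimensional Levi-Civita symbol with eps[0][1] = +1 (assumed convention).
eps = [[0, 1], [-1, 0]]

def schouten(i, j, k, l):
    """Left-hand side of the SU(2) Schouten identity for fixed indices."""
    return eps[i][j] * eps[k][l] + eps[i][k] * eps[l][j] + eps[i][l] * eps[j][k]

# The identity holds if the sum vanishes for all 2**4 index assignments.
violations = [idx for idx in product(range(2), repeat=4) if schouten(*idx) != 0]
print(len(violations))
```

An empty list of violations means that any one of the three $\epsilon\epsilon$ structures can be expressed through the other two, which is why only two of them are kept as independent invariants.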

Repeated-Field Redundancy: For operators with multiple identical fields, permutation symmetries (field relabeling) are quotiented out after generating a "super-basis" treating fields as distinct.
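As an illustration of what the quotient by field relabeling does, the sketch below counts the independent flavor components of an operator whose identical fields carry a totally symmetric flavor tensor. This is a simplified, assumed special case (mixed symmetries require the full symmetric-group decomposition), and `flavor_orbits` is a hypothetical helper, not an AutoEFT function.

```python
from itertools import permutations, product

def flavor_orbits(n_flavors, n_identical):
    """Group flavor-index tuples into orbits under relabeling of identical fields."""
    orbits = set()
    for idx in product(range(n_flavors), repeat=n_identical):
        # Canonical representative: the lexicographically smallest permutation.
        orbits.add(min(permutations(idx)))
    return orbits

# Two identical fields with three flavors: n(n+1)/2 = 6 independent components.
print(len(flavor_orbits(3, 2)))
```

Treating the fields as distinct would give $3^2=9$ index combinations; identifying tuples related by a permutation leaves 6.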

2. Core Algorithm and Implementation

The central AutoEFT workflow follows a deterministic, pruning-centric approach:

Model Parsing: The user supplies a YAML model file specifying the field content (spin, Lorentz/gauge reps, hypercharges, number of generations) and symmetry groups (arbitrary SU($N$), U(1), local/global).
Basis Generation:
- Enumerate all multi-sets of fields and derivatives such that the total mass dimension matches the target.
- For each such "family," build all Lorentz-invariant tensors (e.g., via Young tableau and epsilon/trace contractions) and internal symmetry tensors.
Redundancy Elimination:
- Prune structures forbidden by EoM/IBP at the time of monomial creation.
- Apply Fierz/Schouten reductions for spinor and group-theoretical cases.
- Implement symmetric group decompositions to identify and mod out field permutation symmetries (using precomputed $S_n$ generators for $n\leq 9$).
Export and Formatting: The resulting operator basis is written in YAML, JSON, and LaTeX forms, including explicit index contractions and permutation symmetries (Harlander et al., 2023).

Pseudocode summary:

def AutoEFT_Basis(d, model):
    # d: target mass dimension; model: YAML model file
    fields, symmetry_groups = ParseModel(model)
    families = []
    for multiset F of fields with sum(dim) ≤ d:
        for each way to assign #∂ to F with sum(dim)+#∂=d:
            if EoM_forbidden_pattern(F, ∂-assignment): continue
            if IBP_total_derivative(F, ∂-assignment): continue
            families.append((F, ∂-assignment))
    invariants = {}
    for fam in families:
        T_Lor = FindLorentzTensors(fam)
        T_Int = FindGroupInvariants(fam, symmetry_groups)
        # collect the contractions of every family
        invariants = invariants ∪ { contract(L, I) | L in T_Lor, I in T_Int }
    phys_basis = QuotientByFieldPermutations(invariants)
    return phys_basis

(Schaaf, 2023, Harlander et al., 2023)
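The first stage of the workflow, enumerating field multisets and derivative counts by mass dimension, can be made concrete with a small runnable sketch. It assumes a hypothetical toy field content (`phi`, `psi`, `F` with canonical dimensions 1, 3/2, 2) and performs dimension counting only; Lorentz and gauge invariance, EoM, and IBP pruning are left to the later stages. The function name `enumerate_families` is illustrative and not AutoEFT's API.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# Hypothetical toy field content: name -> canonical mass dimension.
FIELD_DIMS = {"phi": Fraction(1), "psi": Fraction(3, 2), "F": Fraction(2)}

def enumerate_families(target_dim, min_fields=3):
    """List (field multiset, number of derivatives) pairs with total dimension target_dim."""
    families = []
    names = sorted(FIELD_DIMS)
    max_fields = int(target_dim)  # every toy field has dimension >= 1
    for n in range(min_fields, max_fields + 1):
        for multiset in combinations_with_replacement(names, n):
            field_dim = sum(FIELD_DIMS[f] for f in multiset)
            n_deriv = target_dim - field_dim  # each derivative adds dimension 1
            if n_deriv >= 0 and n_deriv.denominator == 1:
                families.append((multiset, int(n_deriv)))
    return families

for fam in enumerate_families(Fraction(4)):
    print(fam)
```

Exact rational arithmetic via `Fraction` avoids rounding issues with half-integer fermion dimensions. Some families found this way (for instance three scalars with a single derivative) admit no Lorentz-invariant contraction and would be discarded when the tensor structures are built.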

3. Supported Theories, Input/Output, and Extensions

AutoEFT supports an extensive range of EFTs as defined by user-supplied fields and symmetry groups:

Fields: Scalars, Weyl and Dirac spinors, gauge bosons, and tensor fields up to spin 2 (including Weyl and Riemann tensors for gravity).
Symmetries: SU($N$) and U(1) factors, local or global; extensions to SO($N$) and Sp($N$) are possible via invariant tensor implementation.

Sample Input File: Basic YAML with fields and symmetries:

name: SMEFT
symmetries:
  u1_groups: {U1_Y: {}}
  sun_groups: {SU3_C: {N: 3}, SU2_L: {N: 2}}
fields:
  QL: {representations: {Lorentz: [1/2, 0], SU3_C: [1], SU2_L: [1]}, generations: 3}
  H: {representations: {Lorentz: [0], SU2_L: [1]}, tex: H}
  # Additional fields as needed

Outputs: Per-dimension operator catalog (YAML/JSON), LaTeX for each operator, machine-readable basis structure. For example, the dimension-5 Weinberg operator for neutrino mass is output in both YAML and LaTeX:

$\mathcal{O}_\nu = \epsilon^{\alpha\beta}\,\epsilon^{ik}\,\epsilon^{jl}\,(L_{i\,\alpha}^w\,L_{j\,\beta}^x)\,(H_k\,H_l)$

(Harlander et al., 2023)
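A necessary condition for any such operator is that its U(1) charges sum to zero. The sketch below checks this for the field content of the Weinberg operator, assuming the hypercharge convention $Q = T_3 + Y$ (so $Y_L=-1/2$, $Y_H=+1/2$); the dictionary and the `*` suffix for conjugate fields are illustrative choices, not AutoEFT syntax.

```python
from fractions import Fraction

# Standard Model hypercharges in the convention Q = T3 + Y (assumed convention).
HYPERCHARGE = {"L": Fraction(-1, 2), "H": Fraction(1, 2)}

def total_hypercharge(fields):
    """Sum hypercharges; a trailing '*' marks the conjugate field with opposite charge."""
    total = Fraction(0)
    for f in fields:
        if f.endswith("*"):
            total -= HYPERCHARGE[f[:-1]]
        else:
            total += HYPERCHARGE[f]
    return total

print(total_hypercharge(["L", "L", "H", "H"]))   # Weinberg-type field content
print(total_hypercharge(["L", "L", "H", "H*"]))  # not U(1)-neutral
```

Only the first combination is neutral, so only it can contribute a dimension-5 invariant; the second is rejected before any tensor structures are built.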

Extensibility: Adding new fields, representations, or symmetry groups requires only modification of the model file. AutoEFT has been used for the Standard Model Effective Field Theory (SMEFT), SMEFT plus gravity (GRSMEFT), Minimal Flavor Violation (MFV) extensions, QED, QCD, and numerous BSM cases.

4. Algorithmic Performance and Scaling

Scalability: Operator enumeration grows exponentially with mass dimension ($d$). SMEFT at $d=10$ yields $\mathcal{O}(10^4)$ operators, and $d=12$ approaches $\mathcal{O}(10^5)$. The practical limit is set by computer memory and disk rather than algorithmic overhead.
Computation Time: Bases up to $d=10$ are computed within hours on a multi-core CPU; $d=12$ can require weeks to months. Efficient on-the-fly pruning of EoM, IBP, and Fierz redundancies prevents the combinatorial explosion characteristic of "pre-basis $\to$ reduction" methods.
Limits: Generation is limited to on-shell operators; evanescent and gauge-variant counterterms are not included. For $n>9$ identical fields, explicit $S_n$ generators must be provided. Each operator must involve at least three fields (Harlander et al., 2023).

Mass dimension $d$	Operators (SMEFT, $n=3$, est.)	Compute time
6	$\mathcal{O}(10^3)$	Minutes (desktop)
10	$\mathcal{O}(10^4)$	Hours–day (multicore)
12	$\mathcal{O}(10^5)$	Weeks–months (high memory)
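The reference to $S_n$ generators in the limits above rests on the fact that the full symmetric group is generated by just two elements, a transposition and an $n$-cycle. The sketch below verifies this for $n=5$ by closing the two permutations under composition. It illustrates the group-theory fact only and makes no claim about how AutoEFT stores its generator data; all names are illustrative.

```python
from math import factorial

def compose(p, q):
    """Return the permutation 'apply q, then p' on tuples of images."""
    return tuple(p[i] for i in q)

def generated_group(generators):
    """Closure of a set of permutations under composition (breadth-first)."""
    n = len(generators[0])
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in generators:
                h = compose(s, g)
                if h not in group:
                    group.add(h)
                    new.append(h)
        frontier = new
    return group

n = 5
transposition = (1, 0) + tuple(range(2, n))  # swaps the first two entries
cycle = tuple(range(1, n)) + (0,)            # n-cycle i -> i+1 mod n
print(len(generated_group([transposition, cycle])) == factorial(n))
```

Because $|S_n| = n!$ grows quickly (already 362880 for $n=9$), working with two generators rather than all group elements keeps the permutation-symmetry step tractable.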

5. Integration with Machine Learning: Transformer-Based AutoEFT

Recent work has demonstrated the use of LLMs based on transformers to automate Lagrangian generation given arbitrary field content, using tokenized object representations that encode spin, group representations, and charge assignments (Koay et al., 16 Jan 2025). In this context, AutoEFT serves as the redundancy-removal "oracle" in a hybrid pipeline:

Data Pipeline: Autogenerate EFT interaction terms up to a fixed operator dimension using classical AutoEFT, translating field content to token sequences.
Transformer Model: BART-style encoder/decoder (12 layers each, 16 heads, hidden dim 1024) trained on tens of $10^5$ generated Lagrangians, achieving more than 90% sequence-level accuracy for up to six fields.
Embedding Analysis: Input embedding vectors cluster according to spin, gauge representation, and charge. Conjugation is internally encoded as a vector direction in embedding space, indicating the model internalizes crucial group-theoretical invariants.
Hybrid Pipeline: User-provided field content is tokenized, and the trained model outputs candidate invariant terms, which are then algebraically processed by AutoEFT to remove redundant operators. This supports generalization and rapid basis construction, with LaTeX/Wolfram/SymPy exports (Koay et al., 16 Jan 2025).
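To make the tokenization step concrete, the sketch below encodes a field's spin, gauge representations, and hypercharge as a flat token list. The token vocabulary (`FIELD`, `SPIN_*`, `SU3_*`, `SU2_*`, `U1_*`, `DAGGER`, `SEP`) is entirely hypothetical and chosen for illustration; it is not the scheme used by Koay et al.

```python
from fractions import Fraction

def tokenize_field(spin, su3_dim, su2_dim, hypercharge, conjugate=False):
    """Encode one field as a flat token list (hypothetical vocabulary)."""
    tokens = ["FIELD", f"SPIN_{spin}", f"SU3_{su3_dim}", f"SU2_{su2_dim}"]
    y = Fraction(hypercharge)
    sign = "-" if y < 0 else "+"
    tokens.append(f"U1_{sign}{abs(y.numerator)}/{y.denominator}")
    if conjugate:
        tokens.append("DAGGER")  # marks the conjugate field
    return tokens

higgs = tokenize_field("0", 1, 2, Fraction(1, 2))
quark = tokenize_field("1/2", 3, 2, Fraction(1, 6))
sequence = higgs + quark + ["SEP"]  # input sequence for a seq2seq model
print(sequence)
```

Keeping each quantum number in its own token, rather than one token per named field, is what would let a model generalize to field content it has not seen under a specific name.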

6. Applications, Limitations, and Directions

Applications: AutoEFT is deployed for operator basis construction in SMEFT, BSM scenarios, gravity extensions, and flavor theories. Canonical operator bases (e.g., Warsaw, SILH) can be selected post-generation. Algorithmic stratification enables calculations relevant to LHC, flavor factories, and gravitational effective field theory.
Limitations: The operator count and memory scale exponentially with operator mass dimension and number of flavors. Only on-shell, gauge-invariant, and $n\geq3$ field operators are generated; evanescent operators and off-shell counterterms are excluded. Custom group extensions (e.g., SO($N$), Sp($N$)) require additional coding.
Potential Extensions: Planned developments include support for alternative symmetry groups, direct interface to one-loop matching algorithms, inclusion of spurion insertions for higher-order MFV, and translation between operator bases. Extension of the framework to generate evanescent and gauge-variant structures for full renormalization is an open target (Harlander et al., 2023).

7. Summary

AutoEFT provides a unified, redundancy-avoiding, on-shell operator generation framework for effective field theories. By integrating algebraic and group-theoretical logic in both classical (symbolic) and machine learning (transformer) architectures, it automates a previously labor-intensive workflow fundamental to BSM, flavor, and gravitational model building. Operator bases for the SMEFT and its extensions, including gravity, are routinely constructed to high mass dimension in hours to weeks on modern hardware. Its open-source implementation and extensibility to new models and symmetries make AutoEFT a standard computational tool for the EFT physics community (Schaaf, 2023, Harlander et al., 2023, Koay et al., 16 Jan 2025).