SE(3)-Equivariant Tensor Field Networks
- SE(3)-Equivariant Tensor Field Networks are geometric deep learning models that enforce equivariance to 3D rotations and translations via tensor representations.
- In generative settings they serve as the backbone of a dual-scale flow matching framework that decomposes molecular structures into coarse-grained and all-atom representations, improving both computational efficiency and sample accuracy.
- The architecture integrates spherical harmonics and learned tensor operations to maintain symmetry compliance, enabling high-fidelity 3D data generation for applications in molecular physics and computer vision.
An SE(3)-equivariant Tensor Field Network (TFN) is a geometric deep learning architecture designed to model and generate 3D data while guaranteeing equivariance with respect to the SE(3) group of rigid-body transformations, comprising all 3D rotations (SO(3)) and translations (ℝ³). Such networks leverage the TFN paradigm to ensure that operations on coordinates (and associated tensors) commute with global Euclidean transformations, which is essential for molecular physics, chemistry, 3D computer vision, and related applications where canonical orientation or placement in space is arbitrary.
1. Geometric Equivariance and Tensor Field Networks
SE(3) equivariance is the property whereby, for a neural operator f acting on input data transformed by a rigid motion x ↦ Rx + t (where R ∈ SO(3), t ∈ ℝ³), the output transforms correspondingly: f(Rx + t) = R f(x) + t for coordinate-valued outputs, and f(Rx + t) = f(x) for invariant outputs. TFNs achieve this by representing features as geometric tensors (e.g., scalars, vectors, higher-order tensors) and using learned operations (including kernel convolutions built from spherical harmonics) that are strictly equivariant under SE(3).
In practice, nodes in the underlying molecular or point-cloud graph carry both invariant features (e.g., atom type, bond context) and equivariant features (coordinates, velocities). Edges encode pairwise relationships such as bond type or spatial distance.
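Equivariance of this kind can be verified numerically. The sketch below (NumPy only; a toy distance-weighted update, not the full TFN tensor-product machinery) checks that a layer built solely from invariant weights and relative position vectors commutes with an arbitrary rigid motion:

```python
import numpy as np

def toy_equivariant_update(coords):
    """Toy equivariant layer: each point moves along weighted relative
    position vectors. Distances are invariant and relative vectors rotate
    with the frame, so the output displacement is rotation-equivariant
    and translation-invariant."""
    diff = coords[:, None, :] - coords[None, :, :]       # (N, N, 3) relative vectors
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)  # (N, N, 1) invariant
    w = np.exp(-dist**2)                                 # invariant weights
    return (w * diff).sum(axis=1)                        # (N, 3) equivariant output

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))

# Random orthogonal matrix R (via QR) and translation t
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)

out = toy_equivariant_update(x)
out_transformed = toy_equivariant_update(x @ R.T + t)

# Displacements rotate with the input and ignore translation:
assert np.allclose(out_transformed, out @ R.T)
```

A full TFN generalizes this idea to higher-order tensor features via spherical-harmonic filters, but the commutation property being tested is the same.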
2. Dual-Scale Flow Matching Framework
TFNs are central components in dual-scale flow matching frameworks for generative modeling of 3D structures, most notably in the generation of molecular clusters (Subramanian et al., 2024). The state space is decomposed into a coarse-grained (CG) representation, with M beads, and an all-atom (AA) representation, with N atoms. This two-stage approach exploits hierarchies in molecular structure:
- The CG flow:
  - Bead coordinates x_CG ∈ ℝ^(M×3)
  - Coarse potential U_CG(x_CG)
  - Objective: learn a velocity field v_θ^CG via flow matching from a simple prior to the coarse-grained target distribution p_CG(x_CG) ∝ exp(−U_CG(x_CG))
- The AA flow:
  - Atom coordinates x_AA ∈ ℝ^(N×3)
  - Full potential U(x_AA)
  - Conditional on the generated CG configuration x_CG, learn a velocity field v_θ^AA to sample p(x_AA | x_CG)
Both flows are modeled by SE(3)-equivariant TFNs, ensuring physical symmetry at both levels.
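The exact CG mapping used by Subramanian et al. is not reproduced here; a common, minimal choice (sketched below under that assumption, with a hypothetical `bead_of_atom` assignment) pools atom groups into beads by center of mass. Because the map is an affine combination of positions, it commutes with rigid motions, so the CG state space inherits the SE(3) structure:

```python
import numpy as np

def coarse_grain(coords, masses, bead_of_atom, num_beads):
    """Map N atom coordinates to M bead coordinates by mass-weighted
    averaging (center of mass of each bead's atom group). An affine
    combination of positions, so CG(Rx + t) = R CG(x) + t."""
    beads = np.zeros((num_beads, 3))
    for b in range(num_beads):
        idx = np.where(bead_of_atom == b)[0]
        m = masses[idx][:, None]
        beads[b] = (m * coords[idx]).sum(axis=0) / m.sum()
    return beads

# Hypothetical toy system: N = 6 atoms pooled into M = 2 beads
coords = np.arange(18, dtype=float).reshape(6, 3)
masses = np.array([12.0, 1.0, 1.0, 12.0, 1.0, 1.0])  # e.g. C, H, H per bead
bead_of_atom = np.array([0, 0, 0, 1, 1, 1])

beads = coarse_grain(coords, masses, bead_of_atom, num_beads=2)
assert beads.shape == (2, 3)
```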
3. Mathematical Formulation and Loss Functions
The flow-matching objective corresponds to simulating a stochastic process along a continuous path from a simple prior to the data distribution. The core supervised regression target at each timepoint t is the known velocity field between initial and target configurations:

L_FM(θ) = E_{t, x_0, x_1} ‖ v_θ(x_t, t) − (x_1 − x_0) ‖²,

where x_t = (1 − t) x_0 + t x_1 interpolates linearly between a prior sample x_0 and a data sample x_1.
For dual-scale flows:
- CG flow: L_FM^CG(θ), the objective above applied to bead coordinates x_CG
- AA flow: L_FM^AA(θ), the same objective on atom coordinates x_AA, with v_θ^AA conditioned on x_CG
The flows are solved via explicit ODE integration; TFN architectures parameterize the time-dependent velocity fields.
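Under the linear interpolation path, the regression target is a constant velocity per sample pair. A minimal sketch of target construction and loss (NumPy only; names are illustrative):

```python
import numpy as np

def flow_matching_batch(x0, x1, t):
    """Linear path x_t = (1 - t) x0 + t x1 and its constant velocity
    target u = x1 - x0 (the regression target for v_theta)."""
    xt = (1.0 - t)[:, None, None] * x0 + t[:, None, None] * x1
    u = x1 - x0
    return xt, u

def fm_loss(v_pred, u):
    """Mean-squared flow-matching loss || v_theta(x_t, t) - u ||^2."""
    return np.mean((v_pred - u) ** 2)

rng = np.random.default_rng(0)
B, N = 4, 10                     # batch of 4 systems, 10 points each
x0 = rng.normal(size=(B, N, 3))  # prior samples
x1 = rng.normal(size=(B, N, 3))  # data samples
t = rng.uniform(size=B)          # per-sample timepoints

xt, u = flow_matching_batch(x0, x1, t)
# A model predicting u exactly has zero loss:
assert np.isclose(fm_loss(u, u), 0.0)
```

In the dual-scale setting this same recipe runs twice: once on bead coordinates, once on atom coordinates with the CG sample as extra conditioning input.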
4. Architectural Characteristics and Equivariance Enforcement
The SE(3)-equivariant TFN backbone operates on molecular or point-cloud graphs with the following essentials:
- Nodes: Carry both 3D coordinates (equivariant) and feature vectors (invariant, e.g., atom type, aromaticity).
- Edges: Encode relationships such as bond type (invariant).
- Layers: Employ learned spherical harmonics and tensor products to ensure that for any rigid transformation (R, t) ∈ SE(3), the predicted velocity field satisfies v_θ(Rx + t) = R v_θ(x),
and, in the AA stage, v_θ^AA(Rx_AA + t | Rx_CG + t) = R v_θ^AA(x_AA | x_CG).
- Alternatives: Benchmarked backbones include E(3)-GNN and Attentive FP; TFN achieves the lowest Jensen–Shannon divergence between generated and reference structural distributions, i.e., the most physically realistic samples.
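A minimal container capturing this invariant/equivariant split might look like the sketch below (field names are illustrative, not the paper's API); note that a rigid motion touches only the coordinates:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class MolecularGraph:
    """Minimal container mirroring the split described above: invariant
    per-node and per-edge features vs. equivariant coordinates."""
    coords: np.ndarray      # (N, 3) equivariant: rotates/translates with the frame
    node_feats: np.ndarray  # (N, F) invariant: atom type, aromaticity, ...
    edge_index: np.ndarray  # (2, E) pairs of node indices
    edge_feats: np.ndarray  # (E, D) invariant: bond type, distance bins, ...

    def transform(self, R, t):
        """Apply a rigid motion: only the coordinates change."""
        return MolecularGraph(self.coords @ R.T + t,
                              self.node_feats, self.edge_index, self.edge_feats)

g = MolecularGraph(coords=np.zeros((3, 3)),
                   node_feats=np.eye(3),
                   edge_index=np.array([[0, 1], [1, 2]]),
                   edge_feats=np.ones((2, 1)))
g2 = g.transform(np.eye(3), np.array([1.0, 0.0, 0.0]))
assert np.allclose(g2.coords[:, 0], 1.0)
```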
5. Training and Inference Pipeline
Training proceeds in two decoupled stages:
- CG flow training: Minimize L_FM^CG(θ) via stochastic sampling of CG configurations.
- AA flow training: Minimize L_FM^AA(θ), conditioning on ground-truth bead coordinates. This separation allows for efficient learning and wall-clock speedups: at inference, most ODE integration steps are performed on the much smaller CG system.
At inference, a CG sample is first drawn and integrated via the CG flow. The resulting bead coordinates condition the AA flow, generating the full atomistic sample efficiently and equivariantly.
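The two-stage inference loop can be sketched with forward-Euler integration. The velocity fields below are trivial stand-ins for the learned TFNs (each simply flows toward a fixed target), shown only to make the CG-then-AA wiring concrete:

```python
import numpy as np

def euler_integrate(v, x, steps=20):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with forward Euler."""
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * v(x, k * dt)
    return x

M, N = 4, 12
cg_target = np.ones((M, 3))

def v_cg(x, t):            # stand-in for the learned CG field v_theta^CG
    return cg_target - x

def v_aa_given_cg(cg):     # stand-in for the AA field, conditioned on the CG sample
    aa_target = np.repeat(cg, N // M, axis=0)  # toy lift of beads to atoms
    return lambda x, t: aa_target - x

rng = np.random.default_rng(0)
cg_sample = euler_integrate(v_cg, rng.normal(size=(M, 3)))                      # stage 1
aa_sample = euler_integrate(v_aa_given_cg(cg_sample), rng.normal(size=(N, 3)))  # stage 2
assert aa_sample.shape == (N, 3)
```

Because the per-step cost scales with the number of particles, spending most steps on the M-bead system rather than the N-atom system is where the wall-clock savings originate.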
6. Empirical Results and Computational Advantages
Dual-scale SE(3)-equivariant TFN flow matching achieves substantial gains in both fidelity and computational cost over single-scale or non-equivariant alternatives (Subramanian et al., 2024):
| Method | Bond JSD ↓ | Angle JSD ↓ | Time/step (s) ↓ |
|---|---|---|---|
| Single-scale (Gaussian) | 0.6563 | 0.6316 | 0.2949 |
| Single-scale (Harmonic) | 0.6298 | 0.6066 | 0.3039 |
| Dual-scale (30:10 split) | 0.5472 | 0.4610 | 0.0496 |
Increasing the proportion of CG steps further decreases inference time with negligible fidelity loss. This efficiency arises from performing most ODE integration at the coarse level (M ≪ N), while SE(3)-equivariance preserves physically correct generation.
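The bond and angle JSDs above compare histograms of generated versus reference internal coordinates. A generic base-2 Jensen–Shannon divergence over discrete histograms (not necessarily the paper's exact binning) can be computed as:

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2) between two discrete
    distributions; 0 for identical, 1 for disjoint support."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        return np.sum(a * np.log2((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.2, 0.3, 0.5])
assert np.isclose(jsd(p, p), 0.0)                                    # identical
assert np.isclose(jsd(np.array([1.0, 0.0]), np.array([0.0, 1.0])), 1.0)  # disjoint
```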
7. Relevance and Context within SE(3)-Equivariant Modeling
SE(3)-equivariant TFNs have become the de facto backbone for generative and discriminative geometric learning tasks where data has no canonical orientation or absolute position. The dual-scale framework, as operationalized for molecular sampling, enables accurate, efficient simulations that are unattainable by single-scale or non-equivariant methods. Direct enforcement of SE(3)-equivariance via TFN layers ensures compliance with conservation laws and indistinguishability under global rigid motions—a strict requirement in molecular, physical, and some 3D perception applications. TFN-equipped flow matching methods set a new standard for generative 3D modeling in these domains (Subramanian et al., 2024).