Equivariant Mesh Attention Networks (EMAN)
- EMANs are deep learning architectures that guarantee exact equivariance to global rotations, translations, uniform scalings, node permutations, and local gauge (SO(2)) transformations.
- They leverage relative tangential features and gauge-equivariant attention layers to achieve state-of-the-art performance on benchmarks like FAUST and TOSCA, ensuring robustness to both local and global geometric transformations.
- By extending SE(3)-Transformers with mesh-specific symmetry considerations and local gauge invariance, EMAN improves convergence speed and reduces reliance on data augmentation while maintaining high accuracy.
Equivariant Mesh Attention Networks (EMAN) are deep learning architectures specifically designed for mesh data that integrate attention mechanisms with exact equivariance to a broad class of symmetry transformations. EMANs are formulated to be provably equivariant to global rotations, translations, uniform scalings, node permutations, and local gauge (frame) transformations. By leveraging relative tangential features as input and constraining attention layers to respect mesh symmetries, EMANs achieve state-of-the-art performance on standard mesh-processing benchmarks, while offering robustness to both local and global geometric transformations (Basu et al., 2022).
1. Mathematical Foundations and Group Actions
EMANs formalize symmetry through group actions on mesh data, considering the product of the following symmetry groups: global rotations (SO(3)), translations, uniform scalings, node permutations, and per-node gauge (SO(2)) transformations. For a mesh with vertices $x_p \in \mathbb{R}^3$, the principal equivariance relations for a generic feature field $f$ and network map $\Phi$ are:
- Rotation: for $R \in SO(3)$, $x_p \mapsto R x_p$ and $\Phi(R x) = \rho(R)\,\Phi(x)$.
- Translation: for $t \in \mathbb{R}^3$, $x_p \mapsto x_p + t$ and $\Phi(x + t) = \Phi(x)$.
- Scaling: for $\lambda > 0$, $x_p \mapsto \lambda x_p$ and $\Phi(\lambda x) = \Phi(x)$.
- Permutation: for a permutation matrix $P$, $\Phi(P x) = P\,\Phi(x)$, i.e., relabeling of vertex indices is preserved.
- Gauge (frame) transformation: at each vertex $p$, a rotation $g_p \in SO(2)$ of the local tangent plane; a feature of type $\ell$ transforms as $f_p \mapsto \rho_\ell(g_p)\, f_p$.
The EMAN mapping $\Phi$ satisfies $\Phi(g \cdot x) = \rho(g)\,\Phi(x)$ for all elements $g$ of the product group, ensuring equivariance/invariance to all symmetries under consideration (Basu et al., 2022).
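These transformation laws can be checked numerically on a toy point set. The sketch below uses pairwise unit directions as a stand-in equivariant feature map; the function `phi` is a hypothetical illustration of the equivariance relations, not EMAN itself:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))           # toy "mesh": 5 vertices in R^3

# Group elements: a rotation about z, a translation, a uniform scale,
# and a permutation matrix.
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
t = np.array([1.0, -2.0, 0.5])
lam = 2.5
P = np.eye(5)[rng.permutation(5)]

def phi(X):
    """Hypothetical feature map: pairwise unit directions.
    Relative coordinates kill translations; normalizing by the
    distance kills uniform scalings."""
    D = X[:, None, :] - X[None, :, :]
    n = np.linalg.norm(D, axis=-1, keepdims=True)
    return D / np.where(n > 0, n, 1.0)

F = phi(X)
assert np.allclose(phi(X + t), F)                  # translation invariance
assert np.allclose(phi(lam * X), F)                # scaling invariance
assert np.allclose(phi(X @ R.T), F @ R.T)          # rotation equivariance
# permutation equivariance: both vertex axes are relabeled consistently
assert np.allclose(phi(P @ X), np.einsum('ia,jb,abk->ijk', P, P, F))
```

The same four checks are what the EMAN equivariance proofs establish symbolically for the full architecture.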
2. Relative Tangential Features (RelTan)
Raw 3D coordinates are not inherently equivariant to the desired group actions. EMANs instead use relative tangential (RelTan) features, defined at each vertex $p$ as
$$v_p^{(\rho)} = Q_p\, \frac{\sum_{q \in \mathcal{N}(p)} \lVert x_q - x_p \rVert^{\rho - 1}\,(x_q - x_p)}{\sum_{q \in \mathcal{N}(p)} \lVert x_q - x_p \rVert^{\rho}},$$
where $\mathcal{N}(p)$ is the 1-ring neighborhood of $p$, $n_p$ is the area-weighted normal, and $Q_p = I - n_p n_p^{\top}$ is the projector onto the tangent plane at $p$. The parameter $\rho$ controls the weighting by neighbor distance. According to Lemma 3.1, these features are equivariant to rotations ($v_p \mapsto R v_p$) and invariant to translations and uniform scalings ($v_p \mapsto v_p$) (Basu et al., 2022).
RelTan features let the network capture multi-scale geometric relations by combining different values of $\rho$, substantially improving equivariance properties relative to raw coordinates.
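A minimal numpy sketch of RelTan-style features follows, assuming a distance-power weighting normalized to be scale-free; the `reltan` helper, the toy mesh, and the exact normalization are illustrative assumptions and may differ in detail from Basu et al. (2022):

```python
import numpy as np

def reltan(X, neighbors, normals, rho=1.0):
    """RelTan-style feature per vertex: tangent-plane projection of
    distance-weighted relative neighbor coordinates. The numerator and
    denominator are both homogeneous of degree rho in scale, so the
    quotient is scale-invariant (a sketch, not the paper's exact form)."""
    feats = np.zeros_like(X)
    for p, nbrs in enumerate(neighbors):
        d = X[nbrs] - X[p]                      # relative coordinates
        r = np.linalg.norm(d, axis=1)
        num = (d * r[:, None] ** (rho - 1)).sum(axis=0)
        den = (r ** rho).sum()
        n = normals[p] / np.linalg.norm(normals[p])
        Q = np.eye(3) - np.outer(n, n)          # tangent-plane projector
        feats[p] = Q @ (num / den)
    return feats

# toy mesh with hand-picked 1-ring neighborhoods and random normals
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 3))
nbrs = [[1, 2], [0, 2, 3], [0, 1, 4], [1, 5], [2, 5], [3, 4]]
normals = rng.standard_normal((6, 3))

v = reltan(X, nbrs, normals, rho=2.0)
assert np.allclose(reltan(X + 1.7, nbrs, normals, 2.0), v)  # translation
assert np.allclose(reltan(3.0 * X, nbrs, normals, 2.0), v)  # scaling
# rotation equivariance: rotating vertices and normals rotates the feature
c, s = np.cos(0.8), np.sin(0.8)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
assert np.allclose(reltan(X @ R.T, nbrs, normals @ R.T, 2.0), v @ R.T)
```

Stacking `reltan` outputs for several values of `rho` yields the multi-scale input channels described above.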
3. Gauge-Equivariant Mesh Attention Layer
At the core of EMAN is a gauge-equivariant attention mechanism. Each EMAN layer applies attention across vertex neighborhoods with the following structure:
- Queries: $q_p = W_Q\, f_p$.
- Keys/Values: for each neighbor $q \in \mathcal{N}(p)$, $k_{pq} = W_K(\theta_{pq})\,\rho(g_{q \to p})\, f_q$ and $v_{pq} = W_V(\theta_{pq})\,\rho(g_{q \to p})\, f_q$,
where $\theta_{pq}$ is the angle of the edge $(p, q)$ in the local tangent frame at $p$, and $g_{q \to p}$ is the discrete parallel transport angle for local gauge transport from $q$ to $p$.
- Attention coefficients: $\alpha_{pq} = \operatorname{softmax}_{q \in \mathcal{N}(p)}\bigl(\langle q_p, k_{pq} \rangle / \sqrt{d}\bigr)$, with $d$ the key dimension.
- Aggregation: the output at $p$ is $f'_p = \sum_{q \in \mathcal{N}(p)} \alpha_{pq}\, v_{pq}$.
Kernels are constrained by gauge compatibility: $W(\theta - g) = \rho_{\mathrm{out}}(-g)\, W(\theta)\, \rho_{\mathrm{in}}(g)$ for all $g \in SO(2)$, ensuring transformation compatibility under local frame changes (Basu et al., 2022).
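The gauge-compatibility constraint can be verified numerically for a concrete kernel. Assuming input and output features of type 1 (the standard 2x2 rotation irrep of SO(2)), one solution is $K(\theta) = R(2\theta)\,M$ with $M$ a reflection; the kernel `K` below is a single hypothetical basis element, whereas EMAN/GEM-CNN parametrize a full basis of such solutions:

```python
import numpy as np

def rot(a):
    """2x2 rotation matrix, i.e. the type-1 SO(2) irrep rho_1(a)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s], [s, c]])

M = np.diag([1.0, -1.0])     # reflection: M @ rot(g) == rot(-g) @ M

def K(theta):
    """One solution of the gauge constraint for rho_in = rho_out = rho_1
    (hypothetical basis kernel for illustration)."""
    return rot(2 * theta) @ M

# check K(theta - g) == rho_out(-g) @ K(theta) @ rho_in(g) on a grid
for theta in np.linspace(0.0, 2 * np.pi, 7):
    for g in np.linspace(0.0, 2 * np.pi, 5):
        assert np.allclose(K(theta - g), rot(-g) @ K(theta) @ rot(g))
```

Intuitively, rotating the local frame by $g$ shifts every edge angle $\theta_{pq}$ by $-g$, and the constraint forces the kernel to absorb that shift into the input/output irreps.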
4. Equivariance Guarantees and Theoretical Properties
The EMAN architecture, by design, achieves the following exact properties:
- Gauge equivariance: Under a gauge rotation $g_p$, queries, keys, and values transform by the corresponding irreducible representations $\rho_\ell(g_p)$; orthogonality of these representations leaves the attention dot products, and hence the softmax normalization, invariant, so the output transforms with the correct feature type.
- Global rotation equivariance: Rigid rotations preserve all local angles and parallel transport, so outputs rotate accordingly: $\Phi(R \cdot x) = R \cdot \Phi(x)$ for all $R \in SO(3)$.
- Translation/scaling invariance: The parametrization is independent of absolute vertex position or scale, as only angular information enters the kernels.
- Permutation equivariance: Aggregations over neighborhoods are permutation-invariant by symmetry of the sum.
The full architecture, using RelTan features as inputs and appropriate angular biases in the attention layer, guarantees strong equivariance and invariance properties as formalized in Theorem 5.3 (Basu et al., 2022).
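The gauge-invariance of the attention coefficients follows directly from the orthogonality of the irreps, which a small numeric check illustrates (the query and key vectors below are random placeholders, not outputs of a trained layer):

```python
import numpy as np

rng = np.random.default_rng(2)
q = rng.standard_normal(2)           # a type-1 query at vertex p
keys = rng.standard_normal((4, 2))   # type-1 keys from 4 neighbors

def rot(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s], [s, c]])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

alpha = softmax(keys @ q)

# A gauge rotation g_p acts on the query and every (transported) key by
# the same orthogonal irrep rho_1(g_p); every dot product <rho q, rho k>
# equals <q, k>, so the attention weights are unchanged.
g = rot(1.1)
alpha_gauged = softmax((keys @ g.T) @ (g @ q))
assert np.allclose(alpha_gauged, alpha)
```

The same cancellation underlies the permutation claim: the softmax and the sum over $\mathcal{N}(p)$ are order-independent.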
5. Empirical Evaluation and Benchmark Performance
Extensive validation is provided through experiments on the FAUST (node-wise segmentation, 15 body-part labels) and TOSCA (mesh classification, 9 shape classes) benchmarks. EMAN is compared to:
- SpiralNet++ (graph convolution on raw coordinates)
- GEM-CNN (gauge-equivariant convolution; variants with raw coordinates, GET, or RelTan input)
- EMAN (attention on raw coordinates, GET, or RelTan input)
Key metrics are per-node or per-shape accuracy, both on clean data and under randomly-applied global rotation, translation, scaling, permutation, and gauge transformations at test time. The following table summarizes segmentation accuracy on FAUST:
| Model/Input | Clean (%) | Transformed (%) |
|---|---|---|
| GEM-CNN (XYZ) | 97.9 | 12.5 |
| GEM-CNN (RelTan) | 98.6 | 98.6 |
| EMAN (XYZ) | 98.5 | 0.0 |
| EMAN (RelTan) | 98.7 | 98.7 |
On TOSCA, EMAN + RelTan achieves 98% accuracy and is robust to all global transforms. Only RelTan features guarantee full transformation robustness.
These results demonstrate that EMAN, particularly in combination with RelTan features, achieves or exceeds state-of-the-art performance and is insensitive to a wide range of perturbations, with no need for data augmentation (Basu et al., 2022).
6. Robustness, Computational Complexity, and Limitations
Exact equivariance in EMAN ensures robustness to geometric nuisance transformations and thereby eliminates the need for data augmentation during training. EMAN converges faster and generalizes better when trained without augmentation than prior graph-based or convolutional mesh models.
In terms of computational efficiency, per-epoch wall-time is roughly doubled relative to GEM-CNN due to the extra operations in the attention module. However, EMAN's improved sample-efficiency enables it to surpass GEM-CNN accuracy within equivalent training durations (e.g., 30 minutes on FAUST).
Documented limitations include increased constant-factor costs in runtime and memory, the restriction of non-linearities to gauge-equivariant forms (forbidding ReLU on non-scalar channels), and a potential for greater sensitivity to unnormalized raw coordinates, which RelTan feature preprocessing mitigates. No self-contribution ("self-node") term is included in aggregation by default; extensions are possible (Basu et al., 2022).
7. Relationship to SE(3)-Transformers and Broader Context
EMAN extends the theoretical and algorithmic framework of SE(3)-Transformers (Fuchs et al., 2020), which provided attention-based models for point clouds and graphs with SE(3) equivariance. The key advances in EMAN include adaptation to mesh connectivity, incorporation of local gauge symmetry (SO(2)) on tangent-plane features, and use of RelTan features for guaranteed invariance to translation and scaling.
Both architectures rely on irreducible representations, Clebsch–Gordan decomposition, and equivariant dot-product attention, but EMAN integrates these concepts into mesh processing by careful consideration of mesh-specific angular parameters, gauge transport, and mesh-driven neighbor aggregation (Basu et al., 2022, Fuchs et al., 2020).
By merging irreducible SO(3) structures, gauge-theoretic convolutions, and mesh-specific attention, EMAN achieves a comprehensive equivariant framework that is broadly applicable to geometric learning problems in physics, computer vision, and chemistry.