HIENet: Hybrid Invariant–Equivariant Networks
- HIENet is a neural architecture that explicitly decomposes data representations into invariant and equivariant components under group actions.
- It leverages invariant encoders, equivariant encoders, group predictors, and decoders to separate and reassemble latent codes effectively.
- This design achieves state-of-the-art performance across diverse applications such as 3D vision, materials science, and graph learning while maintaining computational efficiency.
A Hybrid Invariant–Equivariant Network (HIENet) is a neural architecture that explicitly decomposes data representations into invariant and equivariant components under a group action. By jointly leveraging both invariance (features unchanged under group actions) and equivariance (features that transform predictably), HIENets optimize the balance between inductive bias, computational efficiency, and representational expressivity. This design paradigm has been instantiated in unsupervised learning, generative models, self-supervised learning, geometric deep learning, and large-scale materials science models.
1. Mathematical Foundations and Latent Code Decomposition
The foundational principle of HIENet is to factor data representations relative to a group $G$ acting on the input space $X$. Given a function $f : X \to Z$ and group representations $\rho_X$ on $X$ and $\rho_Z$ on $Z$, equivariance requires
$$f(\rho_X(g)\,x) = \rho_Z(g)\,f(x) \quad \text{for all } g \in G.$$
HIENet structures the latent code as a direct product $z = (z_{\mathrm{inv}}, g)$, with an invariant encoder $\phi$ enforcing
$$\phi(\rho_X(g)\,x) = \phi(x) \quad \text{for all } g \in G,$$
and a group predictor $\psi$ predicting the group action $g = \psi(x)$. The architecture further introduces an equivariant intermediate map $\eta$ with
$$\eta(\rho_X(g)\,x) = \rho_Z(g)\,\eta(x),$$
and a canonicalization step yielding the pose-normalized input $\bar{x} = \rho_X(\psi(x))^{-1}\,x$. The total latent space is thus $\mathcal{Z} = \mathcal{Z}_{\mathrm{inv}} \times G$. This structure enables universal expressivity for $G$-equivariant maps via a provable correspondence between equivariant and (subgroup-)invariant functions (Sannai et al., 2024).
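As a concrete toy instance of this decomposition, consider $G = \mathrm{SO}(2)$ acting on points in $\mathbb{R}^2$: the norm serves as the invariant code $\phi$, the polar angle as the group predictor $\psi$, and decoding the canonical element followed by pose reapplication recovers the input. This is a minimal illustrative sketch; the function names are ours, not from any HIENet reference implementation.

```python
# Toy invariant/equivariant decomposition for G = SO(2) acting on R^2.
# All names (phi, psi, decode, ...) are illustrative, not a reference API.
import numpy as np

def phi(x):
    """Invariant code: the norm is unchanged by rotation."""
    return np.linalg.norm(x)

def psi(x):
    """Group predictor: the rotation angle aligning x to the reference axis."""
    return np.arctan2(x[1], x[0])

def rho(g):
    """Representation of g in SO(2) as a 2x2 rotation matrix."""
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s], [s, c]])

def decode(z_inv):
    """Decode the canonical element: a point on the positive x-axis."""
    return np.array([z_inv, 0.0])

def reconstruct(x):
    """Canonical decode followed by pose reapplication."""
    return rho(psi(x)) @ decode(phi(x))

x = np.array([3.0, 4.0])
g = 0.7
# Invariance of phi and equivariance of the full pipeline:
assert np.isclose(phi(rho(g) @ x), phi(x))
assert np.allclose(reconstruct(rho(g) @ x), rho(g) @ reconstruct(x))
assert np.allclose(reconstruct(x), x)
```

The same pattern (invariant code, group predictor, canonical decode, pose reapplication) carries over to richer groups with learned encoders in place of these closed-form maps.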
2. Network Architectures and Module Interactions
A typical HIENet instantiates the following modules (Winter et al., 2022, Yan et al., 25 Feb 2025, Garrido et al., 2023):
- Invariant Encoder $\phi$: Implements $\phi(\rho_X(g)\,x) = \phi(x)$; realized via invariant pooling, DeepSets, or scalar-valued steerable CNN/GNN blocks.
- Equivariant Encoder $\eta$: Extracts features transforming by $\rho_Z(g)$; typically vector/tensor channels in steerable or spherical-harmonics GNNs/CNNs.
- Group Predictor $\psi$: Determines the group element $g = \psi(x)$ aligning $x$ to a standard reference.
- Decoder $D$: Reconstructs a canonical element $D(z_{\mathrm{inv}})$, to which the pose is reapplied via $\rho_X(\psi(x))$.
- Final Alignment: The decoded object is transformed using the predicted group element: $\hat{x} = \rho_X(\psi(x))\,D(\phi(x))$.
More complex hybridizations interleave pure invariant and equivariant blocks (e.g., scalar-only transformer layers followed by equivariant message-passing), interspersed with invariant pooling, equivariant broadcasts, or group-canonicalization layers (Sannai et al., 2024, Yan et al., 25 Feb 2025). All components are fully differentiable.
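A single such hybrid block can be sketched in a few lines: a toy permutation-equivariant layer in the DeepSets spirit, where an equivariant per-element transform is combined with an invariant mean-pooling branch broadcast back to every element. `hybrid_block` and its weights are illustrative assumptions, not an API from the cited works.

```python
# One hybrid block: equivariant per-element map + invariant pooled context.
# Permutation-equivariant: permuting rows of X permutes rows of the output.
import numpy as np

def hybrid_block(X, W_local, W_pool):
    """X: (n, d) set of n elements; W_local, W_pool: (d, d) weight matrices."""
    local = X @ W_local                      # equivariant per-element transform
    pooled = X.mean(axis=0, keepdims=True)   # invariant pooling (order-free)
    return np.tanh(local + pooled @ W_pool)  # broadcast invariant context back

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W1, W2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
perm = rng.permutation(5)
# Equivariance check: block(P X) == P block(X)
assert np.allclose(hybrid_block(X[perm], W1, W2), hybrid_block(X, W1, W2)[perm])
```

Stacking such blocks, with occasional pure-invariant pooling or canonicalization layers in between, gives the interleaved architectures described above.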
3. Learning Objectives and Theoretical Guarantees
Training in HIENet frameworks is generally unsupervised and enforced via a single reconstruction-driven objective
$$\mathcal{L}(x) = d\big(x,\ \rho_X(\psi(x))\,D(\phi(x))\big),$$
where $d$ is a task-appropriate metric (e.g., a mean-squared or Chamfer distance for images and point clouds). Group-alignment terms can be included for parameterized groups if weak labels or proxies for $g$ are available, but HIENet is designed for settings with no group supervision.
The invariance of $\phi$ and equivariance of $\eta$ ensure decomposition and disentanglement of group-invariant and group-variant factors. For permutation, rotation, translation, and Euclidean groups, explicit constructions of $\phi$ and $\psi$ are provided to guarantee theoretical correctness up to the stabilizer subgroup of the canonical element.
The universal approximation theorem (Sannai et al., 2024) states that HIENets, built from universal invariant subnetworks for each stabilizer subgroup, are themselves universal for $G$-equivariant maps. The depth is preserved up to one additional layer for the reassembly, and parameter complexity is controlled by the number and size of $G$-orbits and their stabilizers.
4. Explicit Group-Specific Implementations and Instantiations
HIENet construction is group-agnostic and can be tailored as follows (Winter et al., 2022, Garrido et al., 2023, Yan et al., 25 Feb 2025):
- SO(2)/SO(3): Utilize steerable CNNs/GNNs with vector and tensor capsules. Equivariant frame outputs (e.g., unit vectors on $S^1$ or $S^2$) are orthonormalized, and the group element is recovered via argmax/atan2 or SVD.
- SE(n): Separate translation (vector) and rotation (matrix) channels; pose reapplication via rigid transformation.
- S_N (permutations): DeepSets for $\phi$, permutation-invariant multihead pooling, and group inference via soft-argsort to build doubly stochastic mappings.
- Materials science models: A single invariant message-passing block (G-Transformer) precedes multiple equivariant message-passing layers; the readout preserves energy invariance and force/stress equivariance (Yan et al., 25 Feb 2025).
- Self-supervised 3D vision: Latent spaces are split into invariant and equivariant parts, with invariant and equivariant projection heads and a hypernetwork-based group action on the equivariant branch (Garrido et al., 2023).
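For the permutation case, the canonicalization viewpoint can be illustrated with ordinary sorting: the sorted multiset is the invariant latent, and a double argsort recovers the permutation (as ranks) that restores the observed ordering. This is a hard, non-differentiable stand-in for the soft-argsort mentioned above; the names are illustrative.

```python
# Toy S_N instantiation: sorting canonicalizes a set, argsort-of-argsort
# plays the role of the group predictor. (Hard sort stands in for the
# differentiable soft-argsort used in practice.)
import numpy as np

def phi(x):
    """Invariant latent: the sorted values, unchanged by any permutation."""
    return np.sort(x)

def psi(x):
    """Group predictor: the permutation (as ranks) mapping the canonical
    sorted form back to the observed ordering."""
    return np.argsort(np.argsort(x))

def reconstruct(x):
    """Decode the canonical element, then reapply the predicted permutation."""
    return phi(x)[psi(x)]

x = np.array([0.3, -1.2, 2.5, 0.0])
p = np.random.default_rng(1).permutation(len(x))
assert np.allclose(reconstruct(x), x)   # exact reconstruction
assert np.allclose(phi(x[p]), phi(x))   # latent is permutation-invariant
```

With distinct values the recovered permutation is unique; ties are broken canonically by the sort order.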
5. Experimental Results, Benchmarks, and Efficiency
HIENets have demonstrated state-of-the-art or near-optimal performance across domains:
- Vision (Rotated MNIST, ShapeNet): HIENets with steerable CNN backbones achieve RMSE $0.016$ on upright-to-rotated transfer, with latent-invariant codes yielding substantially higher digit-classification accuracy than classical autoencoders (Winter et al., 2022).
- Sets and Permutations: On large permuted sets of digits, perfect reconstruction is achieved with invariant latent dimension equal to the class count, while equivariant-only models require a latent dimension that grows with the set size (Winter et al., 2022).
- 3D molecules/materials: Interatomic potentials computed with HIENet architectures achieve lower energy, force, and stress errors than pure-invariant and pure-equivariant models, with a substantial speedup relative to equivariant-only models (Yan et al., 25 Feb 2025).
- NLP (SCAN): A hybrid network with a group-invariant alignment module and a group-equivariant translator achieves high accuracy on the "Simple", "Add Jump", and "Around Right" splits, surpassing purely equivariant sequence-to-sequence models (White et al., 2022).
- 3D self-supervised learning: A Split Invariant-Equivariant network achieves strong rotation-prediction and classification top-1 accuracy on 3DIEBench (Garrido et al., 2023).
Parameter efficiency is generally superior to deep equivariant architectures by exploiting the lower cost of scalar-only operations in invariant blocks and limiting higher-order tensor operations to essential equivariant modules.
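The efficiency argument can be made concrete with a back-of-envelope count (purely illustrative arithmetic, not figures from the cited papers): mixing $c$ scalar channels costs on the order of $c^2$ multiplies, while mixing $c$ channels of degree-$l$ steerable features multiplies that by the squared irrep dimension $(2l+1)^2$, which is why confining higher-order tensors to a few essential blocks pays off.

```python
# Back-of-envelope cost comparison (illustrative counting only): mixing c
# scalar channels costs c*c multiplies; mixing c channels of degree-l
# steerable features (irrep dimension 2l+1) adds a (2l+1)^2 factor per degree.
def scalar_mix_cost(c):
    return c * c

def steerable_mix_cost(c, l_max):
    # one (2l+1)x(2l+1) matrix-vector product per channel pair, per degree l
    return sum(c * c * (2 * l + 1) ** 2 for l in range(l_max + 1))

assert scalar_mix_cost(64) == 4096
# Up to degree 2, the steerable mix is (1 + 9 + 25) = 35x the scalar cost:
assert steerable_mix_cost(64, 2) == 35 * scalar_mix_cost(64)
```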
6. Applications, Design Considerations, and Future Directions
HIENets have been applied or proposed for:
- Molecular property prediction, where permutation- and rotation-equivariance are essential for chemical validity.
- Fluid simulation, combining scalar (invariant) and vector/tensor (equivariant) fields in graph neural networks for physically accurate long-term dynamics (Shankar et al., 2023).
- Large-scale materials property prediction and discovery, satisfying physical constraints with analytic gradients due to symmetry-aware architectural design (Yan et al., 25 Feb 2025).
- Graph learning and PDE-based solvers, interleaving local equivariant operators with global invariant pooling (Sannai et al., 2024).
- Self-supervised representation learning with VICReg- or BYOL-style losses, enforcing invariance/equivariance splits at the latent level (Garrido et al., 2023).
Open questions and avenues for future research include:
- Automated design of interleaving strategies (number and arrangement of invariant and equivariant blocks).
- Extension to more complex group symmetries (space groups, point groups, non-Euclidean group actions).
- Exploitation of data-driven group/lexicon discovery (NLP).
- Theoretical trade-offs between invariance "hardness", model expressivity, and sample complexity.
- Efficient implementation of high-order invariants, especially for large datasets or feature maps.
HIENet's modularity and mathematical guarantees make it a foundational approach for robust, interpretable, and efficient modeling in domains where explicit symmetry disentanglement is vital (Winter et al., 2022, Yan et al., 25 Feb 2025, Garrido et al., 2023, Sannai et al., 2024, White et al., 2022).