Group Equivariant GSA
- Group Equivariant GSA is a neural network model that combines sparse autoencoding with explicit equivariance, ensuring its latent features transform consistently under group actions.
- It is trained with two losses: an invariance loss that keeps reconstructions consistent across group transformations, and an equivariance loss that adaptively fits the transformation matrices.
- Empirical evaluations show that these models improve mechanistic interpretability and performance on symmetry-based tasks, with high R² and elevated F1 scores in structured probing.
Group Equivariant Group Sparse Autoencoders (Group Equivariant GSA) are a class of neural network models that combine sparse autoencoding with explicit equivariance to group actions, allowing the discovery and utilization of symmetry-aware latent features. These models extend the sparse autoencoder architecture by enforcing structured constraints reflecting the symmetries present in the data, with the result that learned features exhibit predictable transformations under the action of a specified symmetry group, such as rotations. This approach enhances downstream task performance and mechanistic interpretability, particularly for scientific and structured domains where group symmetries are intrinsic (Erdogan et al., 12 Nov 2025).
1. Group Equivariance and Sparse Autoencoding
Group equivariance refers to the property that a network layer or mapping commutes with the action of a group $G$. Formally, for a function $\phi$, a group action $x \mapsto g \cdot x$, and a representation $\rho$ of $G$ on the output space, equivariance requires $\phi(g \cdot x) = \rho(g)\,\phi(x)$ for all $g \in G$. In the context of sparse autoencoders, this principle is incorporated by requiring the encoder and decoder to interact with a family of linear maps $M_g$ (one per group element $g \in G$) such that the encoded features and the reconstructed output transform consistently with the action of $g$. Specifically, for data $x$, autoencoder activations $f(x)$, and a group element $g$, the constraints relate the activations of the transformed input $g \cdot x$ to those of $x$ through $M_g$, and require the encoder $E$ and decoder $D$ to reconstruct consistently across group-transformed inputs (Erdogan et al., 12 Nov 2025).
This structured constraint ensures that the learned feature activations correspond to physically or semantically meaningful entities, which follow the same symmetry as the data and underlying domain.
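The commutation property can be checked numerically on a toy case. The sketch below is illustrative only and is not taken from the source: it assumes the group is the cyclic group acting on a list by circular shifts, uses the same shift as the output representation, and the helper names (`cyclic_shift`, `is_equivariant`, `position_weighted`) are hypothetical.

```python
def cyclic_shift(x, s):
    """Group action of the cyclic group C_n on a length-n list: rotate right by s."""
    n = len(x)
    s %= n
    return x[-s:] + x[:-s] if s else list(x)


def relu(x):
    """Pointwise nonlinearity; it commutes with any permutation of coordinates."""
    return [max(0.0, v) for v in x]


def position_weighted(x):
    """Scales each entry by its index, so the result depends on absolute position."""
    return [i * v for i, v in enumerate(x)]


def is_equivariant(f, x, n_shifts):
    """Check f(g . x) == g . f(x) for every shift g (same action on input and output)."""
    return all(
        f(cyclic_shift(x, s)) == cyclic_shift(f(x), s) for s in range(n_shifts)
    )


x = [1.0, -2.0, 3.0, -4.0]
print(is_equivariant(relu, x, len(x)))               # True
print(is_equivariant(position_weighted, x, len(x)))  # False
```

A pointwise map passes the check because it never looks at position, while the index-weighted map fails because shifting the input changes which weight each entry receives.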
2. Mathematical Formulation and Training Objectives
Group Equivariant GSA models incorporate both an invariance-driven autoencoding loss and a transformation-fitting loss:
- Invariance Loss: Enforces that all group-augmented versions of an input produce decodings matching the canonical features of the untransformed input.
- Equivariance Loss: Fits the set of linear maps $M_g$ to capture how the base model’s representations transform under group actions.
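One plausible way to write two objectives of this kind is sketched below. The notation is an assumption made for illustration ($f$ for the base model, $E$ and $D$ for the encoder and decoder, $M_g$ for the linear map attached to group element $g$, and squared Euclidean error as the discrepancy); the exact form used by Erdogan et al. may differ in weighting, normalization, or where $M_g$ enters.

$$
\mathcal{L}_{\text{inv}} = \mathbb{E}_{x}\,\frac{1}{|G|}\sum_{g \in G} \big\| D\big(E(f(g \cdot x))\big) - f(x) \big\|_2^2,
\qquad
\mathcal{L}_{\text{eq}} = \mathbb{E}_{x}\,\frac{1}{|G|}\sum_{g \in G} \big\| M_g\, f(x) - f(g \cdot x) \big\|_2^2 .
$$

Under this reading, the first term depends only on the encoder and decoder, and the second only on the maps $M_g$, which is compatible with updating the two sets of parameters on separate schedules.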
Sparsity in the encoder is enforced via a hard TopK operator, selecting the $k$ largest activations in the latent vector. These losses are jointly optimized, yielding a family of sparse features whose transformations under $G$ are explicitly parameterized (Erdogan et al., 12 Nov 2025).
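A hard TopK operator can be sketched in a few lines. This is a minimal illustration, not the source's implementation: the function name `topk_sparsify` is hypothetical, selection is by raw value (not magnitude), and ties are broken in favor of the lower index, which is an arbitrary choice.

```python
def topk_sparsify(z, k):
    """Keep the k largest entries of z and set every other entry to zero."""
    if not 0 <= k <= len(z):
        raise ValueError("k must be between 0 and len(z)")
    # Indices of the k largest activations; ties go to the lower index.
    keep = set(sorted(range(len(z)), key=lambda i: (-z[i], i))[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(z)]


z = [0.1, 2.5, -0.3, 1.7, 0.9]
print(topk_sparsify(z, 2))  # [0.0, 2.5, 0.0, 1.7, 0.0]
```

Unlike an L1 penalty, this fixes the number of active latents per input exactly, at the cost of a non-smooth selection step.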
3. Architecture and Implementation Details
A representative Group Equivariant GSA employs the following components (as described for a cyclic group of rotations):
- Base model $f$: An MLP or CNN autoencoder that maps inputs to pre-activations.
- Encoder $E$: A two-layer MLP, culminating in a TopK-sparsified latent vector.
- Decoder $D$: A single linear layer reconstructing the feature space.
- Group action maps $M_g$: Each map is initialized as the identity and trained to map base activations according to the group structure.
The encoder-decoder parameters are updated via Adam optimization to minimize the invariance loss, while the maps $M_g$ are periodically updated to minimize the equivariance loss. No explicit regularization is placed on $M_g$ to enforce group representation structure, reflecting a post-hoc, adaptive fit to the observed equivariance in the underlying network (Erdogan et al., 12 Nov 2025).
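The map-fitting step alone can be illustrated on toy data. The sketch below is an assumption-laden stand-in, not the source's training loop: it uses plain gradient descent rather than Adam, treats 2-D points as "activations", takes a 90-degree rotation as the ground-truth transformation, and the name `fit_group_map` is hypothetical. It starts from the identity, as described above, and minimizes the mean squared error between the mapped activations and the activations of the transformed input.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def fit_group_map(h, h_g, lr=0.1, steps=500):
    """Fit M minimizing mean ||M h_i - h_g_i||^2 by gradient descent from identity."""
    d = len(h[0])
    n = len(h)
    M = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(steps):
        grad = [[0.0] * d for _ in range(d)]
        for a, b in zip(h, h_g):
            r = [p - q for p, q in zip(matvec(M, a), b)]  # residual M a - b
            for i in range(d):
                for j in range(d):
                    grad[i][j] += 2.0 * r[i] * a[j] / n
        M = [[M[i][j] - lr * grad[i][j] for j in range(d)] for i in range(d)]
    return M


# Toy activations and their counterparts under a 90-degree rotation.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
R = [[0.0, -1.0], [1.0, 0.0]]
h_g = [matvec(R, v) for v in h]

M = fit_group_map(h, h_g)
max_err = max(abs(M[i][j] - R[i][j]) for i in range(2) for j in range(2))
print(max_err < 1e-6)  # True
```

In a full training loop this step would alternate with encoder-decoder updates; here it is isolated to show that an unconstrained linear map recovers the rotation when the activations really do transform linearly.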
4. Adaptive Equivariance and Feature Analysis
The adaptive character of the approach lies in the flexibility of the maps $M_g$: they are fit to best explain the behavior of the (possibly non-equivariant) base model activations under group transformation. This mechanism reveals the degree of approximate symmetry already present and can distinguish invariant from truly equivariant (transforming) directions in the latent space. For a model trained on rotated synthetic images, the learned $M_g$ attain a high $R^2$ in predicting the activations of transformed inputs for both MLP and CNN activations, substantially outperforming the naive baseline (Erdogan et al., 12 Nov 2025).
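The variance-explained comparison can be made concrete with a small calculation. The sketch below rests on assumptions not stated in the source: the naive baseline is taken to be the identity map (predicting that activations do not change), $R^2$ is pooled over coordinates with per-coordinate means, and the toy data are 2-D points under a 90-degree rotation.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def r_squared(pred, target):
    """1 - SS_res / SS_tot, pooled over coordinates with per-coordinate means."""
    n, d = len(target), len(target[0])
    means = [sum(row[j] for row in target) / n for j in range(d)]
    ss_res = sum(
        (t - p) ** 2 for tr, pr in zip(target, pred) for t, p in zip(tr, pr)
    )
    ss_tot = sum((row[j] - means[j]) ** 2 for row in target for j in range(d))
    return 1.0 - ss_res / ss_tot


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
rotation = [[0.0, -1.0], [1.0, 0.0]]   # stands in for a fitted map
identity = [[1.0, 0.0], [0.0, 1.0]]    # assumed naive baseline
target = [matvec(rotation, v) for v in h]

r2_fit = r_squared([matvec(rotation, v) for v in h], target)
r2_identity = r_squared([matvec(identity, v) for v in h], target)
print(r2_fit, r2_identity < r2_fit)  # 1.0 True
```

An $R^2$ near 1 for the fitted map together with a much lower value for the baseline indicates that the activations transform in a structured, approximately linear way rather than staying fixed.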
Downstream, this enables partitioning decoder atoms into strictly invariant and equivariant features, aiding mechanistic interpretability and producing representations attuned to structured probing tasks (e.g., those querying the presence, position, or orientation of an object).
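One simple way to carry out such a partition is to test whether each decoder atom is left unchanged by every map. The criterion, tolerance, and names below (`is_invariant`, `partition_atoms`, the example atoms) are assumptions for illustration; the source does not specify the rule it uses.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def is_invariant(atom, maps, tol=1e-6):
    """Assumed criterion: an atom is invariant if every M_g leaves it unchanged."""
    return all(
        max(abs(a - b) for a, b in zip(matvec(M, atom), atom)) <= tol
        for M in maps
    )


def partition_atoms(atoms, maps, tol=1e-6):
    """Split named atoms into invariant ones and ones that move under the group."""
    invariant = [name for name, d in atoms.items() if is_invariant(d, maps, tol)]
    equivariant = [name for name in atoms if name not in invariant]
    return invariant, equivariant


# Hypothetical maps for a 4-fold rotation: rotate coordinates 0-1, fix coordinate 2.
M1 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
M2 = matmul(M1, M1)
M3 = matmul(M2, M1)
maps = [M1, M2, M3]

atoms = {
    "axis_fixed": [0.0, 0.0, 1.0],     # untouched by the rotation
    "axis_rotating": [1.0, 0.0, 0.0],  # moves under the rotation
}
invariant, equivariant = partition_atoms(atoms, maps)
print(invariant, equivariant)  # ['axis_fixed'] ['axis_rotating']
```

With learned maps the equality is only approximate, so the tolerance would need to be chosen relative to the fit quality of the maps.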
5. Empirical Performance and Probing Results
Empirical evaluation on synthetic image datasets with known symmetry structure demonstrates that Group Equivariant GSA models yield latent representations and reconstructions that are more informative for group-structured probing tasks. In quantitative probing over four groups of binary classification tasks (detecting shape (S), shape-position (SP), shape-orientation (SO), and shape-position-orientation (SPO)), Equivariant SAEs (Equi-SAE) achieve higher or equal mean F1 scores on both latent and decoder-reconstruction probes compared to regular and wide SAEs. The reconstruction-probe scores are:
| Model | S | SP | SO | SPO |
|---|---|---|---|---|
| Reg-SAE (recon) | 0.93 | 0.83 | 0.70 | 0.30 |
| Wide-SAE (recon) | 0.94 | 0.83 | 0.65 | 0.22 |
| Equi-SAE (recon) | 0.96 | 0.89 | 0.70 | 0.35 |
This suggests that incorporating equivariance explicitly in the autoencoder yields more structure-disentangled and semantically meaningful features, particularly for tasks aligned with the symmetry group (Erdogan et al., 12 Nov 2025).
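For reference, a mean F1 score over a group of binary probing tasks can be computed as sketched below. The aggregation (an unweighted mean over tasks) and the toy labels are assumptions for illustration; the source does not state how scores were averaged, and the data here are not from the paper.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels in {0, 1}; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(tasks):
    """Unweighted mean of per-task F1 over (y_true, y_pred) pairs."""
    return sum(f1_score(t, p) for t, p in tasks) / len(tasks)


# Two made-up probing tasks within one group.
tasks = [
    ([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]),  # F1 = 0.75
    ([0, 1, 1, 0], [0, 1, 1, 0]),              # F1 = 1.0
]
print(mean_f1(tasks))  # 0.875
```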
6. Connections to Broader Group-Equivariant Architectures
The Group Equivariant GSA principle is conceptually aligned with other approaches enforcing equivariance in neural networks, such as group equivariant convolutions (Lengyel et al., 2021), equivariant self-attention mechanisms (Romero et al., 2020), and general frameworks for equivariant neural networks on reductive Lie groups (Batatia et al., 2023). However, the focus in Group Equivariant GSA is on interpretability and feature disentanglement, as opposed to predictive modeling alone.
Distinctively, the adaptive mechanism for learning representation matrices enables Group Equivariant GSA to work post-hoc on arbitrarily pretrained neural activations, providing a tool for mechanistic analysis even when the original network is not strictly equivariant.
7. Limitations and Future Directions
Group Equivariant GSA models, as presented, learn a set of linear operators $M_g$ individually per group element, without explicit enforcement of the group structure (such as $M_g M_h = M_{gh}$, or $M_g^n = I$ for cyclic groups of order $n$). This suggests some inductive bias could be further injected, possibly through parameter tying or regularization. Additionally, the current models employ hard TopK sparsity; alternative sparsity-inducing methods could offer finer control. Scaling to continuous or higher-dimensional groups would require extending the parameterization of $M_g$ and possibly integrating more advanced representation-theoretic tools. A plausible implication is that such extensions could bridge the methodology with group-equivariant Lie group neural architectures (Batatia et al., 2023).
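A regularizer of the kind suggested here could penalize deviations from the group's multiplication table. The sketch below is a hypothetical illustration for a cyclic group, not something the source implements: maps are indexed by rotation step (index 0 being the identity element), and the penalty sums the squared Frobenius norm of $M_a M_b - M_{(a+b) \bmod n}$ over all pairs.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def frob_sq(A, B):
    """Squared Frobenius norm of A - B."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def cyclic_structure_penalty(maps):
    """Sum over a, b of ||M_a M_b - M_{(a+b) mod n}||_F^2 for a cyclic group."""
    n = len(maps)
    return sum(
        frob_sq(matmul(maps[a], maps[b]), maps[(a + b) % n])
        for a in range(n)
        for b in range(n)
    )


I2 = [[1.0, 0.0], [0.0, 1.0]]
R = [[0.0, -1.0], [1.0, 0.0]]  # 90-degree rotation
R2 = matmul(R, R)
R3 = matmul(R2, R)

consistent = [I2, R, R2, R3]    # a genuine representation of C_4
inconsistent = [I2, R, I2, I2]  # only one map has moved away from identity

print(cyclic_structure_penalty(consistent))        # 0.0
print(cyclic_structure_penalty(inconsistent) > 0)  # True
```

Parameter tying is the stricter alternative: learning a single generator and defining every other map as one of its powers makes the composition rule hold by construction (closure to the identity after $n$ steps would still need to be encouraged), at the cost of less freedom to fit a base model that is only approximately equivariant.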
Group Equivariant GSA provides an effective and interpretable approach for leveraging group symmetries in feature learning and mechanistic probing, particularly for data domains with explicit group actions. Its effectiveness has been empirically established in synthetic structured data and forms an adaptable foundation for further developments in group-aware unsupervised learning (Erdogan et al., 12 Nov 2025).