Group Crosscoders for Mechanistic Analysis of Symmetry (2410.24184v2)
Abstract: We introduce group crosscoders, an extension of crosscoders that systematically discovers and analyses symmetrical features in neural networks. While neural networks often develop equivariant representations without explicit architectural constraints, understanding these emergent symmetries has traditionally relied on manual analysis. Group crosscoders automate this process by performing dictionary learning across transformed versions of inputs under a symmetry group. Applied to InceptionV1's mixed3b layer using the dihedral group $\mathrm{D}_{32}$, our method reveals several key insights. First, it naturally clusters features into interpretable families that correspond to previously hypothesised feature types, providing more precise separation than standard sparse autoencoders. Second, our transform block analysis enables the automatic characterisation of feature symmetries, revealing how different geometric features (such as curves versus lines) exhibit distinct patterns of invariance and equivariance. These results demonstrate that group crosscoders can provide systematic insights into how neural networks represent symmetry, offering a promising new tool for mechanistic interpretability.
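The core mechanism described in the abstract, dictionary learning across transformed versions of an input under a symmetry group, can be sketched as follows. This is a toy illustration, not the paper's implementation: it uses the small dihedral group $\mathrm{D}_4$ (four rotations plus reflections, since `np.rot90` only supports 90-degree steps) in place of the paper's $\mathrm{D}_{32}$, and all class and function names are hypothetical.

```python
import numpy as np

def dihedral_orbit(img):
    """Return img under the 8 elements of D4: four 90-degree rotations
    and the horizontal reflection of each. (The paper's D32 would use
    32 finer-angle rotations instead.)"""
    views = []
    for k in range(4):
        r = np.rot90(img, k)
        views.append(r)
        views.append(np.fliplr(r))
    return np.stack(views)                     # shape (8, H, W)

class GroupCrosscoder:
    """Toy crosscoder: one shared sparse code is encoded from, and
    decodes back to, the activations of every transformed view via
    per-transform weight blocks (illustrative, untrained)."""
    def __init__(self, n_views, d_act, d_dict, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0, 0.1, (n_views, d_act, d_dict))
        self.W_dec = rng.normal(0, 0.1, (n_views, d_dict, d_act))
        self.b_enc = np.zeros(d_dict)

    def encode(self, acts):                    # acts: (n_views, d_act)
        pre = np.einsum('va,vak->k', acts, self.W_enc) + self.b_enc
        return np.maximum(pre, 0.0)            # ReLU yields a sparse code

    def decode(self, code):                    # code: (d_dict,)
        return np.einsum('k,vka->va', code, self.W_dec)

img = np.arange(16.0).reshape(4, 4)
views = dihedral_orbit(img)                    # (8, 4, 4)
cc = GroupCrosscoder(n_views=8, d_act=16, d_dict=6)
code = cc.encode(views.reshape(8, -1))         # one code for all views
recon = cc.decode(code)                        # (8, 16): one reconstruction per view
```

Under this reading, the "transform block analysis" mentioned in the abstract would amount to comparing a feature's per-transform decoder blocks (here, the rows `cc.W_dec[:, k]`): a feature whose blocks are similar across all group elements behaves invariantly, while one whose weight concentrates on particular transforms behaves equivariantly.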