
Group Crosscoders for Mechanistic Analysis of Symmetry (2410.24184v2)

Published 31 Oct 2024 in cs.LG

Abstract: We introduce group crosscoders, an extension of crosscoders that systematically discover and analyse symmetrical features in neural networks. While neural networks often develop equivariant representations without explicit architectural constraints, understanding these emergent symmetries has traditionally relied on manual analysis. Group crosscoders automate this process by performing dictionary learning across transformed versions of inputs under a symmetry group. Applied to InceptionV1's mixed3b layer using the dihedral group $\mathrm{D}_{32}$, our method reveals several key insights: First, it naturally clusters features into interpretable families that correspond to previously hypothesised feature types, providing more precise separation than standard sparse autoencoders. Second, our transform block analysis enables the automatic characterisation of feature symmetries, revealing how different geometric features (such as curves versus lines) exhibit distinct patterns of invariance and equivariance. These results demonstrate that group crosscoders can provide systematic insights into how neural networks represent symmetry, offering a promising new tool for mechanistic interpretability.


Summary

  • The paper’s main contribution is introducing group crosscoders to automatically extract and cluster symmetrical features in neural networks.
  • The methodology uses dictionary learning on activation vectors from transformed inputs and cosine similarity to reveal distinct feature clusters.
  • Experimental results on InceptionV1’s mixed3b layer clearly separate curvilinear and angular features, enhancing network interpretability.

Analysis of "Group Crosscoders for Mechanistic Analysis of Symmetry"

The paper "Group Crosscoders for Mechanistic Analysis of Symmetry" by Liv Gorton contributes to the understanding of symmetry in neural networks by introducing the concept of group crosscoders. This novel approach extends the traditional crosscoders, which were initially designed to find analogous features across neural network layers, enabling them to systematically explore and analyse the symmetry within neural networks.

Key Contributions

The core contribution of the paper is the introduction of group crosscoders. Applied to neural networks, they identify and cluster symmetrical features, offering new interpretability into how networks develop symmetry properties even when these are not architecturally enforced. The approach is demonstrated on InceptionV1's mixed3b layer using the dihedral group $\mathrm{D}_{32}$, whose elements are rotations and reflections of the image plane.

The novelty of group crosscoders lies in their ability to automatically extract and analyse equivariant features without predefined architectural constraints. By performing dictionary learning across transformed versions of inputs under a symmetry group, these crosscoders not only reveal inherent symmetry but also cluster features into interpretable families. This clustering yields a more granular separation of feature types than standard sparse autoencoders provide.
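
To make the group action concrete, here is a minimal sketch of building the $\mathrm{D}_{32}$ orbit of an input image, assuming rotations in increments of 360°/32 composed with an optional horizontal flip. The function and variable names are illustrative; the paper's exact transform conventions (interpolation, boundary handling) may differ.

```python
# Build the D32 orbit of an image: 32 rotations (multiples of 11.25 degrees),
# each optionally composed with a horizontal flip, giving 64 copies per input.
import torch
import torchvision.transforms.functional as TF

N_ROTATIONS = 32  # D32: 32 rotations x {identity, reflection} = 64 elements

def d32_orbit(image: torch.Tensor) -> torch.Tensor:
    """Return a (64, C, H, W) tensor of all D32 transforms of `image` (C, H, W).

    Ordering: pure rotations first (identity at index 0), then the 32
    reflected rotations.
    """
    transformed = []
    for flip in (False, True):
        base = TF.hflip(image) if flip else image
        for k in range(N_ROTATIONS):
            angle = 360.0 * k / N_ROTATIONS
            # Note: non-multiple-of-90 rotations pad the corners; handling of
            # such boundary effects is omitted from this sketch.
            transformed.append(TF.rotate(base, angle))
    return torch.stack(transformed)
```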

Methodology

Group crosscoders build on the existing crosscoder framework by learning from activation patterns of transformed inputs under group actions, rather than from multi-layer or multi-model parallels. Dictionary learning is performed on stacked vectors containing the activations for every transformed version of the same input image, a departure from the single-input activation vectors used by standard sparse autoencoders.
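
As a hedged sketch of this setup, the following PyTorch module gives one encoder and decoder block per group element sharing a single sparse code, trained with an L2 reconstruction term and an L1 sparsity penalty weighted by decoder norms, following the general crosscoder recipe. The dimensions, initialisation, and loss weighting here are assumptions, not the paper's reported settings.

```python
# A minimal group crosscoder: per-transform encoder/decoder blocks, one
# shared sparse feature code per input.
import torch
import torch.nn as nn

class GroupCrosscoder(nn.Module):
    def __init__(self, n_transforms: int, d_act: int, d_dict: int):
        super().__init__()
        # One encoder/decoder weight block per group element.
        self.W_enc = nn.Parameter(torch.randn(n_transforms, d_act, d_dict) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(n_transforms, d_dict, d_act) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_dict))
        self.b_dec = nn.Parameter(torch.zeros(n_transforms, d_act))

    def forward(self, acts: torch.Tensor):
        # acts: (batch, n_transforms, d_act), the activations of each
        # transformed copy of the input. b=batch, g=transform, a=activation
        # dim, f=dictionary feature.
        pre = torch.einsum("bga,gaf->bf", acts, self.W_enc) + self.b_enc
        codes = torch.relu(pre)  # shared sparse codes
        recon = torch.einsum("bf,gfa->bga", codes, self.W_dec) + self.b_dec
        return codes, recon

def crosscoder_loss(acts, codes, recon, model, l1_coeff=1e-3):
    recon_err = (acts - recon).pow(2).sum(dim=(1, 2)).mean()
    # Sparsity weighted by each feature's total decoder norm across transforms.
    dec_norms = model.W_dec.norm(dim=2).sum(dim=0)  # (d_dict,)
    sparsity = (codes.abs() * dec_norms).sum(dim=1).mean()
    return recon_err + l1_coeff * sparsity
```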

The group crosscoder's ability to reconstruct the activations of every transformed input from a single shared feature code is the methodological innovation that aids interpretability: each dictionary feature carries one decoder block per group element, and comparing these blocks exposes the feature's symmetry behaviour. Training data are activations collected from ImageNet images; feature symmetry is then measured via cosine similarities between a feature's decoder vectors across transform blocks, yielding a distance matrix over features. A UMAP (Uniform Manifold Approximation and Projection) projection of these profiles visualises the clustering of feature families.
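
The transform-block analysis can be illustrated with a short sketch reusing the GroupCrosscoder above: for each feature, compare its decoder vector under the identity transform against its decoder vectors under every other group element, then embed the resulting similarity profiles with UMAP. Whether the paper compares against the identity block or uses full pairwise similarities is an assumption here, as are the instantiated sizes (d_act=480 matches mixed3b's channel count; d_dict is an arbitrary choice).

```python
# Transform-block similarity profiles, then a 2D UMAP embedding in which
# features with similar symmetry behaviour should cluster together.
import torch
import umap  # pip install umap-learn

def transform_similarity_profiles(W_dec: torch.Tensor) -> torch.Tensor:
    """W_dec: (n_transforms, d_dict, d_act), identity transform at index 0.
    Returns (d_dict, n_transforms): cosine similarity between each feature's
    identity-block decoder vector and its vector under each group element."""
    identity = W_dec[0]  # (d_dict, d_act)
    sims = torch.cosine_similarity(
        identity.unsqueeze(0), W_dec, dim=-1  # broadcast over transforms
    )  # (n_transforms, d_dict)
    return sims.T

model = GroupCrosscoder(n_transforms=64, d_act=480, d_dict=8192)  # assumed sizes
profiles = transform_similarity_profiles(model.W_dec.detach())
embedding = umap.UMAP(metric="cosine").fit_transform(profiles.cpu().numpy())
```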

Experimental Results

The results demonstrate that group crosscoders can sort features into distinct clusters corresponding to previously hypothesised feature families. The experiment on InceptionV1's mixed3b layer shows clear separation and interpretability across these clusters, distinguishing, for example, curvilinear from angular features. The symmetry analysis further reveals that different geometric feature types exhibit distinct transformation patterns: a curve detector only returns to itself after a full 360° rotation, whereas a line detector is already symmetric under a 180° rotation.
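
As a toy illustration (assumed, not taken from the paper), a feature's rotational symmetry can be read off its similarity profile over the 32 pure rotations: the smallest non-zero rotation at which the decoder vector nearly matches the identity block gives the rotational period.

```python
# Read a feature's rotational period from its similarity profile over the
# 32 pure rotations (identity-first ordering, as in d32_orbit above).
import numpy as np

def rotational_period(profile: np.ndarray, n_rotations: int = 32,
                      thresh: float = 0.9):
    """profile: cosine similarities over the pure rotations, identity first.
    Returns the period in degrees, or None if no non-trivial symmetry."""
    for k in range(1, n_rotations):
        if profile[k] >= thresh:
            return 360.0 * k / n_rotations  # e.g. 180.0 for line-like features
    return None  # curve-like: invariant only under the full 360-degree turn
```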

Implications and Future Directions

The introduction of group crosscoders marks an advance in mechanistic interpretability. By automating the discovery and clustering of symmetrical features, the method reduces ambiguity in understanding neural network feature representation, especially innate equivariant properties. It adds depth to ongoing research in vision interpretability by providing a structured approach to symmetry analysis, one potentially applicable to modalities beyond images and to groups with more complex symmetries.

Future research could extend the methodology to groups beyond the dihedral, for instance incorporating scaling or colour transformations, broadening the applicability and robustness of group crosscoders. Applying the method to other architectures could also compare the prevalence and nature of symmetrical feature representations across models, potentially informing design principles in neural network development.

In conclusion, the development of group crosscoders offers an insightful tool for AI researchers looking to explore and quantify neural network symmetries. The methodology sets a precedent for future explorations in mechanistic interpretability, offering potential pathways for rich academic inquiry.
