Rosetta Neurons: Mining the Common Units in a Model Zoo
Abstract: Do different neural networks, trained for various vision tasks, share some common representations? In this paper, we demonstrate the existence of common features, which we call "Rosetta Neurons", across a range of models with different architectures, different tasks (generative and discriminative), and different types of supervision (class-supervised, text-supervised, self-supervised). We present an algorithm for mining a dictionary of Rosetta Neurons across several popular vision models: Class Supervised-ResNet50, DINO-ResNet50, DINO-ViT, MAE, CLIP-ResNet50, BigGAN, StyleGAN-2, and StyleGAN-XL. Our findings suggest that certain visual concepts and structures are inherently embedded in the natural world and can be learned by different models regardless of the specific task or architecture, and without the use of semantic labels. Because our analysis includes generative models, we can visualize the shared concepts directly. The Rosetta Neurons facilitate model-to-model translation, enabling various inversion-based manipulations, including cross-class alignments, shifting, zooming, and more, without the need for specialized training.
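The abstract does not spell out how the dictionary is mined. As a rough illustration only, the sketch below assumes that units from two models are compared by the Pearson correlation of their activations over a shared set of inputs, and that a pair is kept only when each unit is the other's best match (mutual nearest neighbors). The function names, the `(n_samples, n_units)` array layout, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def standardize(acts, eps=1e-8):
    """Zero-mean, unit-variance normalization of each unit (column)."""
    acts = np.asarray(acts, dtype=np.float64)
    return (acts - acts.mean(axis=0)) / (acts.std(axis=0) + eps)


def correlation_matrix(acts_a, acts_b):
    """Pearson correlation between every unit of model A and every unit of model B.

    Both inputs have shape (n_samples, n_units) and must share the same samples
    (assumed layout: one row per input location, one column per unit).
    """
    za, zb = standardize(acts_a), standardize(acts_b)
    return za.T @ zb / za.shape[0]


def mutual_nearest_units(acts_a, acts_b):
    """Return (unit_a, unit_b, correlation) for pairs that pick each other as best match."""
    corr = correlation_matrix(acts_a, acts_b)
    best_b_for_a = corr.argmax(axis=1)  # best B unit for each A unit
    best_a_for_b = corr.argmax(axis=0)  # best A unit for each B unit
    return [
        (i, int(j), float(corr[i, j]))
        for i, j in enumerate(best_b_for_a)
        if best_a_for_b[j] == i
    ]


if __name__ == "__main__":
    # Toy data: model B holds rescaled, permuted copies of model A's units
    # plus one unrelated unit that should not be matched.
    rng = np.random.default_rng(0)
    acts_a = rng.normal(size=(500, 3))
    acts_b = np.column_stack([2.0 * acts_a[:, [2, 0, 1]] + 1.0, rng.normal(size=500)])
    for unit_a, unit_b, r in mutual_nearest_units(acts_a, acts_b):
        print(f"A[{unit_a}] <-> B[{unit_b}]  r={r:.3f}")
```

Requiring the match to hold in both directions is a conservative choice: a unit in one model that merely resembles many units in the other is discarded unless the preference is reciprocated. Extending the same idea to more than two models would need an additional rule for combining pairwise matches, which this sketch does not attempt.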