Decoupling Semantic Similarity from Spatial Alignment for Neural Networks (2410.23107v1)
Abstract: What representations do deep neural networks learn? How similar are images to each other for neural networks? Despite the overwhelming success of deep learning methods, key questions about their internal workings still remain largely unanswered, due to their high dimensionality and complexity. To address this, one approach is to measure the similarity of activation responses to various inputs. Representational Similarity Matrices (RSMs) distill this similarity into scalar values for each input pair. These matrices encapsulate the entire similarity structure of a system, indicating which inputs lead to similar responses. While the similarity between images is ambiguous, we argue that the spatial location of semantic objects influences neither human perception nor deep learning classifiers. This should therefore be reflected in the definition of similarity between image responses for computer vision systems. Revisiting the established similarity calculations for RSMs, we expose their sensitivity to spatial alignment. In this paper, we propose to solve this through semantic RSMs, which are invariant to spatial permutation. We measure semantic similarity between input responses by formulating it as a set-matching problem. Further, we quantify the superiority of semantic RSMs over spatio-semantic RSMs through image retrieval and by comparing the similarity between representations to the similarity between predicted class probabilities.
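A minimal sketch of the contrast the abstract draws, assuming the "set-matching" step is realized as a linear assignment over spatial tokens (the paper's exact formulation may differ; all function and variable names here are illustrative). A standard spatio-semantic RSM compares flattened activations position by position, whereas the semantic variant first matches spatial positions between the two images to maximize similarity, making the score invariant to spatial permutation.

```python
# Sketch: spatio-semantic vs. permutation-invariant "semantic" RSMs from spatial
# feature maps of shape (N, P, C): N images, P spatial positions, C channels.
# The set-matching step is assumed to be a linear assignment (illustrative only).
import numpy as np
from scipy.optimize import linear_sum_assignment


def spatio_semantic_rsm(acts: np.ndarray) -> np.ndarray:
    """Cosine similarity between flattened activations (position-aligned)."""
    flat = acts.reshape(acts.shape[0], -1)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    return flat @ flat.T


def semantic_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Match the P spatial tokens of two images so the total cosine similarity
    is maximal, then average over matched pairs. a, b: (P, C)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a @ b.T                                       # (P, P) token-to-token similarities
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return float(sim[rows, cols].mean())


def semantic_rsm(acts: np.ndarray) -> np.ndarray:
    """Permutation-invariant RSM over all image pairs; acts: (N, P, C)."""
    n = acts.shape[0]
    rsm = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            rsm[i, j] = rsm[j, i] = semantic_similarity(acts[i], acts[j])
    return rsm


# Example: 8 images, a 7x7 feature map flattened to 49 tokens, 64 channels.
acts = np.random.randn(8, 49, 64)
print(spatio_semantic_rsm(acts).shape, semantic_rsm(acts).shape)
```

Under this assumption, shifting an object's feature tokens to different spatial positions leaves the semantic RSM unchanged but can substantially lower the spatio-semantic similarity.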