
Deciphering 'What' and 'Where' Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations

Published 11 Dec 2023 in cs.CV | (2312.06716v2)

Abstract: We present an approach for analyzing grouping information contained within a neural network's activations, permitting extraction of spatial layout and semantic segmentation from the behavior of large pre-trained vision models. Unlike prior work, our method conducts a holistic analysis of a network's activation state, leveraging features from all layers and obviating the need to guess which part of the model contains relevant information. Motivated by classic spectral clustering, we formulate this analysis in terms of an optimization objective involving a set of affinity matrices, each formed by comparing features within a different layer. Solving this optimization problem using gradient descent allows our technique to scale from single images to dataset-level analysis, including, in the latter, both intra- and inter-image relationships. Analyzing a pre-trained generative transformer provides insight into the computational strategy learned by such models. Equating affinity with key-query similarity across attention layers yields eigenvectors encoding scene spatial layout, whereas defining affinity by value vector similarity yields eigenvectors encoding object identity. This result suggests that key and query vectors coordinate attentional information flow according to spatial proximity (a 'where' pathway), while value vectors refine a semantic category representation (a 'what' pathway).
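The general idea of combining per-layer affinity matrices into one spectral objective and solving it iteratively can be illustrated with a small sketch. This is not the paper's objective or implementation: the cosine-similarity affinity, the symmetric normalization, the trace objective, the QR re-orthonormalization step, and all function names and toy data below are assumptions chosen for illustration.

```python
import numpy as np

def cosine_affinity(feats):
    """Non-negative cosine-similarity affinity between rows of feats (n, d)."""
    unit = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    return np.clip(unit @ unit.T, 0.0, None)

def normalize_affinity(aff):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (normalized-cuts style)."""
    inv_sqrt_deg = 1.0 / np.sqrt(aff.sum(axis=1) + 1e-8)
    return aff * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]

def spectral_embedding(layer_feats, k=2, steps=300, lr=0.5, seed=0):
    """Maximize trace(X^T A X) over orthonormal X by projected gradient ascent.

    A is the mean of the normalized affinities, one per layer (an assumed
    way of pooling layers, not necessarily the paper's).
    """
    rng = np.random.default_rng(seed)
    mean_aff = sum(
        normalize_affinity(cosine_affinity(f)) for f in layer_feats
    ) / len(layer_feats)
    n = mean_aff.shape[0]
    x, _ = np.linalg.qr(rng.standard_normal((n, k)))  # random orthonormal start
    for _ in range(steps):
        grad = 2.0 * mean_aff @ x            # gradient of trace(x^T A x)
        x, _ = np.linalg.qr(x + lr * grad)   # ascent step, then re-orthonormalize
    return x, mean_aff

# Toy data (hypothetical): 40 "tokens" in two groups, observed in 3 "layers".
rng = np.random.default_rng(1)
centers = np.zeros((2, 8))
centers[0, 0] = 3.0
centers[1, 1] = 3.0
group = np.repeat([0, 1], 20)
toy_layers = [centers[group] + 0.1 * rng.standard_normal((40, 8)) for _ in range(3)]

embedding, mean_aff = spectral_embedding(toy_layers, k=2)
projector = embedding @ embedding.T  # basis-independent view of the subspace
```

In this toy setting the learned subspace should match the leading eigenvectors of the pooled affinity, and tokens from the same group should land close together in the embedding. Swapping in key-query similarity or value-vector similarity from attention layers as the affinity, as the abstract describes, would change only how each per-layer matrix is built.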
