Steerable Transformers for Volumetric Data (2405.15932v3)
Abstract: We introduce Steerable Transformers, an extension of the Vision Transformer mechanism that maintains equivariance to the special Euclidean group $\mathrm{SE}(d)$. We propose an equivariant attention mechanism that operates on features extracted by steerable convolutions. Operating in Fourier space, our network utilizes Fourier space non-linearities. Our experiments in both two and three dimensions show that adding steerable transformer layers to steerable convolutional networks enhances performance.