HEAL-SWIN: A Vision Transformer On The Sphere (2307.07313v2)
Abstract: High-resolution wide-angle fisheye images are becoming increasingly important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to the projection and distortion losses introduced when the images are mapped to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, enabling the network to process spherical representations with minimal computational overhead. We demonstrate the superior performance of our model on both synthetic and real automotive datasets, as well as a selection of other image datasets, for semantic segmentation, depth regression and classification tasks. Our code is publicly available at https://github.com/JanEGerken/HEAL-SWIN.
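To make the claim about the nested structure concrete, here is a minimal sketch (not code from the HEAL-SWIN repository) of why HEALPix NESTED ordering makes patching and windowing cheap. In that ordering, the four children of pixel `p` at resolution `nside` are `4p .. 4p+3` at `2 * nside`, so every run of `4**k` consecutive indices shares one parent `k` levels coarser. Patching and windowing therefore reduce to chunking a flat sequence. The helper names `npix`, `pixel_area`, `parent` and `group` are hypothetical, and the shifted-window step of HEAL-SWIN is deliberately not reproduced here.

```python
import math


def npix(nside: int) -> int:
    # HEALPix has 12 base pixels, each subdivided into nside * nside pixels.
    return 12 * nside * nside


def pixel_area(nside: int) -> float:
    # Equal-area property: every pixel covers the same solid angle (steradians).
    return 4.0 * math.pi / npix(nside)


def parent(p: int, levels: int = 1) -> int:
    # NESTED ordering: dropping the two lowest bits moves one level coarser
    # (nside -> nside // 2); `levels` steps drop 2 * levels bits.
    return p >> (2 * levels)


def group(values: list, levels: int) -> list:
    # Consecutive runs of 4**levels nested indices share one coarse parent,
    # so patching / windowing is just chunking the flat sequence.
    size = 4 ** levels
    if len(values) % size:
        raise ValueError("sequence length must be a multiple of 4**levels")
    return [values[i:i + size] for i in range(0, len(values), size)]


nside = 8
pixels = list(range(npix(nside)))   # 768 pixel indices in NESTED order
patches = group(pixels, levels=1)   # merge 2x2 children -> 192 tokens (nside 4)
windows = group(patches, levels=2)  # 16 tokens per window -> 12 windows

print(len(patches), len(windows), len(windows[0]))  # 192 12 16
```

With these sizes each window of 16 tokens happens to cover exactly one of the 12 base pixels. For real data, healpy's `nside2npix` and `nside2pixarea` compute the same pixel count and pixel area, and `nest2ring` converts between orderings.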