
SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (2306.03403v2)

Published 6 Jun 2023 in cs.CV, cs.AI, cs.LG, and cs.MM

Abstract: As an important and challenging problem in computer vision, PAnoramic Semantic Segmentation (PASS) gives complete scene perception based on an ultra-wide angle of view. Prevalent PASS methods that take 2D panoramic images as input usually focus on correcting image distortions but do not consider the 3D properties of the original $360^{\circ}$ data. As a result, their performance drops considerably on panoramic images with 3D disturbance. To be more robust to 3D disturbance, we propose the Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (SGAT4PASS), which incorporates 3D spherical geometry knowledge. Specifically, a spherical geometry-aware framework is proposed for PASS. It includes three modules: spherical geometry-aware image projection, which takes input images with 3D disturbance into account; spherical deformable patch embedding, which adds a spherical geometry-aware constraint to the existing deformable patch embedding; and a panorama-aware loss, which reflects the pixel density of the original $360^{\circ}$ data. Experimental results on the Stanford2D3D Panoramic datasets show that SGAT4PASS significantly improves performance and robustness, with an increase of approximately 2% in mIoU, and when small 3D disturbances occur in the data, the stability of its performance improves by an order of magnitude. Our code and supplementary material are available at https://github.com/TencentARC/SGAT4PASS.
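The abstract does not give the formula for the panorama-aware loss, so the following is only a minimal sketch of the general idea of weighting pixels by their density on the sphere. It assumes an equirectangular input, where a pixel in a row at latitude $\phi$ covers a solid angle proportional to $\cos\phi$. The function names `equirect_row_weights` and `weighted_pixel_loss` are hypothetical and are not taken from the SGAT4PASS code.

```python
import math


def equirect_row_weights(height):
    """Per-row weights proportional to the solid angle of a pixel in that row.

    Assumption: equirectangular projection, row 0 at the top (north pole).
    Weights are normalized so that their mean is 1.
    """
    # Row i is centered at latitude phi_i = pi/2 - (i + 0.5) * pi / height.
    raw = [
        math.cos(math.pi / 2 - (i + 0.5) * math.pi / height)
        for i in range(height)
    ]
    mean = sum(raw) / height
    return [w / mean for w in raw]


def weighted_pixel_loss(per_pixel_loss):
    """Average a 2D grid (list of rows) of per-pixel losses using row weights."""
    height = len(per_pixel_loss)
    width = len(per_pixel_loss[0])
    weights = equirect_row_weights(height)
    total = sum(weights[i] * sum(row) for i, row in enumerate(per_pixel_loss))
    return total / (height * width)
```

With this weighting, rows near the poles, which are heavily oversampled in an equirectangular image, contribute less to the average than rows near the equator. The loss actually used in the paper may differ from this sketch.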
