GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation (2403.16370v1)

Published 25 Mar 2024 in cs.CV

Abstract: This paper tackles a novel yet challenging problem: how to transfer knowledge from the emerging Segment Anything Model (SAM) -- which reveals impressive zero-shot instance segmentation capacity -- to learn a compact panoramic semantic segmentation model, i.e., student, without requiring any labeled data. This poses considerable challenges due to SAM's inability to provide semantic labels and the large capacity gap between SAM and the student. To this end, we propose a novel framework, called GoodSAM, that introduces a teacher assistant (TA) to provide semantic information, integrated with SAM to generate ensemble logits to achieve knowledge transfer. Specifically, we propose a Distortion-Aware Rectification (DAR) module that first addresses the distortion problem of panoramic images by imposing prediction-level consistency and boundary enhancement. This subtly enhances TA's prediction capacity on panoramic images. DAR then incorporates a cross-task complementary fusion block to adaptively merge the predictions of SAM and TA to obtain more reliable ensemble logits. Moreover, we introduce a Multi-level Knowledge Adaptation (MKA) module to efficiently transfer the multi-level feature knowledge from TA and ensemble logits to learn a compact student model. Extensive experiments on two benchmarks show that our GoodSAM achieves a remarkable +3.75% mIoU improvement over the state-of-the-art (SOTA) domain adaptation methods. Also, our most lightweight model achieves comparable performance to the SOTA methods with only 3.7M parameters.
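The abstract describes two core mechanisms: merging the TA's semantic predictions with SAM's class-agnostic masks into "ensemble logits", and distilling those ensemble predictions (plus intermediate features) into a compact student. The PyTorch sketch below is a minimal, hypothetical illustration of that pipeline, not the authors' implementation: the majority-vote fusion rule, the loss weights, and all function names are assumptions made for clarity.

```python
# Hypothetical sketch of ensemble-logit fusion and student distillation,
# loosely following the abstract's description. Shapes, the fusion rule,
# and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F


def ensemble_predictions(ta_logits, sam_masks, alpha=0.5):
    """Fuse TA semantic logits (B, C, H, W) with class-agnostic SAM masks
    (B, M, H, W): each SAM mask votes for the TA class it overlaps most,
    a simplified stand-in for the cross-task complementary fusion block."""
    ta_prob = ta_logits.softmax(dim=1)
    sam_sem = torch.zeros_like(ta_prob)
    B, M = sam_masks.shape[:2]
    for b in range(B):
        for m in range(M):
            mask = sam_masks[b, m] > 0.5                       # (H, W) bool
            if mask.any():
                cls = ta_prob[b, :, mask].mean(dim=1).argmax()  # dominant TA class
                sam_sem[b, cls][mask] = 1.0                     # sharpen the region
    ens = alpha * ta_prob + (1.0 - alpha) * sam_sem
    return ens / ens.sum(dim=1, keepdim=True).clamp_min(1e-8)  # per-pixel distribution


def distill_loss(student_logits, ens_prob, student_feat, ta_feat, T=2.0, beta=0.1):
    """KL distillation against the ensemble predictions plus a simple
    feature-matching term, a rough analogue of multi-level knowledge adaptation."""
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  ens_prob, reduction="batchmean") * T * T
    feat = F.mse_loss(student_feat, ta_feat.detach())
    return kd + beta * feat
```

The intent of the sketch is the label-free training signal the abstract implies: SAM contributes precise region boundaries while the TA supplies class semantics, and the student learns from their combination rather than from ground-truth annotations.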
