
Weakly Supervised Semantic Segmentation for Driving Scenes (2312.13646v3)

Published 21 Dec 2023 in cs.CV

Abstract: State-of-the-art techniques in weakly-supervised semantic segmentation (WSSS) using image-level labels exhibit severe performance degradation on driving scene datasets such as Cityscapes. To address this challenge, we develop a new WSSS framework tailored to driving scene datasets. Based on extensive analysis of dataset characteristics, we employ Contrastive Language-Image Pre-training (CLIP) as our baseline to obtain pseudo-masks. However, CLIP introduces two key challenges: (1) pseudo-masks from CLIP represent small object classes poorly, and (2) these masks contain notable noise. We propose a solution for each issue. (1) We devise Global-Local View Training, which incorporates small-scale patches during model training, thereby enhancing the model's capability to handle small yet critical objects in driving scenes (e.g., traffic lights). (2) We introduce Consistency-Aware Region Balancing (CARB), a novel technique that distinguishes reliable from noisy regions by evaluating the consistency between CLIP masks and segmentation predictions. It prioritizes reliable pixels over noisy pixels via adaptive loss weighting. Notably, the proposed method achieves 51.8% mIoU on the Cityscapes test set, showcasing its potential as a strong WSSS baseline on driving scene datasets. Experimental results on CamVid and WildDash2 demonstrate the effectiveness of our method across diverse datasets, including small-scale datasets and visually challenging conditions. The code is available at https://github.com/k0u-id/CARB.
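
The idea of weighting the loss by consistency between pseudo-masks and segmentation predictions can be illustrated with a minimal sketch. This is not the CARB formulation from the paper or its repository: the function name, the binary agree/disagree rule, and the fixed `noisy_weight` value are all assumptions made for illustration, and the actual method describes its weighting as adaptive.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_weighted_loss(pixel_logits, pseudo_labels, noisy_weight=0.25):
    """Weighted mean cross-entropy over pixels (illustrative only).

    pixel_logits:  list of per-pixel class-logit lists (segmentation output).
    pseudo_labels: list of per-pixel class indices (e.g. taken from a CLIP mask).
    noisy_weight:  hypothetical fixed down-weighting factor for pixels where
                   the prediction disagrees with the pseudo-label.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, label in zip(pixel_logits, pseudo_labels):
        probs = softmax(logits)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        # Assumption: agreement marks a pixel as reliable, disagreement as noisy.
        weight = 1.0 if predicted == label else noisy_weight
        # Clamp the probability to avoid log(0) on extreme logits.
        weighted_sum += weight * -math.log(max(probs[label], 1e-12))
        weight_total += weight
    if weight_total == 0.0:
        return 0.0
    return weighted_sum / weight_total


# Two pixels, two classes: the first prediction agrees with the pseudo-label,
# the second does not and therefore contributes with a reduced weight.
logits = [[2.0, 0.0], [0.0, 2.0]]
labels = [0, 0]
weighted = consistency_weighted_loss(logits, labels)
unweighted = consistency_weighted_loss(logits, labels, noisy_weight=1.0)
print(weighted < unweighted)
```

In this sketch the disagreeing pixel still contributes to the loss, only less, so a wrong pseudo-label pulls the model less strongly than a consistent one.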
