Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation (2402.18467v3)

Published 28 Feb 2024 in cs.CV

Abstract: Weakly supervised semantic segmentation (WSSS) with image-level labels aims to perform segmentation without dense annotations. However, owing to the frequent coupling of co-occurring objects and the limited supervision from image-level labels, the co-occurrence problem is widespread and leads to false activation of objects in WSSS. In this work, we devise a 'Separate and Conquer' scheme, SeCo, to tackle this issue in both the image space and the feature space. In the image space, we propose to 'separate' the co-occurring objects through image decomposition, subdividing images into patches. Importantly, we assign each patch a category tag from Class Activation Maps (CAMs), which spatially helps remove the co-context bias and guides the subsequent representation. In the feature space, we propose to 'conquer' the false activation by enhancing semantic representation with multi-granularity knowledge contrast. To this end, a dual-teacher-single-student architecture is designed and tag-guided contrast is conducted, which together guarantee the correctness of knowledge and further facilitate the discrepancy among co-contexts. We streamline the multi-staged WSSS pipeline into an end-to-end one and tackle this issue without external supervision. Extensive experiments validate the efficiency of our method and its superiority over previous single-staged and even multi-staged competitors on PASCAL VOC and MS COCO. Code is available at https://github.com/zwyang6/SeCo.git.
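
To make the 'separate' step more concrete, the following is a minimal sketch of how patches might be tagged from CAMs. It is not taken from the SeCo repository: the function name `patch_tags`, the mean-activation rule, the threshold `bg_thresh`, and the `-1` tag for low-confidence patches are all assumptions chosen for illustration, and the paper's actual tagging rule may differ.

```python
from typing import List

# cam[c][y][x]: activation of class c at pixel (y, x), assumed to lie in [0, 1].
Cam = List[List[List[float]]]


def patch_tags(cam: Cam, patch: int, bg_thresh: float = 0.3,
               background: int = -1) -> List[List[int]]:
    """Tag each non-overlapping patch with the class of highest mean activation.

    Hypothetical rule: a patch whose best mean activation falls below
    `bg_thresh` receives the `background` tag instead of a class index.
    """
    num_classes = len(cam)
    height, width = len(cam[0]), len(cam[0][0])
    tags = []
    for top in range(0, height, patch):
        row = []
        for left in range(0, width, patch):
            # Clip the patch at the image border.
            ys = range(top, min(top + patch, height))
            xs = range(left, min(left + patch, width))
            area = len(ys) * len(xs)
            means = [
                sum(cam[c][y][x] for y in ys for x in xs) / area
                for c in range(num_classes)
            ]
            best = max(range(num_classes), key=means.__getitem__)
            row.append(best if means[best] >= bg_thresh else background)
        tags.append(row)
    return tags


# Toy 4x4 CAMs for two classes: class 0 fires top-left, class 1 bottom-right.
example_cam = [
    [[0.9, 0.8, 0.1, 0.0],
     [0.7, 0.9, 0.0, 0.1],
     [0.1, 0.0, 0.2, 0.1],
     [0.0, 0.1, 0.1, 0.2]],
    [[0.1, 0.0, 0.1, 0.0],
     [0.0, 0.1, 0.2, 0.1],
     [0.0, 0.1, 0.8, 0.9],
     [0.1, 0.0, 0.7, 0.9]],
]

print(patch_tags(example_cam, patch=2))  # [[0, -1], [-1, 1]]
```

In this toy case the two confident patches receive distinct class tags, while the two weakly activated patches are left untagged; tags of this kind are what a tag-guided contrast could then use to pull same-tag patch features together and push different-tag features apart.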
