Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation (2402.18467v3)
Abstract: Weakly supervised semantic segmentation (WSSS) with image-level labels aims to achieve segmentation tasks without dense annotations. However, attributed to the frequent coupling of co-occurring objects and the limited supervision from image-level labels, the challenging co-occurrence problem is widely present and leads to false activation of objects in WSSS. In this work, we devise a 'Separate and Conquer' scheme SeCo to tackle this issue from dimensions of image space and feature space. In the image space, we propose to 'separate' the co-occurring objects with image decomposition by subdividing images into patches. Importantly, we assign each patch a category tag from Class Activation Maps (CAMs), which spatially helps remove the co-context bias and guide the subsequent representation. In the feature space, we propose to 'conquer' the false activation by enhancing semantic representation with multi-granularity knowledge contrast. To this end, a dual-teacher-single-student architecture is designed and tag-guided contrast is conducted, which guarantee the correctness of knowledge and further facilitate the discrepancy among co-contexts. We streamline the multi-staged WSSS pipeline end-to-end and tackle this issue without external supervision. Extensive experiments are conducted, validating the efficiency of our method and the superiority over previous single-staged and even multi-staged competitors on PASCAL VOC and MS COCO. Code is available at https://github.com/zwyang6/SeCo.git.
- Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4981–4990, 2018.
- Single-stage semantic segmentation from image labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4253–4262, 2020.
- What’s the point: Semantic segmentation with point supervision. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII 14, pages 549–565. Springer, 2016.
- Emerging properties in self-supervised vision transformers. In ICCV, pages 9650–9660, 2021.
- Fpr: False positive rectification for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1108–1118, 2023.
- Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence, 40(4):834–848, 2017.
- Self-supervised image-specific prototype exploration for weakly supervised semantic segmentation. In CVPR, pages 4288–4298, June 2022.
- A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020.
- Out-of-candidate rectification for weakly supervised semantic segmentation. In CVPR, pages 23673–23684, 2023.
- Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In ICCV, pages 1635–1643, 2015.
- An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
- Weakly supervised semantic segmentation by pixel-to-prototype contrast. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4320–4329, June 2022.
- The pascal visual object classes challenge: A retrospective. IJCV, 111:98–136, 2015.
- Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9729–9738, 2020.
- Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
- L2g: A simple local-to-global knowledge transfer framework for weakly supervised semantic segmentation. In CVPR, pages 16886–16896, 2022.
- Efficient inference in fully connected crfs with gaussian edge potentials. NeurIPS, 24, 2011.
- Anti-adversarially manipulated attributions for weakly and semi-supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4071–4080, 2021.
- Weakly supervised semantic segmentation using out-of-distribution data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16897–16906, 2022.
- Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2643–2652, 2021.
- Railroad is not a train: Saliency as pseudo-pixel supervision for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5495–5505, 2021.
- Weakly supervised semantic segmentation via progressive patch learning. IEEE Transactions on Multimedia, 2022.
- Group-wise semantic mining for weakly supervised semantic segmentation. In AAAI, volume 35, pages 1984–1992, 2021.
- Scribblesup: Scribble-supervised convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3159–3167, 2016.
- Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
- Clip is also an efficient segmenter: A text-driven approach for weakly supervised semantic segmentation. arXiv preprint arXiv:2212.09506, 2022.
- Cross-image region mining with region prototypical network for weakly supervised segmentation. IEEE Transactions on Multimedia, 2021.
- Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
- Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
- Learning self-supervised low-rank network for single-stage weakly and semi-supervised semantic segmentation. IJCV, 130(5):1181–1195, 2022.
- From image-level to pixel-level labeling with convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1713–1721, 2015.
- Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
- Imagenet-21k pretraining for the masses. arXiv preprint arXiv:2104.10972, 2021.
- Max pooling with vision transformers reconciles class and shape in weakly supervised semantic segmentation. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXX, pages 446–463. Springer, 2022.
- Learning affinity from attention: end-to-end weakly-supervised semantic segmentation with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16846–16855, 2022.
- Token contrast for weakly-supervised semantic segmentation. arXiv preprint arXiv:2303.01267, 2023.
- Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision, pages 7262–7272, 2021.
- Context decoupling augmentation for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF international conference on computer vision, pages 7004–7014, 2021.
- On regularized losses for weakly-supervised cnn segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 507–522, 2018.
- Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(11), 2008.
- Learning random-walk label propagation for weakly-supervised semantic segmentation. In CVPR, pages 7158–7166, 2017.
- Exploring cross-image pixel contrast for semantic segmentation. In ICCV, pages 7303–7313, 2021.
- Weakly-supervised semantic segmentation by iteratively mining common object features. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1354–1362, 2018.
- Wider or deeper: Revisiting the resnet model for visual recognition. Pattern Recognition, 90:119–133, 2019.
- Clims: cross language image matching for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4483–4492, 2022.
- Multi-class token transformer for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4310–4319, 2022.
- Self correspondence distillation for end-to-end weakly-supervised semantic segmentation. arXiv preprint arXiv:2302.13765, 2023.
- Learning deep features for discriminative localization. In CVPR, pages 2921–2929, 2016.
- Rethinking semantic segmentation: A prototype view. In CVPR, pages 2582–2593, 2022.
- Regional semantic contrast and aggregation for weakly supervised semantic segmentation. In CVPR, pages 4299–4309, 2022.