Top-K Pooling with Patch Contrastive Learning for Weakly-Supervised Semantic Segmentation (2310.09828v2)
Abstract: Weakly Supervised Semantic Segmentation (WSSS) using only image-level labels has gained significant attention due to its cost-effectiveness. Recently, Vision Transformer (ViT) based methods that do not rely on class activation maps (CAM) have shown greater capability in generating reliable pseudo labels than previous CAM-based methods. However, current ViT-based methods use max pooling to map patch-level classification to image-level classification, selecting only the patch with the highest prediction score. Because individual patches can be misclassified, relying on a single patch may degrade the quality of the pseudo labels. In this paper, we introduce a novel ViT-based WSSS method named top-K pooling with patch contrastive learning (TKP-PCL), which employs a top-K pooling layer to alleviate the limitations of max pooling selection. A patch contrastive error (PCE) is also proposed to enhance the patch embeddings and further improve the final results. Experimental results show that our approach is efficient and outperforms other state-of-the-art WSSS methods on the PASCAL VOC 2012 dataset.
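The difference between max pooling and top-K pooling can be illustrated with a minimal sketch. It assumes that top-K pooling averages the K highest patch scores per class, which the abstract does not spell out; the function name `top_k_pool`, the score layout, and the toy values are hypothetical and not taken from the paper.

```python
from typing import List


def top_k_pool(patch_scores: List[List[float]], k: int) -> List[float]:
    """Map patch-level class scores to image-level class scores.

    patch_scores[i][c] is the score of patch i for class c.
    Assumption: for each class, the k highest patch scores are averaged.
    With k = 1 this reduces to max pooling.
    """
    if not patch_scores:
        raise ValueError("need at least one patch")
    num_patches = len(patch_scores)
    k = max(1, min(k, num_patches))  # clamp k to a valid range
    num_classes = len(patch_scores[0])
    image_scores = []
    for c in range(num_classes):
        # Sort this class's scores over all patches, highest first.
        column = sorted((row[c] for row in patch_scores), reverse=True)
        image_scores.append(sum(column[:k]) / k)
    return image_scores


# Toy example: 4 patches, 2 classes (values are made up).
scores = [
    [0.9, 0.1],
    [0.2, 0.3],
    [0.1, 0.2],
    [0.3, 0.1],
]
max_pooled = top_k_pool(scores, k=1)   # a single patch decides each class
top2_pooled = top_k_pool(scores, k=2)  # two patches share the decision
```

In this toy case the single high-scoring patch for the first class dominates under max pooling, while with K = 2 its influence is halved, so one misclassified patch has less effect on the image-level prediction.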