Contextrast: Contextual Contrastive Learning for Semantic Segmentation (2404.10633v2)
Abstract: Despite great improvements in semantic segmentation, challenges persist because of the lack of local/global contexts and of the relationship between them. In this paper, we propose Contextrast, a contrastive learning-based semantic segmentation method that captures local/global contexts and comprehends their relationships. Our proposed method comprises two parts: a) contextual contrastive learning (CCL) and b) boundary-aware negative (BANE) sampling. Contextual contrastive learning obtains local/global context from multi-scale feature aggregation and from the inter/intra-relationships of features for better discrimination capabilities. Meanwhile, BANE sampling selects embedding features along the boundaries of incorrectly predicted regions and uses them as harder negative samples in our contrastive learning, resolving segmentation issues along the boundary region by exploiting fine-grained details. We demonstrate that Contextrast substantially enhances the performance of semantic segmentation networks, outperforming state-of-the-art contrastive learning approaches on diverse public datasets, e.g., Cityscapes, CamVid, PASCAL-C, COCO-Stuff, and ADE20K, without an increase in computational cost during inference.
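To make the idea behind BANE sampling concrete, the following is a minimal NumPy sketch of selecting embeddings on the boundary of an incorrectly predicted region. It is an illustration under stated assumptions, not the authors' implementation: the function names (`erode4`, `boundary_negative_mask`, `sample_negatives`), the use of a 4-neighbour erosion to find the boundary, and uniform random sampling of `k` pixels are all choices made here, since the abstract does not specify how boundary pixels are located or ranked.

```python
import numpy as np


def erode4(mask: np.ndarray) -> np.ndarray:
    """Binary erosion with a 4-neighbour cross.

    Pixels outside the image are treated as background (assumption).
    """
    padded = np.pad(mask, 1, constant_values=False)
    return (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1]   # neighbour above
        & padded[2:, 1:-1]    # neighbour below
        & padded[1:-1, :-2]   # neighbour to the left
        & padded[1:-1, 2:]    # neighbour to the right
    )


def boundary_negative_mask(pred: np.ndarray, gt: np.ndarray, cls: int) -> np.ndarray:
    """Boundary pixels of the region wrongly predicted as class `cls`."""
    # Error region: predicted as `cls`, but the ground truth says otherwise.
    error = (pred == cls) & (gt != cls)
    # Boundary = error pixels that do not survive erosion.
    return error & ~erode4(error)


def sample_negatives(embeddings, pred, gt, cls, k, rng):
    """Pick up to `k` embeddings on the error-region boundary (hypothetical helper).

    `embeddings` has shape (H, W, D); `pred` and `gt` have shape (H, W).
    """
    ys, xs = np.nonzero(boundary_negative_mask(pred, gt, cls))
    if len(ys) == 0:
        return np.empty((0, embeddings.shape[-1]), dtype=embeddings.dtype)
    idx = rng.choice(len(ys), size=min(k, len(ys)), replace=False)
    return embeddings[ys[idx], xs[idx]]
```

In a training loop, the returned embeddings would serve as negatives for class `cls` in the contrastive loss. A distance-transform-based selection could replace the erosion step if pixels need to be ranked by their distance to the boundary.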