AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation (2403.01818v3)
Abstract: Semi-supervised semantic segmentation (SSSS) has been proposed to alleviate the burden of time-consuming pixel-level manual labeling by leveraging limited labeled data along with larger amounts of unlabeled data. Current state-of-the-art methods train on labeled data with ground truths and on unlabeled data with pseudo labels. However, the two training flows are separate, which allows labeled data to dominate the training process, resulting in low-quality pseudo labels and, consequently, sub-optimal results. To alleviate this issue, we present AllSpark, which reborns labeled features from unlabeled ones with a channel-wise cross-attention mechanism. We further introduce a Semantic Memory along with a Channel Semantic Grouping strategy to ensure that unlabeled features adequately represent labeled features. AllSpark sheds new light on architecture-level designs for SSSS, rather than framework-level ones, and thus avoids increasingly complicated training pipelines. It can also be regarded as a flexible bottleneck module that can be seamlessly integrated into a general transformer-based segmentation model. AllSpark outperforms existing methods across all evaluation protocols on the Pascal, Cityscapes, and COCO benchmarks without bells and whistles. Code and model weights are available at: https://github.com/xmed-lab/AllSpark.
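The central operation named in the abstract, channel-wise cross-attention from labeled queries to unlabeled keys/values, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the module name `ChannelCrossAttention`, the single-head design, the scaling choice, and the tensor shapes are all hypothetical; the actual code is in the linked repository.

```python
import torch
import torch.nn as nn


class ChannelCrossAttention(nn.Module):
    """Sketch of channel-wise cross-attention (hypothetical, for illustration).

    Queries come from labeled-branch features, keys/values from
    unlabeled-branch features, and the attention map is computed across
    the channel dimension (C x C) rather than across spatial tokens.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x_labeled: torch.Tensor, x_unlabeled: torch.Tensor) -> torch.Tensor:
        # x_labeled, x_unlabeled: (B, N, C) token sequences from a ViT-style encoder
        q = self.q(x_labeled)    # (B, N, C)
        k = self.k(x_unlabeled)  # (B, N, C)
        v = self.v(x_unlabeled)  # (B, N, C)

        # Transpose so attention runs over channels:
        # (B, C, N) @ (B, N, C) -> (B, C, C); the inner product sums over
        # the N tokens, so we scale by N**-0.5 (an illustrative choice).
        attn = (q.transpose(1, 2) @ k) * (q.shape[1] ** -0.5)
        attn = attn.softmax(dim=-1)

        # Reconstruct the labeled-branch channels from unlabeled-branch values:
        # (B, C, C) @ (B, C, N) -> (B, C, N) -> (B, N, C)
        out = (attn @ v.transpose(1, 2)).transpose(1, 2)
        return self.proj(out)


# Toy usage: 2 images, 196 tokens, 64 channels
xl = torch.randn(2, 196, 64)  # labeled-branch features
xu = torch.randn(2, 196, 64)  # unlabeled-branch features
out = ChannelCrossAttention(64)(xl, xu)  # (2, 196, 64)
```

Because the attention map is C x C instead of N x N, the labeled features are rewritten as mixtures of unlabeled-branch channels, which matches the abstract's framing of "reborning" labeled features from unlabeled ones while keeping the cost independent of spatial resolution.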