
Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation

Published 27 Feb 2024 in cs.CV | (2402.17891v2)

Abstract: Class activation maps (CAMs) are commonly employed in weakly supervised semantic segmentation (WSSS) to produce pseudo-labels. Due to incomplete or excessive class activation, existing studies often resort to offline CAM refinement, introducing additional stages or proposing offline modules. This can cause optimization difficulties for single-stage methods and limit generalizability. In this study, we aim to reduce the observed CAM inconsistency and error to mitigate reliance on refinement processes. We propose an end-to-end WSSS model incorporating guided CAMs, wherein our segmentation model is trained while concurrently optimizing CAMs online. Our method, Co-training with Swapping Assignments (CoSA), leverages a dual-stream framework, where one sub-network learns from the swapped assignments generated by the other. We introduce three techniques: i) soft perplexity-based regularization to penalize uncertain regions; ii) a threshold-searching approach to dynamically revise the confidence threshold; and iii) contrastive separation to address the coexistence problem. CoSA achieves mIoU of 76.2% and 51.0% on the VOC and COCO validation datasets, respectively, surpassing existing baselines by a substantial margin. Notably, CoSA is the first single-stage approach to outperform all existing multi-stage methods, including those with additional supervision. Code is available at https://github.com/youshyee/CoSA.
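
The dual-stream idea in the abstract, where one sub-network learns from assignments produced by the other, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the loss, so the sketch assumes hard pseudo-labels taken from each stream's softmax output, a fixed confidence threshold (CoSA is described as searching for it dynamically), and a plain cross-entropy. The function names `pseudo_labels`, `cross_entropy`, and `swapped_loss` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_labels(stream_logits, threshold):
    """Return one hard label per pixel, or None where the stream is not confident.

    Assumption: a pixel is 'confident' when its top softmax probability
    reaches `threshold`. The threshold is fixed here for simplicity.
    """
    labels = []
    for logits in stream_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best if probs[best] >= threshold else None)
    return labels

def cross_entropy(stream_logits, labels):
    """Mean cross-entropy over labelled pixels; 0.0 when no pixel is labelled."""
    losses = [
        -math.log(softmax(logits)[label])
        for logits, label in zip(stream_logits, labels)
        if label is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0

def swapped_loss(logits_a, logits_b, threshold=0.7):
    """Each stream is supervised by the pseudo-labels of the other stream."""
    labels_from_a = pseudo_labels(logits_a, threshold)
    labels_from_b = pseudo_labels(logits_b, threshold)
    return cross_entropy(logits_a, labels_from_b) + cross_entropy(logits_b, labels_from_a)

# Two pixels, two classes. The streams disagree on pixel 0 and are unsure on pixel 1.
logits_a = [[2.0, 0.0], [0.0, 0.0]]
logits_b = [[0.0, 3.0], [0.1, 0.0]]
print(pseudo_labels(logits_a, 0.7))
print(pseudo_labels(logits_b, 0.7))
print(swapped_loss(logits_a, logits_b))
```

Uncertain pixels are simply skipped in this sketch. The abstract instead describes a soft perplexity-based regularization for uncertain regions, which is not modelled here, and a real training loop would also need to stop gradients from flowing through the pseudo-labels.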
