
Background Noise Reduction of Attention Map for Weakly Supervised Semantic Segmentation (2404.03394v2)

Published 4 Apr 2024 in cs.CV

Abstract: In weakly-supervised semantic segmentation (WSSS) using only image-level class labels, CNN-based Class Activation Maps (CAM) tend to activate only the most discriminative local regions of objects. Transformer-based methods, by contrast, learn global features but suffer from background noise contamination. This paper addresses background noise in the attention weights of TransCAM, an existing WSSS method based on Conformer. The proposed method reduces this background noise, which improves the accuracy of the pseudo labels. Experimental results show that our model achieves segmentation performance of 70.5% on the PASCAL VOC 2012 validation data, 71.1% on the test data, and 45.9% on MS COCO 2014 data, outperforming TransCAM.
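For readers unfamiliar with CAM, the following is a minimal sketch of the standard CAM computation (a classifier-weighted sum of the final convolutional feature maps), not the method proposed in this paper. The function name `class_activation_map`, the array shapes, and the random inputs are illustrative assumptions.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM sketch (illustrative, not the paper's method).

    features:   (C, H, W) final convolutional feature maps (assumed shape)
    fc_weights: (num_classes, C) weights of the linear classifier that
                follows global average pooling (assumed shape)
    """
    # Weighted sum over channels using the chosen class's classifier weights.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    # Keep positive evidence only, then scale to [0, 1].
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam

# Hypothetical inputs: 8 channels on a 4x4 grid, 3 classes.
rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))
fc_weights = rng.standard_normal((3, 8))
cam = class_activation_map(features, fc_weights, class_idx=1)
print(cam.shape)
```

Because the map is driven by the channels that matter most to the classifier, high values tend to concentrate on the most discriminative object parts, which is the limitation the abstract describes.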

