
Tackling Ambiguity from Perspective of Uncertainty Inference and Affinity Diversification for Weakly Supervised Semantic Segmentation (2404.08195v1)

Published 12 Apr 2024 in cs.CV

Abstract: Weakly supervised semantic segmentation (WSSS) with image-level labels aims to perform dense prediction without laborious annotations. However, because of ambiguous contexts and fuzzy regions, WSSS performance suffers widely from ambiguity, especially in the stages of generating Class Activation Maps (CAMs) and refining pseudo masks, and this issue has received little attention in previous literature. In this work, we propose UniA, a unified single-staged WSSS framework, to tackle this issue efficiently from the perspectives of uncertainty inference and affinity diversification. When activating class objects, we argue that false activation stems from a bias toward ambiguous regions during feature extraction. We therefore design a more robust feature representation based on a probabilistic Gaussian distribution and introduce uncertainty estimation to avoid this bias. A distribution loss is proposed to supervise the process; it effectively captures the ambiguity and models the complex dependencies among features. When refining pseudo labels, we observe that the affinity produced by prevailing refinement methods tends to be similar across ambiguous regions. To this end, an affinity diversification module is proposed to promote diversity among semantics. A mutual complementing refinement first rectifies the ambiguous affinity using multiple inferred pseudo labels. More importantly, a contrastive affinity loss is designed to diversify the relations among unrelated semantics, which reliably propagates the diversity into the whole feature representation and helps generate better pseudo masks. Extensive experiments on the PASCAL VOC, MS COCO, and medical ACDC datasets validate the efficiency of UniA in tackling ambiguity and its superiority over recent single-staged and even most multi-staged competitors.
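
To make the idea of a probabilistic Gaussian feature representation concrete, here is a minimal NumPy sketch. The abstract does not specify the form of the distribution loss, so this is not UniA's implementation. It assumes the common variational setup: each token feature is described by a mean and a log-variance, a sample is drawn with the reparameterisation trick, and a KL term toward a standard normal stands in for a distribution loss. All function names and shapes are hypothetical.

```python
import numpy as np


def sample_gaussian_features(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterisation trick)."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over channels, averaged over tokens.

    Assumption: used here as a stand-in regulariser, not the paper's loss.
    """
    kl = 0.5 * (np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return kl.sum(axis=-1).mean()


def token_uncertainty(log_var):
    """Mean predicted variance per token; larger values flag ambiguous tokens."""
    return np.exp(log_var).mean(axis=-1)


# Hypothetical shapes: 4 tokens with 8 channels each.
rng = np.random.default_rng(0)
mu = rng.standard_normal((4, 8))
log_var = np.zeros((4, 8))  # unit variance for every channel

z = sample_gaussian_features(mu, log_var, rng)
loss = kl_to_standard_normal(mu, log_var)
uncertainty = token_uncertainty(log_var)
```

In this sketch the per-token variance acts as the uncertainty estimate: a model trained this way could down-weight tokens with high predicted variance when producing activation maps, which is one plausible reading of "avoiding the bias" toward ambiguous regions.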

Authors
  1. Zhiwei Yang
  2. Yucong Meng
  3. Kexue Fu
  4. Shuo Wang
  5. Zhijian Song