Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation (2403.01156v1)
Abstract: Most existing weakly supervised semantic segmentation (WSSS) methods rely on Class Activation Mapping (CAM) to extract coarse class-specific localization maps using image-level labels. Prior works have commonly used an off-line heuristic thresholding process that combines the CAM maps with off-the-shelf saliency maps produced by a general pre-trained saliency model to produce more accurate pseudo-segmentation labels. We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from these saliency maps and the significant inter-task correlation between saliency detection and semantic segmentation. In the proposed AuxSegNet+, saliency detection and multi-label image classification are used as auxiliary tasks to improve the primary task of semantic segmentation with only image-level ground-truth labels. We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps. In particular, we propose a cross-task dual-affinity learning module to learn both pairwise and unary affinities, which are used to enhance the task-specific features and predictions by aggregating both query-dependent and query-independent global context for both saliency detection and semantic segmentation. The learned cross-task pairwise affinity can also be used to refine and propagate CAM maps to provide better pseudo labels for both tasks. Iterative improvement of segmentation performance is enabled by cross-task affinity learning and pseudo-label updating. Extensive experiments demonstrate the effectiveness of the proposed approach with new state-of-the-art WSSS results on the challenging PASCAL VOC and MS COCO benchmarks.
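As a rough illustration of the affinity ideas in the abstract, the sketch below builds a pairwise (query-dependent) affinity from per-pixel saliency and segmentation features, fuses the two, and uses the result to propagate a sparse CAM. It also computes a unary (query-independent) affinity. Everything here is an assumption made for illustration, not the authors' implementation: the dot-product similarity, the softmax normalization, the simple averaging across tasks, the hypothetical `weight` vector, and all function names. Plain Python lists are used only to keep the example self-contained.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pairwise_affinity(feats):
    """Query-dependent affinity: one row per pixel, each row sums to 1."""
    return [softmax([dot(fi, fj) for fj in feats]) for fi in feats]

def unary_affinity(feats, weight):
    """Query-independent affinity: a single attention map shared by all pixels.
    `weight` is a hypothetical projection vector (assumption)."""
    return softmax([dot(weight, f) for f in feats])

def cross_task_pairwise(sal_feats, seg_feats):
    """Fuse the affinities of the two tasks by averaging (assumed fusion rule)."""
    a_sal = pairwise_affinity(sal_feats)
    a_seg = pairwise_affinity(seg_feats)
    return [[0.5 * (s + g) for s, g in zip(rs, rg)]
            for rs, rg in zip(a_sal, a_seg)]

def refine_cam(cam, affinity, steps=1):
    """Propagate CAM scores: each pixel takes the affinity-weighted
    average of the scores of all pixels."""
    for _ in range(steps):
        cam = [dot(row, cam) for row in affinity]
    return cam

# Toy image with 4 pixels: pixels 0 and 1 belong to the object,
# pixels 2 and 3 to the background. Feature values are made up.
sal_feats = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
seg_feats = [[1.5, 0.0], [1.5, 0.0], [0.0, 1.5], [0.0, 1.5]]

# The initial CAM fires only on the most discriminative pixel.
cam = [1.0, 0.0, 0.0, 0.0]

affinity = cross_task_pairwise(sal_feats, seg_feats)
refined = refine_cam(cam, affinity)
unary = unary_affinity(seg_feats, weight=[1.0, 0.0])

print("refined CAM:", [round(v, 3) for v in refined])
print("unary affinity:", [round(v, 3) for v in unary])
```

In this toy setting the activation spreads from pixel 0 to pixel 1, which has similar features, while the background pixels receive little of it. This mirrors the abstract's claim that a learned pairwise affinity can refine and propagate CAM maps.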
Authors:
- Lian Xu
- Mohammed Bennamoun
- Farid Boussaid
- Wanli Ouyang
- Ferdous Sohel
- Dan Xu