Semi-supervised Salient Object Detection with Effective Confidence Estimation (2112.14019v2)
Abstract: The success of existing salient object detection models relies on large training datasets with pixel-wise labels, which are time-consuming and expensive to obtain. We study semi-supervised salient object detection, where only a small number of labeled samples and a large number of unlabeled samples are available. Specifically, we present a pseudo-label-based learning framework built on a Conditional Energy-based Model. We model the stochastic nature of human saliency labels with the stochastic latent variable of the Conditional Energy-based Model. This latent variable further enables the generation of a high-quality pixel-wise uncertainty map that indicates the reliability of the pseudo label generated for each unlabeled sample. The uncertainty map minimises the contribution of low-certainty pseudo labels when optimising the model, preventing error propagation. Experimental results show that the proposed strategy effectively exploits unlabeled data. With only 1/16 of the samples labeled, our model achieves performance competitive with state-of-the-art fully-supervised models.
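The abstract describes the core loop only in words. Several predictions are drawn by re-sampling the stochastic latent variable. They are turned into a pseudo label and a pixel-wise uncertainty map, and the map then down-weights unreliable pixels in the loss. The NumPy sketch below shows one way to do this and rests on explicit assumptions. First, the latent code is drawn from a standard Gaussian as a stand-in for the paper's Conditional Energy-based Model, whose sampler is not reproduced. Second, uncertainty is measured as the binary entropy of the mean prediction. Third, each pixel's loss weight is 1 minus that uncertainty. The names `toy_predict`, `sample_predictions`, `pseudo_label_and_uncertainty`, and `weighted_bce` are hypothetical and are not the authors' code.

```python
import numpy as np


def toy_predict(image, z):
    # Hypothetical stand-in for a saliency decoder: sigmoid(image + mean(z)).
    return 1.0 / (1.0 + np.exp(-(image + z.mean())))


def sample_predictions(predict, image, k, latent_dim, rng):
    """Draw k saliency maps by re-sampling the latent code z.

    Assumption: z ~ N(0, I) is a stand-in prior; the paper instead samples z
    from a conditional energy-based model, which is not reproduced here.
    """
    return np.stack([predict(image, rng.standard_normal(latent_dim)) for _ in range(k)])


def pseudo_label_and_uncertainty(samples, eps=1e-7):
    """Turn K stochastic maps of shape (K, H, W) into a binary pseudo label
    and a pixel-wise uncertainty map in [0, 1]."""
    mean = samples.mean(axis=0)                      # (H, W) mean saliency
    p = np.clip(mean, eps, 1.0 - eps)
    # Binary entropy of the mean prediction, divided by ln 2 so 0.5 maps to 1.
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)) / np.log(2.0)
    pseudo_label = (mean >= 0.5).astype(np.float64)
    return pseudo_label, np.clip(entropy, 0.0, 1.0)


def weighted_bce(pred, pseudo_label, uncertainty, eps=1e-7):
    """Binary cross-entropy with per-pixel weight (1 - uncertainty), so that
    low-certainty pseudo labels contribute less to the loss (assumed scheme)."""
    p = np.clip(pred, eps, 1.0 - eps)
    bce = -(pseudo_label * np.log(p) + (1.0 - pseudo_label) * np.log(1.0 - p))
    weight = 1.0 - uncertainty
    return float((weight * bce).sum() / max(weight.sum(), eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.array([[3.0, 0.0], [-3.0, 0.5]])      # toy "logit" image
    samples = sample_predictions(toy_predict, image, k=8, latent_dim=16, rng=rng)
    label, unc = pseudo_label_and_uncertainty(samples)
    print("pseudo label:\n", label)
    print("uncertainty:\n", unc.round(3))
    print("weighted loss:", weighted_bce(samples[0], label, unc))
```

With a (1 - uncertainty) weight, a pixel whose sampled predictions disagree completely (mean 0.5) receives zero weight. Its pseudo label then cannot propagate errors into the model, which matches the behaviour the abstract describes. Other choices are equally plausible, such as the per-pixel variance of the samples or a hard confidence threshold. The paper's exact formulation may differ.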