Discriminative Consensus Mining with A Thousand Groups for More Accurate Co-Salient Object Detection (2403.12057v1)
Abstract: Co-Salient Object Detection (CoSOD) is a rapidly growing task, extended from Salient Object Detection (SOD) and Common Object Segmentation (Co-Segmentation). It aims to detect the co-occurring salient objects in a given image group. Many effective approaches have been proposed on the basis of existing datasets. However, there is still no standard and efficient training set for CoSOD, which makes the choice of training sets in recently proposed CoSOD methods inconsistent. First, the drawbacks of existing CoSOD training sets are analyzed comprehensively, and potential improvements are provided to mitigate the existing problems. In particular, this thesis introduces a new CoSOD training set, named the Co-Saliency of ImageNet (CoSINe) dataset. CoSINe contains the largest number of groups among all existing CoSOD datasets, and its images span a wide range of categories, object sizes, etc. In experiments, models trained on CoSINe achieve significantly better performance with fewer images than models trained on any existing dataset. Second, to make the most of CoSINe, a novel CoSOD approach named Hierarchical Instance-aware COnsensus MinEr (HICOME) is proposed, which efficiently mines the consensus feature across different feature levels and discriminates objects of different classes in an object-aware contrastive way. Extensive experiments show that HICOME achieves state-of-the-art (SoTA) performance on all existing CoSOD test sets. Several useful training tricks for CoSOD models are also provided. Third, practical applications of the CoSOD technique are presented to demonstrate its effectiveness. Finally, the remaining challenges and potential improvements of CoSOD are discussed to inspire future work. The source code, the dataset, and the online demo will be publicly available at github.com/ZhengPeng7/CoSINe.
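The "object-aware contrastive" consensus idea described above can be sketched in simplified form. This is a minimal illustration, not the HICOME implementation: it assumes a single feature level, mean-pooled group consensus, and hypothetical function names; HICOME itself mines consensus hierarchically across multiple feature levels.

```python
import numpy as np

def l2norm(x, axis=-1):
    # Normalize embeddings to unit length for cosine-style comparison.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-12)

def group_consensus(feats):
    # Consensus embedding of one image group: the normalized mean of its
    # image embeddings (a simple stand-in for hierarchical consensus mining).
    return l2norm(feats.mean(axis=0), axis=0)

def contrastive_group_loss(feats, labels, tau=0.1):
    # InfoNCE-style objective: each image embedding should be closest to its
    # own group's consensus and far from other groups' consensus vectors,
    # i.e. the loss discriminates objects of different classes.
    feats = l2norm(feats)                      # (B, C)
    groups = np.unique(labels)                 # sorted group ids
    consensus = np.stack([group_consensus(feats[labels == g]) for g in groups])
    logits = feats @ consensus.T / tau         # (B, G) similarity logits
    targets = np.searchsorted(groups, labels)  # index of each image's group
    # numerically plain cross-entropy over consensus similarities
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), targets].mean()

# Toy usage: two well-separated groups yield a lower loss than mixed labels.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(size=(4, 8)) + 5.0,
                   rng.normal(size=(4, 8)) - 5.0])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
loss_good = contrastive_group_loss(feats, labels)
loss_bad = contrastive_group_loss(feats, np.array([0, 1, 0, 1, 0, 1, 0, 1]))
```

With cleanly separated groups, `loss_good` is driven toward zero, whereas the mixed labeling leaves the consensus vectors uninformative and the loss near chance level.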