Universal Organizer of SAM for Unsupervised Semantic Segmentation (2405.11742v1)
Abstract: Unsupervised semantic segmentation (USS) aims to achieve high-quality segmentation without manual pixel-level annotations. Existing USS models assign coarse category labels to regions, but the resulting masks often have blurry, imprecise edges. Recently, a robust framework called the segment anything model (SAM) has been shown to produce object masks with precise boundaries. This paper therefore proposes a universal organizer based on SAM, termed UO-SAM, to enhance the mask quality of USS models. Specifically, using only the original image and the masks generated by the USS model, we extract visual features to obtain positional prompts for target objects. Then, we apply a local region optimizer that performs segmentation with SAM on a per-object basis. Finally, we employ a global region optimizer that incorporates global image information and refines the masks to obtain the final fine-grained masks. Compared to existing methods, UO-SAM achieves state-of-the-art performance.
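To make the described pipeline concrete, below is a minimal sketch in Python using the official `segment_anything` API. The abstract does not specify how positional prompts are extracted or how the local and global optimizers work internally, so the centroid/bounding-box prompt heuristic and the highest-score mask selection here are illustrative assumptions standing in for those components, not the paper's actual method.

```python
# Minimal sketch of a UO-SAM-style refinement loop: derive positional
# prompts from a coarse USS mask, then refine each object with SAM.
# The prompt heuristics below are assumptions, not the paper's method.
import numpy as np
from scipy import ndimage
from segment_anything import sam_model_registry, SamPredictor


def refine_with_sam(image: np.ndarray, coarse_mask: np.ndarray,
                    checkpoint: str = "sam_vit_h.pth") -> np.ndarray:
    """image: HxWx3 uint8 RGB; coarse_mask: HxW integer class map from a USS model."""
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image)  # image embedding is computed once and reused

    refined = np.zeros_like(coarse_mask)
    for cls in np.unique(coarse_mask):
        if cls == 0:  # assume label 0 is background
            continue
        # Treat each connected region of a class as one target object.
        regions, n = ndimage.label(coarse_mask == cls)
        for rid in range(1, n + 1):
            ys, xs = np.nonzero(regions == rid)
            # Positional prompts: region centroid (point) plus bounding box.
            point = np.array([[xs.mean(), ys.mean()]])
            box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])
            masks, scores, _ = predictor.predict(
                point_coords=point,
                point_labels=np.ones(1),  # 1 marks a foreground point
                box=box,
                multimask_output=True,
            )
            # Keep SAM's highest-scoring proposal for this object,
            # mimicking the per-object "local region optimizer" step.
            refined[masks[np.argmax(scores)]] = cls
    return refined
```

In the paper, a global region optimizer would additionally fuse the per-object masks with global image information before producing the final output; the sketch above stops at per-object selection, which is the part that follows directly from the abstract.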