Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
158 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

A Lightweight Clustering Framework for Unsupervised Semantic Segmentation (2311.18628v2)

Published 30 Nov 2023 in cs.CV

Abstract: Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data. It is a widely researched area as obtaining labeled datasets is expensive. While previous works in the field have demonstrated a gradual improvement in model accuracy, most required neural network training. This made segmentation equally expensive, especially when dealing with large-scale datasets. We thus propose a lightweight clustering framework for unsupervised semantic segmentation. We discovered that attention features of the self-supervised Vision Transformer exhibit strong foreground-background differentiability. Therefore, clustering can be employed to effectively separate foreground and background image patches. In our framework, we first perform multilevel clustering across the Dataset-level, Category-level, and Image-level, and maintain consistency throughout. Then, the binary patch-level pseudo-masks extracted are upsampled, refined and finally labeled. Furthermore, we provide a comprehensive analysis of the self-supervised Vision Transformer features and a detailed comparison between DINO and DINOv2 to justify our claims. Our framework demonstrates great promise in unsupervised semantic segmentation and achieves state-of-the-art results on PASCAL VOC and MS COCO datasets.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (54)
  1. Move: Unsupervised movable object segmentation and detection. Advances in Neural Information Processing Systems, 35:33371–33386, 2022.
  2. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  3. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9650–9660, 2021.
  4. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1290–1299, 2022.
  5. Picie: Unsupervised semantic segmentation using invariance and equivariance in clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16794–16804, 2021.
  6. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3213–3223, 2016.
  7. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  8. An image is worth 16x16 words: Transformers for image recognition at scale. ICLR, 2021.
  9. The pascal visual object classes challenge 2012 (voc2012) development kit. Pattern Anal. Stat. Model. Comput. Learn., Tech. Rep, 2007:1–45, 2012.
  10. Pranet: Parallel reverse attention network for polyp segmentation. In International Conference on Medical Image Computing and Computer-assisted Intervention, pages 263–273. Springer, 2020.
  11. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3146–3154, 2019.
  12. Vision meets robotics: The kitti dataset. The International Journal of Robotics Research, 32(11):1231–1237, 2013.
  13. Unsupervised semantic segmentation by distilling feature correspondences. In International Conference on Learning Representations, 2022.
  14. Infoseg: Unsupervised semantic image segmentation with mutual information maximization. In DAGM German Conference on Pattern Recognition, pages 18–32. Springer, 2021.
  15. Segsort: Segmentation by discriminative sorting of segments. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7334–7344, 2019.
  16. Invariant information clustering for unsupervised image classification and segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9865–9874, 2019.
  17. Unsupervised hierarchical semantic segmentation with multiview cosegmentation and clustering transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2571–2581, 2022.
  18. Efficient inference in fully connected crfs with gaussian edge potentials. Advances in neural information processing systems, 24, 2011.
  19. A dataset and a technique for generalized nuclear segmentation for computational pathology. IEEE Transactions on Medical Imaging, 36(7):1550–1560, 2017.
  20. Ficklenet: Weakly and semi-supervised semantic image segmentation using stochastic inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5267–5276, 2019.
  21. xformers: A modular and hackable transformer modelling library, 2021.
  22. Acseg: Adaptive conceptualization for unsupervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7162–7172, 2023.
  23. Scribblesup: Scribble-supervised convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3159–3167, 2016.
  24. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
  25. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021.
  26. Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Advances in neural information processing systems, 32, 2019.
  27. Deep spectral methods: A surprisingly strong baseline for unsupervised semantic segmentation and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8364–8375, 2022.
  28. Unsupervised image segmentation by mutual information maximization and adversarial regularization. IEEE Robotics and Automation Letters, 6(4):6931–6938, 2021.
  29. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.
  30. Autoregressive unsupervised image segmentation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16, pages 142–158. Springer, 2020.
  31. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
  32. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021.
  33. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
  34. Spreading vectors for similarity search. arXiv preprint arXiv:1806.03198, 2018.
  35. Leveraging hidden positives for unsupervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19540–19549, 2023.
  36. Gland segmentation in colon histology images: The glas challenge contest. Medical Image Analysis, 35:489–502, 2017.
  37. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7262–7272, 2021.
  38. Deit iii: Revenge of the vit. In European Conference on Computer Vision, pages 516–533. Springer, 2022.
  39. Unsupervised semantic segmentation by contrasting object mask proposals. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10052–10062, 2021.
  40. Discovering object masks with transformers for unsupervised semantic segmentation. arXiv preprint arXiv:2206.06363, 2022.
  41. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  42. Cut and learn for unsupervised object detection and instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3124–3134, 2023.
  43. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12275–12284, 2020.
  44. Fully self-supervised learning for semantic segmentation. arXiv preprint arXiv:2202.11981, 2022.
  45. Self-supervised visual representation learning with semantic grouping. Advances in Neural Information Processing Systems, 35:16423–16438, 2022.
  46. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34:12077–12090, 2021.
  47. Multi-class token transformer for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4310–4319, 2022.
  48. Non-salient region object mining for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2623–2632, 2021.
  49. Transfgu: a top-down approach to fine-grained unsupervised semantic segmentation. In European Conference on Computer Vision, pages 73–89. Springer, 2022.
  50. Unsupervised semantic segmentation with self-supervised object-centric representations. arXiv preprint arXiv:2207.05027, 2022.
  51. Looking beyond single images for contrastive semantic segmentation learning. Advances in neural information processing systems, 34:3285–3297, 2021.
  52. Self-supervised visual representation learning from hierarchical grouping. Advances in Neural Information Processing Systems, 33:16579–16590, 2020.
  53. Scene parsing through ade20k dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 633–641, 2017.
  54. Self-supervised learning of object parts for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14502–14511, 2022.
Citations (1)

Summary

We haven't generated a summary for this paper yet.