
EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation (2403.01482v4)

Published 3 Mar 2024 in cs.CV

Abstract: Semantic segmentation has innately relied on extensive pixel-level annotated data, leading to the emergence of unsupervised methodologies. Among them, leveraging self-supervised Vision Transformers for unsupervised semantic segmentation (USS) has been making steady progress with expressive deep features. Yet, for semantically segmenting images with complex objects, a predominant challenge remains: the lack of explicit object-level semantic encoding in patch-level features. This technical limitation often leads to inadequate segmentation of complex objects with diverse structures. To address this gap, we present a novel approach, EAGLE, which emphasizes object-centric representation learning for unsupervised semantic segmentation. Specifically, we introduce EiCue, a spectral technique providing semantic and structural cues through an eigenbasis derived from the semantic similarity matrix of deep image features and color affinity from an image. Further, by incorporating our object-centric contrastive loss with EiCue, we guide our model to learn object-level representations with intra- and inter-image object-feature consistency, thereby enhancing semantic accuracy. Extensive experiments on COCO-Stuff, Cityscapes, and Potsdam-3 datasets demonstrate the state-of-the-art USS results of EAGLE with accurate and consistent semantic segmentation across complex scenes.
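The spectral idea behind EiCue can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract only says that the eigenbasis is derived from a semantic similarity matrix of deep features and a color affinity from the image. Everything else here is an assumption made for illustration, including the function names `eigenbasis_cues` and `bipartition`, the use of cosine similarity and a Gaussian color kernel, the additive combination weighted by `color_weight`, the symmetric normalized Laplacian, and the sign-based split of the second eigenvector.

```python
import numpy as np


def eigenbasis_cues(features, colors, num_eigvecs=2, color_weight=1.0, sigma=0.5):
    """Return the smallest eigenpairs of a graph Laplacian over image patches.

    features: (N, D) patch features (assumed to come from a self-supervised ViT).
    colors:   (N, 3) mean patch colors in [0, 1].
    All modelling choices below are illustrative assumptions.
    """
    features = np.asarray(features, dtype=float)
    colors = np.asarray(colors, dtype=float)

    # Semantic similarity: cosine similarity between patch features, clipped at zero.
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    semantic = np.clip(unit @ unit.T, 0.0, None)

    # Color affinity: Gaussian kernel on squared pairwise color distance.
    sq_dist = ((colors[:, None, :] - colors[None, :, :]) ** 2).sum(axis=-1)
    color = np.exp(-sq_dist / (2.0 * sigma ** 2))

    # Combined adjacency and symmetric normalized Laplacian L = I - D^-1/2 A D^-1/2.
    adjacency = semantic + color_weight * color
    degree = adjacency.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(adjacency)) - inv_sqrt[:, None] * adjacency * inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return eigvals[:num_eigvecs], eigvecs[:, :num_eigvecs]


def bipartition(eigvecs):
    """Split patches into two groups by the sign of the second eigenvector."""
    return (eigvecs[:, 1] > 0).astype(int)


# Toy input: four "red" patches with one feature direction, four "blue" with another.
rng = np.random.default_rng(0)
feats = np.vstack([np.tile([1.0, 0.0], (4, 1)), np.tile([0.0, 1.0], (4, 1))])
feats += 0.01 * rng.standard_normal(feats.shape)
cols = np.vstack([np.tile([1.0, 0.0, 0.0], (4, 1)), np.tile([0.0, 0.0, 1.0], (4, 1))])

eigvals, eigvecs = eigenbasis_cues(feats, cols)
labels = bipartition(eigvecs)
print(labels)
```

On this toy input the two patch groups land in different clusters. Which group receives label 0 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful. In a setting like the one the abstract describes, such cluster assignments would serve as cues for a learned objective rather than as the final segmentation.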
