SOHES: Self-supervised Open-world Hierarchical Entity Segmentation
Abstract: Open-world entity segmentation, an emerging computer vision task, aims to segment entities in images without being restricted to pre-defined classes, offering impressive generalization to unseen images and concepts. Despite its promise, existing entity segmentation methods such as the Segment Anything Model (SAM) rely heavily on costly expert annotators. This work presents Self-supervised Open-world Hierarchical Entity Segmentation (SOHES), a novel approach that eliminates the need for human annotations. SOHES operates in three phases: self-exploration, self-instruction, and self-correction. Given a pre-trained self-supervised representation, we produce abundant high-quality pseudo-labels through visual feature clustering. We then train a segmentation model on the pseudo-labels and rectify the noise in them via a teacher-student mutual-learning procedure. Beyond segmenting entities, SOHES also captures their constituent parts, providing a hierarchical understanding of visual entities. Using raw images as the sole training data, our method achieves unprecedented performance in self-supervised open-world segmentation, marking a significant milestone towards high-quality open-world entity segmentation in the absence of human-annotated masks. Project page: https://SOHES.github.io.
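As a rough illustration of how clustering visual features can yield both entities and their parts, the sketch below greedily merges image patches whose mean features are most similar, and repeats the merge at a looser threshold to obtain a coarser level. This is not the authors' procedure: the abstract only states that pseudo-labels come from visual feature clustering. The cosine-similarity criterion, the two thresholds, the function names, and the 2-D toy features are all assumptions made for illustration, and spatial adjacency between patches is ignored for brevity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_feature(features, region):
    """Average the features of all patches in a region."""
    dim = len(features[0])
    return [sum(features[i][d] for i in region) / len(region) for d in range(dim)]

def merge_regions(features, threshold):
    """Greedily merge the most similar pair of regions until the best
    remaining similarity drops below `threshold` (an assumed criterion)."""
    regions = [frozenset([i]) for i in range(len(features))]
    while len(regions) > 1:
        best = None
        for a in range(len(regions)):
            for b in range(a + 1, len(regions)):
                sim = cosine(mean_feature(features, regions[a]),
                             mean_feature(features, regions[b]))
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        if best[0] < threshold:
            break
        _, a, b = best
        merged = regions[a] | regions[b]
        regions = [r for k, r in enumerate(regions) if k not in (a, b)]
        regions.append(merged)
    return sorted(regions, key=min)

# Hypothetical toy features for six patches: patches 0-1 and 2-3 are two
# parts of one entity, patches 4-5 belong to a different entity.
toy_features = [
    [1.00, 0.00], [0.98, 0.05],
    [0.80, 0.60], [0.78, 0.62],
    [0.00, 1.00], [0.05, 0.99],
]

parts = merge_regions(toy_features, threshold=0.95)    # strict: part level
entities = merge_regions(toy_features, threshold=0.7)  # loose: entity level
```

With these toy values, the strict threshold keeps three part-level regions, while the looser one merges the first two parts into a single entity and leaves the third region separate. Each fine region is contained in exactly one coarse region, which is the kind of part-whole nesting the abstract refers to as hierarchical.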