Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision (2303.05503v2)
Abstract: Many top-down architectures for instance segmentation achieve significant success when trained and tested on a pre-defined closed-world taxonomy. However, when deployed in the open world, they exhibit a notable bias towards seen classes and suffer a significant performance drop. In this work, we propose a novel approach for open-world instance segmentation called bottom-Up and top-Down Open-world Segmentation (UDOS), which combines classical bottom-up segmentation algorithms with a top-down learning framework. UDOS first predicts parts of objects using a top-down network trained with weak supervision from bottom-up segmentations. The bottom-up segmentations are class-agnostic and do not overfit to specific taxonomies. The part-masks are then fed into affinity-based grouping and refinement modules to predict robust instance-level segmentations. UDOS combines the speed and efficiency of top-down architectures with the generalization to unseen categories that bottom-up supervision provides. We validate the strengths of UDOS on multiple cross-category and cross-dataset transfer tasks drawn from 5 challenging datasets (MS-COCO, LVIS, ADE20k, UVO and OpenImages), achieving significant improvements over the state of the art across the board. Our code and models are available on our project page.
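The grouping step described in the abstract (part predictions merged into instance-level candidates by pairwise affinity) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: affinity is approximated by box IoU between part boxes, grouping is done with a simple union-find over pairs above a threshold, and the names `box_iou`, `group_parts`, `enclosing_box` and the `threshold` value are hypothetical. The actual UDOS affinity and clustering procedure may differ.

```python
from itertools import combinations


def box_iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_parts(part_boxes, threshold=0.2):
    """Group part boxes whose pairwise affinity (here: IoU, an assumption)
    reaches `threshold`. Returns a sorted list of index groups."""
    parent = list(range(len(part_boxes)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(part_boxes)), 2):
        if box_iou(part_boxes[i], part_boxes[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(part_boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def enclosing_box(part_boxes, members):
    """Tightest box covering all parts in one group (a crude instance box)."""
    xs1, ys1, xs2, ys2 = zip(*(part_boxes[m] for m in members))
    return (min(xs1), min(ys1), max(xs2), max(ys2))


if __name__ == "__main__":
    # Two overlapping parts and one isolated part (toy data).
    parts = [(0, 0, 10, 10), (5, 0, 15, 10), (100, 100, 110, 110)]
    for members in group_parts(parts):
        print(members, enclosing_box(parts, members))
```

In this toy setup the first two parts overlap enough to be merged into one candidate, while the third stays on its own. A refinement stage, as mentioned in the abstract, would then operate on the merged candidates; it is not modeled here.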