
Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision (2303.05503v2)

Published 9 Mar 2023 in cs.CV, cs.AI, and cs.LG

Abstract: Many top-down architectures for instance segmentation achieve significant success when trained and tested on a pre-defined closed-world taxonomy. However, when deployed in the open world, they exhibit a notable bias towards seen classes and suffer a significant performance drop. In this work, we propose a novel approach for open-world instance segmentation called bottom-Up and top-Down Open-world Segmentation (UDOS) that combines classical bottom-up segmentation algorithms within a top-down learning framework. UDOS first predicts parts of objects using a top-down network trained with weak supervision from bottom-up segmentations. The bottom-up segmentations are class-agnostic and do not overfit to specific taxonomies. The part-masks are then fed into affinity-based grouping and refinement modules to predict robust instance-level segmentations. UDOS enjoys both the speed and efficiency of top-down architectures and the generalization ability to unseen categories that comes from bottom-up supervision. We validate the strengths of UDOS on multiple cross-category as well as cross-dataset transfer tasks from 5 challenging datasets including MS-COCO, LVIS, ADE20k, UVO and OpenImages, achieving significant improvements over the state of the art across the board. Our code and models are available on our project page.
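The grouping step mentioned in the abstract (part-masks merged into instances by pairwise affinity) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how affinity is computed or how groups are formed, so the sketch assumes the IoU of slightly padded part boxes as a stand-in affinity and uses a simple threshold with union-find grouping. The names `group_parts`, `box_iou`, `mask_to_box`, `threshold`, and `pad` are all hypothetical, and the refinement module is omitted.

```python
import numpy as np


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_to_box(mask):
    """Tight bounding box of a non-empty boolean mask."""
    ys, xs = np.nonzero(mask)
    return (xs.min(), ys.min(), xs.max() + 1, ys.max() + 1)


def group_parts(part_masks, threshold=0.2, pad=1):
    """Merge part masks whose padded boxes overlap enough.

    Assumption: box IoU stands in for the learned affinity, and
    thresholded connected components stand in for the grouping module.
    """
    n = len(part_masks)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pad each box so that adjacent (touching) parts can overlap.
    boxes = []
    for m in part_masks:
        x1, y1, x2, y2 = mask_to_box(m)
        boxes.append((x1 - pad, y1 - pad, x2 + pad, y2 + pad))

    # Link every pair of parts whose affinity passes the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if box_iou(boxes[i], boxes[j]) >= threshold:
                parent[find(i)] = find(j)

    # Collect parts by group and take the union of their masks.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [
        np.logical_or.reduce([part_masks[i] for i in idx])
        for idx in groups.values()
    ]
```

With two adjacent parts and one distant part as input, this sketch returns two instance masks: the union of the adjacent parts and the distant part on its own. A feature-based affinity or an agglomerative clustering routine could replace the box IoU and the thresholded linking without changing the overall structure.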
